What an AI assistant is bad at
The honest list of failure modes. Knowing them saves hours.

#It sounds confident even when wrong
The biggest failure mode. An LLM doesn't have a built-in "I'm unsure" signal; its words come out smooth either way. If it tells you a hotel's email is reservations@hotel.com and it never actually visited their site, the sentence looks identical to one where it did.
Mitigation: Ask it to verify. "Did you actually check the site or are you guessing?" works. A well-tuned assistant will usually admit when it was guessing.
#Math and counting
Mediocre at arithmetic, bad at counting items in lists, unreliable at anything involving precise tallies. If a number matters, the assistant should use a tool (calculator, code, SQL) rather than do it in its head.
- Don't ask it to count words in a passage without writing code
- Don't ask it to add up a column of numbers without a calculator
- Don't trust dates it computes without checking; the day of the week is often wrong
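A minimal Python sketch of "use a tool" for the first two bullets. The function names `count_words` and `sum_column` are made up for illustration, and "word" here simply means a whitespace-separated token, which is an assumption you may want to change.

```python
from decimal import Decimal


def count_words(text: str) -> int:
    # str.split() with no arguments splits on any run of whitespace.
    return len(text.split())


def sum_column(cells: list[str]) -> Decimal:
    # Decimal keeps the digits exactly as typed, avoiding float rounding.
    return sum((Decimal(cell) for cell in cells), Decimal("0"))


passage = "If a number matters, use a tool."
print(count_words(passage))                    # 7
print(sum_column(["19.99", "5.01", "100"]))    # 125.00
```

Asking the assistant to run something like this and paste the output gives you a number you can re-check, instead of one it produced "in its head".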
#Time + dates
Most LLMs get the day of the week wrong unless they actively check it. If timing matters, double-check: ask the assistant to print today's date and weekday before drafting anything time-sensitive.
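One way to do that check, sketched in Python with only the standard library. The helper name `weekday_name` is hypothetical, and "today" is whatever the machine running the code thinks it is, so the clock and timezone of that machine are assumptions too.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_name(d: date) -> str:
    # Look the name up from the calendar instead of guessing it.
    return WEEKDAYS[d.weekday()]


today = date.today()
print(today.isoformat(), weekday_name(today))

# Checking a specific date the assistant mentioned:
print(weekday_name(date(2024, 2, 29)))  # Thursday
```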
#Recent events
Training has a cutoff. Anything after that, the model either doesn't know or hallucinates. It can search the web to fill the gap, but it has to actually run the search; it won't do so automatically unless the context makes the need obvious.
#Its own recent actions
Long sessions degrade context. After hours of conversation, the assistant sometimes claims something is done when only half of it actually shipped.
Mitigation: Ask it to verify what it built before declaring done. "Show me the actual URL / file / commit" is the right ask: concrete artifacts beat its word.
#Subtle creative judgment
An LLM can write competent prose, but its taste is generic. The phrase that makes a line memorable, the way a joke lands, the difference between "good copy" and "copy a specific human would actually write": it struggles there. Best paired with a human providing voice while the assistant handles structure.
#Visual / spatial reasoning
Reading a complex diagram, understanding webpage layout from a screenshot, spotting a visual error in a chart: all much weaker than text reasoning. Basic image description works; deep visual analysis doesn't.
#Long messages on some channels
Not technically an LLM failure, but it bites users constantly. A multi-section response of 500+ characters often vanishes between the gateway and the user's phone, especially on WhatsApp. The fix is keeping replies short and putting long content on pages or in shared docs.
#Things LLMs make up sometimes
- URLs that look real but 404 (always click to verify)
- API endpoint shapes with invented fields
- Quotes attributed to people who never said them
- Library functions that don't exist (especially in less-popular packages)
- File paths that "should" exist but don't (ask the assistant to ls first)
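Several of these can be checked mechanically. Below is a rough Python sketch using only the standard library; the helper names (`path_exists`, `function_exists`, `url_status`) are invented for illustration. One caveat is an assumption worth stating: some servers reject HEAD requests, so an odd status from `url_status` means "look closer", not "definitely fake".

```python
import importlib
import urllib.error
import urllib.request
from pathlib import Path
from typing import Optional


def path_exists(path: str) -> bool:
    # The scripted version of "ls first".
    return Path(path).exists()


def function_exists(module_name: str, attr: str) -> bool:
    # True only if the module imports and exposes a callable with that name.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, attr, None))


def url_status(url: str, timeout: float = 5.0) -> Optional[int]:
    # Returns the HTTP status code, or None if the URL is malformed
    # or the host could not be reached.
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a URL that only looks real
    except (OSError, ValueError):
        return None


print(function_exists("json", "loads"))  # True
print(function_exists("json", "parse"))  # False: Python's json has no parse()
print(path_exists("."))                  # True
```

`url_status` is not called above because it needs network access; run it yourself against any URL the assistant hands you. Quotes and API field names still need a human or the real documentation.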
#What to do about all this
- Ask the assistant to verify before committing to anything important
- When given a URL / address / phone number, ask "did you check this?"
- When it claims something is done, ask for the concrete artifact
- When the cost of being wrong is real, prefer "show me what you found" over "tell me the answer"