What an AI assistant is bad at

The honest list of failure modes. Knowing them saves hours.

#It sounds confident even when wrong

The biggest failure mode. An LLM doesn't have a built-in "I'm unsure" signal — its words come out smooth either way. If it tells you a hotel's email is reservations@hotel.com and it never actually visited their site, the sentence looks identical to one where it did.

Mitigation: Ask it to verify. A direct question such as "Did you actually check the site or are you guessing?" usually works, and a well-tuned assistant will admit when it guessed.

#Math and counting

Mediocre at arithmetic, bad at counting items in lists, unreliable at anything involving precise tallies. If a number matters, the assistant should use a tool (calculator, code, SQL) — not do it in its head.

Don't ask it to count words in a passage without writing code
Don't ask it to add up a column of numbers without a calculator
Don't trust dates it computes without checking; the day of the week is often wrong
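As a rough illustration of "use a tool, not its head", here is a minimal Python sketch of the two tallies mentioned above. The function names `count_words` and `sum_column` are made up for this example, and it assumes that a word is anything separated by whitespace and that the numbers arrive as text with optional thousands separators.

```python
from decimal import Decimal


def count_words(passage: str) -> int:
    """Count whitespace-separated words (assumed definition of a 'word')."""
    return len(passage.split())


def sum_column(values: list[str]) -> Decimal:
    """Add up numbers given as text, stripping thousands separators.

    Decimal is used so that money-like values add up exactly.
    """
    return sum((Decimal(v.replace(",", "")) for v in values), Decimal("0"))


if __name__ == "__main__":
    print(count_words("Knowing them saves hours."))   # 4
    print(sum_column(["1,200.50", "99.50", "0.10"]))  # 1300.10
```

Asking the assistant to run something like this and show the output gives you a number you can check, rather than one it estimated.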

#Time + dates

Specifically: most LLMs get the day of the week wrong unless they actively check it. If timing matters, double-check. Ask the assistant to print today's date and weekday before drafting anything time-sensitive.
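One way to do that check is to let code compute the weekday instead of the model. The sketch below uses only Python's standard `datetime` module. The helper names `weekday_name` and `describe` are hypothetical, and the weekday names are hard-coded in English so the output does not depend on the machine's locale.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def weekday_name(d: date) -> str:
    """Return the English weekday name for a date."""
    return WEEKDAYS[d.weekday()]


def describe(d: date) -> str:
    """Format a date as 'YYYY-MM-DD (Weekday)'."""
    return f"{d.isoformat()} ({weekday_name(d)})"


if __name__ == "__main__":
    print("Today:", describe(date.today()))
    print(describe(date(2024, 1, 1)))  # 2024-01-01 (Monday)
```

Note that `date.today()` uses the clock and time zone of the machine running the code, which may differ from yours.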

#Recent events

Training has a cutoff. Anything after that the model either doesn't know or hallucinates. It can search the web to fill the gap, but it needs to actually run the search — it won't do it automatically unless the context makes it obvious.

#Its own recent actions

Long sessions degrade context. After hours of conversation, the assistant sometimes claims something is done when only half of it actually shipped.

Mitigation: Ask it to verify what it built before declaring done. "Show me the actual URL / file / commit" — concrete artifacts beat its word.

#Subtle creative judgment

An LLM can write competent prose, but its taste is generic. It struggles with the phrase that makes a line memorable, the way a joke lands, and the difference between "good copy" and "copy a specific human would actually write". It works best paired with a human who provides the voice while the assistant handles structure.

#Visual / spatial reasoning

Reading a complex diagram, understanding webpage layout from a screenshot, spotting a visual error in a chart — much weaker than text reasoning. Basic image description works; deep visual analysis doesn't.

#Long messages on some channels

Not technically an LLM failure, but it bites users constantly. A multi-section response of 500+ characters often vanishes between the gateway and the user's phone, especially on WhatsApp. The fix is keeping replies short and putting long content on pages or in shared docs.
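A minimal sketch of that fix, under stated assumptions: the 500-character threshold is taken from the figure above and is not a documented gateway limit, `shorten_for_channel` is a hypothetical helper, and the page URL stands in for wherever the long version actually lives.

```python
MAX_CHARS = 500  # assumed threshold; the real limit depends on the gateway


def shorten_for_channel(reply: str, page_url: str, max_chars: int = MAX_CHARS) -> str:
    """Return the reply unchanged if it is short enough; otherwise keep
    only the first paragraph and point to a page with the full text."""
    if len(reply) <= max_chars:
        return reply

    suffix = f"\n\nFull details: {page_url}"
    budget = max_chars - len(suffix)  # characters left for the summary
    first_paragraph = reply.strip().split("\n\n", 1)[0].strip()

    if len(first_paragraph) > budget:
        # Trim and mark the cut so the reader knows text was dropped.
        first_paragraph = first_paragraph[: max(budget - 3, 0)].rstrip() + "..."

    return first_paragraph + suffix
```

The sketch prefers cutting at a paragraph boundary over splitting the reply into several messages, since the advice here is to move long content elsewhere rather than send more of it.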

#Things LLMs make up sometimes

URLs that look real but 404 — always click to verify
API endpoint shapes — invented fields
Quotes attributed to people who never said them
Library functions that don't exist (especially in less-popular packages)
File paths that "should" exist but don't — ask the assistant to ls first
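Several of the items above can be checked mechanically. The following is a sketch using only Python's standard library, not a complete validator. The helper names are hypothetical, and the URL check assumes the server answers HEAD requests (some do not, so a False result there means "verify by hand", not "definitely fake").

```python
import importlib
import urllib.request
from pathlib import Path


def url_resolves(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers HTTP errors (such as 404), DNS failures and timeouts;
        # ValueError covers strings that are not URLs at all.
        return False


def path_exists(path: str) -> bool:
    """Return True if the file or directory is really there."""
    return Path(path).expanduser().exists()


def function_exists(module_name: str, function_name: str) -> bool:
    """Return True if the module imports and exposes a callable by that name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, function_name, None))


if __name__ == "__main__":
    print(function_exists("math", "sqrt"))       # True
    print(function_exists("math", "fast_sqrt"))  # False
    print(path_exists("."))                      # True
```

Invented quotes and API fields have no equivalent one-line check; those still need the original source or the official documentation.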

#What to do about all this

Ask the assistant to verify before committing to anything important
When given a URL / address / phone number, ask "did you check this?"
When it claims something is done, ask for the concrete artifact
When the cost of being wrong is real, prefer "show me what you found" over "tell me the answer"