Agent failures are mostly tool wiring, not model 'intelligence'.
Your agent isn't dumb. Your glue code is.
What changed
• Most agent bugs come from tool permissions, parsing, retries, and state handling
• Observability beats 'add one more prompt' in real systems
• Edge cases multiply when tools return partial or inconsistent data
Who it affects
• Builders
• Automation enthusiasts
• Anyone shipping agent products
What to do now
• Add tracing (inputs, tool calls, outputs, errors)
• Validate tool outputs with schemas
• Implement retries + backoff + timeouts
• Write evals for edge cases (empty results, malformed JSON, rate limits)
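The tracing step above can be sketched as a decorator that logs every tool call's inputs, output, latency, and errors. This is a minimal sketch using only the standard library; the `search` tool below is a hypothetical stand-in for a real tool.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.trace")

def traced(tool_fn):
    """Wrap a tool function so every call logs inputs, outputs, latency, and errors."""
    def wrapper(**kwargs):
        start = time.monotonic()
        try:
            result = tool_fn(**kwargs)
            log.info("tool=%s args=%s ok latency=%.3fs result=%s",
                     tool_fn.__name__, json.dumps(kwargs),
                     time.monotonic() - start, json.dumps(result)[:200])
            return result
        except Exception as exc:
            log.error("tool=%s args=%s error=%r latency=%.3fs",
                      tool_fn.__name__, json.dumps(kwargs), exc,
                      time.monotonic() - start)
            raise  # re-raise so the agent loop can decide what to do
    return wrapper

@traced
def search(query: str):
    # hypothetical tool; replace with a real implementation
    return {"hits": [query.upper()]}
```

Because failures are logged before the exception propagates, a trace survives even when the agent loop crashes, which is usually when you need it most.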
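Schema validation can be as simple as a gate between the raw tool output and the agent. A minimal stdlib-only sketch, assuming a hypothetical search tool that should return `{"hits": [str, ...]}`:

```python
import json

def validate_search_result(raw: str) -> dict:
    """Parse and validate a tool's JSON output before the agent consumes it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool returned malformed JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("hits"), list):
        raise ValueError(f"unexpected shape: {data!r}")
    if not all(isinstance(h, str) for h in data["hits"]):
        raise ValueError("every hit must be a string")
    return data
```

In production a schema library (e.g. pydantic or jsonschema) does this with less code, but the principle is the same: reject partial or inconsistent data at the boundary instead of letting it leak into agent state.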
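Retries with exponential backoff, jitter, and a per-call timeout fit in a small helper. A sketch, assuming the wrapped tool accepts a `timeout` keyword and raises `TimeoutError` or `ConnectionError` on transient failures:

```python
import random
import time

def call_with_retries(fn, *, attempts=4, base_delay=0.5, timeout=10.0,
                      retryable=(TimeoutError, ConnectionError)):
    """Retry a flaky tool call with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn(timeout=timeout)
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            # 0.5s, 1s, 2s, ... plus random jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Only transient errors are retried; anything else (a schema violation, a permission error) fails fast, since retrying a deterministic bug just burns time and tokens.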
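An edge-case eval can be a plain table of (tool output, expected behavior) pairs run against one agent step. A sketch, where `summarize_hits` is a hypothetical step that must degrade gracefully rather than crash:

```python
import json

def summarize_hits(raw_tool_output: str) -> str:
    """Hypothetical agent step: turn raw tool output into a user-facing line."""
    try:
        hits = json.loads(raw_tool_output).get("hits", [])
    except (json.JSONDecodeError, AttributeError):
        return "tool output unreadable; please retry"
    if not hits:
        return "no results found"
    return f"{len(hits)} result(s): " + ", ".join(map(str, hits))

EDGE_CASES = [
    ('{"hits": []}', "no results found"),                     # empty results
    ('{"hits": ["a", "b"]}', "2 result(s): a, b"),            # happy path
    ('{not json', "tool output unreadable; please retry"),    # malformed JSON
]

def run_evals():
    """Return the failing cases; an empty list means the step passed."""
    return [(raw, want, summarize_hits(raw))
            for raw, want in EDGE_CASES if summarize_hits(raw) != want]
```

Rate-limit cases fit the same table: feed in whatever your tool layer emits on a 429 and assert the step backs off instead of hallucinating results.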
Related updates
• Claude Opus 4.5 leans into coding and agents with stronger robustness; better for serious coding and agent work, especially when you wire it properly.
• GPT-5.2 pushes harder on real work (code, tools, long context); more useful for shipping, especially with structure.
• Claude API: structured outputs support expands while older models keep getting retired; production users should pin versions and act on deprecation notices.
• Prompt injection defenses are maturing but still not bulletproof; treat them as one layer in defense-in-depth, not a fix.