Context windows are now absurdly long, but retrieval still matters.
Long contexts are impressive, but retrieval isn't obsolete yet.
What changed
• Context windows expanded to 200k-2M tokens across major providers
• Recall degrades for information placed in the middle of very long contexts (the "lost in the middle" effect)
• Cost and latency scale roughly linearly with context size
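The linear cost scaling above is easy to check with back-of-envelope arithmetic. A minimal sketch, using a hypothetical per-million-token input price (plug in your provider's actual rates):

```python
# Hypothetical rate for illustration only; check your provider's pricing page.
PRICE_PER_M_INPUT_TOKENS = 3.00  # USD per 1M input tokens (assumed)

def cost_per_call(context_tokens: int,
                  price_per_m: float = PRICE_PER_M_INPUT_TOKENS) -> float:
    """Input-side cost of a single call; linear in context size."""
    return context_tokens / 1_000_000 * price_per_m

# 10x the context means roughly 10x the input cost per call:
print(f"${cost_per_call(20_000):.2f}")   # 20k-token context  -> $0.06
print(f"${cost_per_call(200_000):.2f}")  # 200k-token context -> $0.60
```

Multiply by calls per day and the gap between "stuff everything in context" and "retrieve the top few chunks" shows up quickly.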
Who it affects
• Developers building RAG systems
• Teams analyzing large documents
• Anyone considering ditching retrieval for huge contexts
What to do now
• Test needle-in-haystack performance with your actual content
• Calculate cost at scale before committing to large contexts
• Keep using retrieval for knowledge bases unless testing proves otherwise
• Organize long contexts with clear structure and navigation aids
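The needle-in-haystack test in the first bullet can be sketched as a small harness: plant a known fact at varying depths in filler text, then ask the model to recover it. `ask_model` is a placeholder for whatever client you use; the needle sentence and filler are stand-ins for your actual content.

```python
def build_haystack(needle: str, filler: list[str], depth: float) -> str:
    """Insert the needle paragraph at a relative depth (0.0 = start, 1.0 = end)."""
    pos = round(depth * len(filler))
    return "\n\n".join(filler[:pos] + [needle] + filler[pos:])

# Sweep depths, send each prompt to your model, and record whether the
# answer contains the planted fact. ask_model() is a stub for your client.
needle = "The access code for the vault is 7141."
filler = [f"Filler paragraph {i} about unrelated topics." for i in range(100)]
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_haystack(needle, filler, depth)
    # answer = ask_model(prompt + "\n\nWhat is the access code for the vault?")
    # record (depth, "7141" in answer)
```

Run the sweep at several total context lengths with your real documents as filler; a dip at middle depths is the degradation described above.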
Related updates
GPT-5.2 pushes harder on real work: code, tools, long context.
More useful for shipping work, especially with structure.
Claude Opus 4.5 leans into coding + agents, with stronger robustness.
Better for serious coding + agent work, especially when you wire it properly.
Gemini 3 expands reasoning + multimodal capability across Google products.
Gemini's getting more capable, especially inside Google's own stack.
Open source models are getting scary good at specialized tasks.
Open models + fine-tuning can beat general models on your specific problem.
Embedding models made a quiet quality leap (RAG got better).
RAG quality improved quietly—test new embeddings on your content.
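One way to act on that last item is a small recall@k check: embed your own queries and documents with the candidate model, then measure how often the known-relevant document ranks in the top k by cosine similarity. A minimal sketch; the vectors here would come from whatever embedding API you are evaluating.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def recall_at_k(query_vecs: list[list[float]],
                doc_vecs: list[list[float]],
                relevant: list[int],
                k: int = 5) -> float:
    """Fraction of queries whose labeled-relevant doc appears in the top-k results."""
    hits = 0
    for q, rel in zip(query_vecs, relevant):
        ranked = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(q, doc_vecs[i]),
                        reverse=True)
        hits += rel in ranked[:k]
    return hits / len(query_vecs)
```

Run the same labeled query/document set through your current embeddings and the new ones; switch only if recall@k improves on your content, not just on public benchmarks.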