Context windows are now absurdly long, but retrieval still matters.
Long contexts are impressive, but retrieval isn't obsolete yet.
What changed
• Context windows expanded to 200k-2M tokens across major providers
• Recall degrades for information placed in the middle of very long contexts (the "lost in the middle" effect)
• Cost and latency scale roughly linearly with context size
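The linear cost scaling above is easy to check with back-of-envelope arithmetic. A minimal sketch, using a hypothetical per-million-token input price (plug in your provider's actual rates):

```python
# Hypothetical rate for illustration only; check your provider's pricing page.
PRICE_PER_M_INPUT_TOKENS = 3.00  # USD per 1M input tokens (assumed)

def cost_per_call(context_tokens: int,
                  price_per_m: float = PRICE_PER_M_INPUT_TOKENS) -> float:
    """Input-side cost of a single call; linear in context size."""
    return context_tokens / 1_000_000 * price_per_m

# 10x the context means roughly 10x the input cost per call:
print(f"${cost_per_call(20_000):.2f}")   # 20k-token context  -> $0.06
print(f"${cost_per_call(200_000):.2f}")  # 200k-token context -> $0.60
```

Multiply by calls per day and the gap between "stuff everything in context" and "retrieve the top few chunks" shows up quickly.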
Who it affects
• Developers building RAG systems
• Teams analyzing large documents
• Anyone considering ditching retrieval for huge contexts
What to do now
• Test needle-in-haystack performance with your actual content
• Calculate cost at scale before committing to large contexts
• Keep using retrieval for knowledge bases unless testing proves otherwise
• Organize long contexts with clear structure and navigation aids
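The needle-in-haystack test in the first bullet can be sketched as a small harness: plant a known fact at varying depths in filler text, then ask the model to recover it. `ask_model` is a placeholder for whatever client you use; the needle sentence and filler are stand-ins for your actual content.

```python
def build_haystack(needle: str, filler: list[str], depth: float) -> str:
    """Insert the needle paragraph at a relative depth (0.0 = start, 1.0 = end)."""
    pos = round(depth * len(filler))
    return "\n\n".join(filler[:pos] + [needle] + filler[pos:])

# Sweep depths, send each prompt to your model, and record whether the
# answer contains the planted fact. ask_model() is a stub for your client.
needle = "The access code for the vault is 7141."
filler = [f"Filler paragraph {i} about unrelated topics." for i in range(100)]
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_haystack(needle, filler, depth)
    # answer = ask_model(prompt + "\n\nWhat is the access code for the vault?")
    # record (depth, "7141" in answer)
```

Run the sweep at several total context lengths with your real documents as filler; a dip at middle depths is the degradation described above.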
Related updates
GPT-5.2 pushes harder on real work: code, tools, long context.
More useful for shipping work, especially with structure.
Claude Opus 4.5 leans into coding + agents, with stronger robustness.
Better for serious coding + agent work, especially when you wire it properly.
Gemini 3 expands reasoning + multimodal capability across Google products.
Gemini's getting more capable, especially inside Google's own stack.
Open source models are getting scary good at specialized tasks.
Open models + fine-tuning can beat general models on your specific problem.
Embedding models made a quiet quality leap (RAG got better).
RAG quality improved quietly—test new embeddings on your content.
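One way to act on that last item is a small recall@k check: embed your own queries and documents with the candidate model, then measure how often the known-relevant document ranks in the top k by cosine similarity. A minimal sketch; the vectors here would come from whatever embedding API you are evaluating.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def recall_at_k(query_vecs: list[list[float]],
                doc_vecs: list[list[float]],
                relevant: list[int],
                k: int = 5) -> float:
    """Fraction of queries whose labeled-relevant doc appears in the top-k results."""
    hits = 0
    for q, rel in zip(query_vecs, relevant):
        ranked = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(q, doc_vecs[i]),
                        reverse=True)
        hits += rel in ranked[:k]
    return hits / len(query_vecs)
```

Run the same labeled query/document set through your current embeddings and the new ones; switch only if recall@k improves on your content, not just on public benchmarks.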