OpenAI markets GPT-4o as having a 128K context window. Anthropic markets Claude 3.5 Sonnet as 200K. Gemini 1.5 Pro reaches 1M+. Sounds infinite. It isn't.
The "lost in the middle" problem
Transformer models attend more reliably to the beginning and end of their context. Information buried in the middle of a 100K-token prompt is measurably less likely to be used. Stanford's "Lost in the Middle" paper (Liu et al., 2023) documented the effect, and Anthropic's own long-context testing shows similar recall degradation — it persists, to varying degrees, across frontier models.
Practical rules
- Put the most important information at the start or end of the prompt.
- For RAG systems, retrieve fewer, more relevant chunks. 5 great chunks beat 50 mediocre ones.
- For long documents, ask the model to summarize section-by-section before answering.
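The first two rules can be combined mechanically: after retrieval, reorder chunks so the strongest ones land at the edges of the prompt rather than the middle. A minimal sketch — `sandwich_order` is a hypothetical helper name, and it assumes your retriever already returns chunks sorted best-first:

```python
def sandwich_order(chunks_by_relevance):
    """Interleave chunks so the most relevant land at the start
    and end of the prompt, pushing the weakest into the middle
    where attention is weakest.

    Assumes `chunks_by_relevance` is sorted best-first.
    """
    head, tail = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: even ranks go to the front, odd ranks to the back.
        (head if i % 2 == 0 else tail).append(chunk)
    return head + tail[::-1]

# Rank 1 ends up first, rank 2 ends up last; rank 5 sits in the middle.
print(sandwich_order(["c1", "c2", "c3", "c4", "c5"]))
# → ['c1', 'c3', 'c5', 'c4', 'c2']
```

This keeps the top-ranked chunk first and the second-ranked chunk last, which is a cheap way to exploit the position bias rather than fight it.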