OpenAI markets GPT-4o as having a 128K context window. Anthropic markets Claude 3.5 Sonnet as 200K. Gemini 1.5 Pro reaches 1M+. Sounds infinite. It isn't.
The "lost in the middle" problem
Transformer models attend more reliably to the beginning and end of their context. Information buried in the middle of a 100K-token prompt is measurably less likely to be used. Stanford's "Lost in the Middle" paper (Liu et al., 2023) documented the effect, and Anthropic's own long-context testing shows similar recall degradation — it persists, to varying degrees, across frontier models.
Practical rules
- Put the most important information at the start or end of the prompt.
- For RAG systems, retrieve fewer, more relevant chunks. 5 great chunks beat 50 mediocre ones.
- For long documents, ask the model to summarize section-by-section before answering.
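The first two rules can be combined mechanically: after retrieval, reorder chunks so the strongest ones land at the edges of the prompt rather than the middle. A minimal sketch — `sandwich_order` is a hypothetical helper name, and it assumes your retriever already returns chunks sorted best-first:

```python
def sandwich_order(chunks_by_relevance):
    """Interleave chunks so the most relevant land at the start
    and end of the prompt, pushing the weakest into the middle
    where attention is weakest.

    Assumes `chunks_by_relevance` is sorted best-first.
    """
    head, tail = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: even ranks go to the front, odd ranks to the back.
        (head if i % 2 == 0 else tail).append(chunk)
    return head + tail[::-1]

# Rank 1 ends up first, rank 2 ends up last; rank 5 sits in the middle.
print(sandwich_order(["c1", "c2", "c3", "c4", "c5"]))
# → ['c1', 'c3', 'c5', 'c4', 'c2']
```

This keeps the top-ranked chunk first and the second-ranked chunk last, which is a cheap way to exploit the position bias rather than fight it.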