Creative Genius
Lesson 2 of 3 · 16 min read

Context Windows: The Real Limits

A model's advertised context window is the ceiling, not the floor. Real-world performance degrades well before you hit it.

OpenAI markets GPT-4o as having a 128K context window. Anthropic markets Claude 3.5 Sonnet as 200K. Gemini 1.5 Pro reaches 1M+. Sounds infinite. It isn't.

The "lost in the middle" problem

Models attend more reliably to the beginning and end of their context; information buried in the middle of a 100K-token prompt is measurably less likely to be used. Stanford's "Lost in the Middle" paper documented the effect, and long-context evaluations from Anthropic and others suggest it persists, to varying degrees, across current frontier models.
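One common mitigation in retrieval pipelines is to reorder retrieved chunks so the strongest ones land at the edges of the context and the weakest in the middle. Here's a minimal sketch of that idea; the `reorder_to_edges` function and its `(score, text)` input format are illustrative assumptions, not part of any specific library:

```python
def reorder_to_edges(chunks):
    """Place the highest-scoring chunks at the start and end of the
    context, pushing weaker ones toward the middle, where attention
    is weakest. `chunks` is a list of (score, text) pairs.
    (Hypothetical helper for illustration only.)"""
    ranked = sorted(chunks, key=lambda c: c[0], reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        # Alternate: best -> front, 2nd best -> back, 3rd -> front, ...
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so scores rise again toward the end.
    return front + back[::-1]
```

With four chunks scored 0.9, 0.7, 0.5, 0.1, the 0.9 chunk ends up first, the 0.7 chunk last, and the two weakest sit in the middle, matching the attention pattern described above.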

Practical rules

  • Put the most important information at the start or end of the prompt.
  • For RAG systems, retrieve fewer, more relevant chunks. 5 great chunks beat 50 mediocre ones.
  • For long documents, ask the model to summarize section-by-section before answering.
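The section-by-section rule can be sketched as a simple map-reduce loop. Everything here is an assumption for illustration: `llm` stands in for any chat-completion client, and `split_sections` is a naive character-budget splitter, not a tokenizer-aware one:

```python
def split_sections(document, max_chars=4000):
    """Greedily pack paragraphs into sections under a size budget.
    (Naive sketch; a real pipeline would budget by tokens.)"""
    sections, current = [], ""
    for para in document.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            sections.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        sections.append(current)
    return sections

def summarize_then_answer(llm, document, question):
    """Map-reduce: summarize each section separately, then answer
    from the concatenated summaries, so no key fact has to survive
    the middle of one enormous prompt."""
    summaries = [
        llm(f"Summarize the facts relevant to: {question}\n\n{s}")
        for s in split_sections(document)
    ]
    notes = "\n".join(summaries)
    # Per the first rule, the question appears at both the start
    # and the end of the final prompt.
    return llm(f"{question}\n\nNotes:\n{notes}\n\nAnswer: {question}")
```

The trade-off is extra model calls and some lost detail in the summaries, in exchange for every fact passing through a short prompt where the model actually attends to it.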