Chunking = splitting documents into small pieces that get embedded and stored. There's no universal best chunk size — but there are universal mistakes.
Three chunking strategies
- Fixed-size. Easy to implement. Often wrong, because it cuts at a fixed character or token count regardless of content, so it splits sentences mid-thought.
- Recursive structural. Split on paragraphs first, then sentences, then characters. Default winner for most prose.
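- Semantic. Use an embedding model to detect topic shifts and split there. Often the best quality, but the highest cost, since the text has to be embedded once just to find the boundaries.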
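To make the recursive structural strategy concrete, here is a minimal sketch in plain Python. The function name `recursive_split`, the `max_chars` limit, and the separator list are illustrative assumptions, not any particular library's API; real splitters usually count tokens rather than characters.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first, falling back to finer ones.

    Hypothetical helper: paragraphs, then sentences, then words, then a
    hard character cut as the last resort.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No structure left to use: cut by character count.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p.strip()]

    chunks, current = [], ""
    for piece in pieces:
        # Greedily merge neighbouring pieces while they still fit.
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece alone is too big: retry with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "First paragraph. It has two sentences.\n\nSecond paragraph here."
    for chunk in recursive_split(doc, max_chars=40):
        print(repr(chunk))
```

Pieces are merged greedily so that short paragraphs share a chunk instead of each becoming a tiny one. One simplification to be aware of: the separator at a chunk boundary is dropped, so a chunk that ends at a sentence break loses its trailing period.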
Add 10–20% overlap between adjacent chunks to avoid losing context at boundaries.
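As a rough illustration of how overlap works, the sketch below slides a fixed-size window over a token list and steps forward by less than the window size. The name `chunk_with_overlap` and its parameters are hypothetical, and the token list stands in for whatever tokenizer output you actually use.

```python
def chunk_with_overlap(tokens, chunk_size=10, overlap_ratio=0.2):
    """Return fixed-size windows where each shares its tail with the next."""
    overlap = round(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far the window advances each time
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


if __name__ == "__main__":
    tokens = list(range(25))
    for chunk in chunk_with_overlap(tokens, chunk_size=10, overlap_ratio=0.2):
        print(chunk)
```

With `chunk_size=10` and a 20% overlap, the window advances 8 tokens at a time, so the last 2 tokens of each chunk reappear at the start of the next. The cost is proportional: more overlap means more chunks to embed and store.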