Creative Genius
Lesson 5 of 5 · 18 min read

Evaluating Retrieval

If you can't measure retrieval quality, you can't improve it. Build the eval harness before the product.

The two metrics that matter:

  • Recall@K: of all the chunks that should be returned for a query, what fraction were in the top K results?
  • Precision@K: of the top K results, what fraction are actually relevant?
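
As a minimal sketch of the two definitions above, both metrics can be computed from a ranked list of retrieved chunk IDs and a set of labeled relevant chunk IDs. The function names, the chunk IDs, and the choice to score an unlabeled query as 0 are assumptions made for illustration, not part of any particular library.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0  # assumption: a query with no labeled chunks scores 0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant.

    Assumes chunk IDs in `retrieved` are unique. If fewer than k results
    come back, this divides by the number actually returned.
    """
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for chunk_id in top_k if chunk_id in relevant) / len(top_k)


# Hypothetical example: 3 chunks are labeled relevant, 2 of them are in the top 4.
retrieved = ["c1", "c7", "c3", "c9"]
relevant = {"c1", "c3", "c5"}

print(f"{recall_at_k(retrieved, relevant, 4):.2f}")     # 0.67 (2 of 3 relevant found)
print(f"{precision_at_k(retrieved, relevant, 4):.2f}")  # 0.50 (2 of 4 results relevant)
```

The two numbers pull in different directions: raising K can only hold or raise recall, but it usually lowers precision, so it helps to report both at the K your product actually uses.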

Build a golden set: 50–100 representative queries, each hand-labeled with the chunks that should be returned. Run it after every retrieval change. This is the only way to know whether a chunking or embedding change helped.
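
One way such a harness could look, sketched under some assumptions: the golden set is a plain dict mapping each query to its labeled chunk IDs, every query has at least one labeled chunk, and the retriever is any function taking `(query, k)` and returning ranked chunk IDs. The queries, IDs, and the `fake_retrieve` stub below are hypothetical stand-ins for a real retriever and real labels.

```python
from statistics import mean
from typing import Callable, Dict, List, Set

# Hypothetical golden set: query -> chunk IDs a human labeled as relevant.
GOLDEN_SET: Dict[str, Set[str]] = {
    "how do I reset my password": {"a", "b"},
    "what is the refund window": {"c"},
}


def evaluate(
    retrieve: Callable[[str, int], List[str]],
    golden_set: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average recall@k and precision@k over every query in the golden set."""
    recalls, precisions = [], []
    for query, relevant in golden_set.items():
        top_k = retrieve(query, k)[:k]
        hits = len(set(top_k) & relevant)
        recalls.append(hits / len(relevant))  # assumes relevant is non-empty
        precisions.append(hits / len(top_k) if top_k else 0.0)
    return {"recall@k": mean(recalls), "precision@k": mean(precisions)}


def fake_retrieve(query: str, k: int) -> List[str]:
    """Stub retriever with canned rankings; replace with the real one."""
    canned = {
        "how do I reset my password": ["a", "x", "b"],
        "what is the refund window": ["y", "z", "w"],
    }
    return canned.get(query, [])[:k]


scores = evaluate(fake_retrieve, GOLDEN_SET, k=3)
print({name: round(value, 2) for name, value in scores.items()})
# {'recall@k': 0.5, 'precision@k': 0.33}
```

Running `evaluate` once on the current retriever and once on the changed one, with the same golden set and the same K, gives a before/after pair to compare for each chunking or embedding change.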
