For tasks with a single right answer, deterministic code checks are king:
- Exact match: for classification and slot extraction
- JSON schema validation: for structured outputs
- Regex / contains: for known phrases or formats
- Numeric comparison with tolerance: for calculations
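
As a rough illustration, the four checks above could be sketched in Python as follows. All function names here are hypothetical, and the normalization in `exact_match` (trimming and lowercasing) is an assumption that will not suit every task. `has_required_fields` is a simplified stand-in for JSON schema validation that only checks for required keys and their types; a real project would more likely use a dedicated schema validation library.

```python
import json
import math
import re


def exact_match(output: str, expected: str) -> bool:
    """Exact match after trimming whitespace and lowercasing (assumed normalization)."""
    return output.strip().lower() == expected.strip().lower()


def has_required_fields(output: str, required: dict[str, type]) -> bool:
    """Simplified stand-in for schema validation: parse JSON, then check keys and types."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # A missing key yields None, which fails the isinstance check.
    return all(isinstance(data.get(key), typ) for key, typ in required.items())


def matches_pattern(output: str, pattern: str) -> bool:
    """Regex / contains check: True if the pattern occurs anywhere in the output."""
    return re.search(pattern, output) is not None


def close_enough(output: str, expected: float,
                 rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Numeric comparison with tolerance; non-numeric output counts as a failure."""
    try:
        value = float(output)
    except ValueError:
        return False
    return math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs_tol)


if __name__ == "__main__":
    fields = {"city": str, "nights": int}
    print(exact_match(" Refund ", "refund"))
    print(has_required_fields('{"city": "Paris", "nights": 3}', fields))
    print(matches_pattern("Order ID: AB-1234", r"\b[A-Z]{2}-\d{4}\b"))
    print(close_enough("3.14159", math.pi, rel_tol=1e-5))
```

Each function returns a plain boolean, so results can be averaged into a pass rate across a test set. The tolerance values are placeholders and should be tuned to the precision the task actually requires.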