Two cache layers everyone should have:
- Exact-match cache. Same prompt → same response, served from Redis. A hit skips the model call entirely, so it saves 100% of that request's cost.
- Semantic cache. Embed the prompt, look up a near-duplicate prompt whose answer is already cached, and serve that answer. Saves 70–95% of cost when the user base asks similar questions (which it almost always does). Both layers are sketched below.
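A minimal sketch of both layers in Python, assuming redis-py and the OpenAI SDK for embeddings. The key names, TTLs, similarity threshold, and the linear scan over cached entries are illustrative choices, not a specific library's API; a production semantic cache would use a vector index instead of scanning.

```python
import hashlib
import json

import numpy as np
import redis
from openai import OpenAI

r = redis.Redis(decode_responses=True)
client = OpenAI()

def exact_cache_get(prompt: str) -> str | None:
    # Layer 1: exact match. Hash the prompt and look it up in Redis.
    key = "llm:exact:" + hashlib.sha256(prompt.encode()).hexdigest()
    return r.get(key)

def exact_cache_set(prompt: str, response: str, ttl: int = 86400) -> None:
    key = "llm:exact:" + hashlib.sha256(prompt.encode()).hexdigest()
    r.set(key, response, ex=ttl)

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def semantic_cache_get(prompt: str, threshold: float = 0.95) -> str | None:
    # Layer 2: semantic match. Compare the prompt embedding against cached
    # embeddings and reuse the answer of any close-enough neighbor.
    # Linear scan is fine for a sketch; use a vector index at scale.
    query = embed(prompt)
    for key in r.scan_iter("llm:semantic:*"):
        entry = json.loads(r.get(key))
        cached = np.array(entry["embedding"])
        sim = float(query @ cached / (np.linalg.norm(query) * np.linalg.norm(cached)))
        if sim >= threshold:
            return entry["response"]
    return None

def semantic_cache_set(prompt: str, response: str, ttl: int = 86400) -> None:
    key = "llm:semantic:" + hashlib.sha256(prompt.encode()).hexdigest()
    entry = {"embedding": embed(prompt).tolist(), "response": response}
    r.set(key, json.dumps(entry), ex=ttl)
```

Typical flow per request: check the exact cache, then the semantic cache, and only then call the model, writing the fresh response back into both layers.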
OpenAI's prompt caching (automatic since late 2024) gives a 50% discount on cached input tokens. The cache works on shared prompt prefixes, so restructure long prompts so the static parts come first and the per-request parts last: the static prefix is what gets cached and reused across calls.
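A sketch of that ordering, assuming the OpenAI Python SDK; the model name, file path, and message contents are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Large, identical on every call: instructions, policies, reference material.
STATIC_SYSTEM_PROMPT = open("policies.md").read()

def answer(user_question: str, user_context: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # Static prefix first, so consecutive requests share a cacheable prefix.
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},
            # Dynamic, per-request content last, so it doesn't break the prefix.
            {"role": "user", "content": f"{user_context}\n\nQuestion: {user_question}"},
        ],
    )
    return resp.choices[0].message.content
```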