The minimum bar for production AI:
- Log every prompt, response, and tool call to a structured store (Postgres, Datadog, Langfuse).
- Track tokens in/out and cost per request. Aggregate by user, endpoint, and feature.
- Sample 1–10% of production traffic into an eval queue for human review.
- Set up alerts on cost-per-user spikes. They are often the fastest way to catch an infinite loop.
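
The checklist above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `CallRecord`, `Observer`, `request_cost`, the per-token prices, the 5% sample rate, and the $1.00 alert threshold are all hypothetical, and the in-memory lists stand in for a real store such as Postgres or Langfuse.

```python
import json
import random
import time
from collections import defaultdict
from dataclasses import asdict, dataclass, field

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}
DIMENSIONS = ("user_id", "endpoint", "feature")


def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Cost of one request in USD under the assumed prices above."""
    return (tokens_in * PRICE_PER_1K["input"]
            + tokens_out * PRICE_PER_1K["output"]) / 1000


@dataclass
class CallRecord:
    """One structured log row: prompt, response, tool calls, tokens, cost."""
    user_id: str
    endpoint: str
    feature: str
    prompt: str
    response: str
    tool_calls: list
    tokens_in: int
    tokens_out: int
    cost_usd: float = 0.0
    ts: float = field(default_factory=time.time)


class Observer:
    """In-memory stand-in for a log store, eval queue, and cost alerting."""

    def __init__(self, sample_rate=0.05, cost_alert_usd=1.0, rng=None):
        self.sample_rate = sample_rate        # fraction sent to human review
        self.cost_alert_usd = cost_alert_usd  # per-user spend threshold
        self.rng = rng or random.Random()
        self.log = []                         # JSON lines; swap for a real store
        self.eval_queue = []
        self.cost_by = {dim: defaultdict(float) for dim in DIMENSIONS}
        self.alerted_users = set()

    def record(self, rec: CallRecord) -> CallRecord:
        rec.cost_usd = request_cost(rec.tokens_in, rec.tokens_out)
        self.log.append(json.dumps(asdict(rec)))

        # Aggregate cost by user, endpoint, and feature.
        for dim in DIMENSIONS:
            self.cost_by[dim][getattr(rec, dim)] += rec.cost_usd

        # Randomly sample a fraction of traffic into the eval queue.
        if self.rng.random() < self.sample_rate:
            self.eval_queue.append(rec)

        # Simplification: this compares cumulative spend to the threshold.
        # A real spike alert would use a rolling time window.
        if self.cost_by["user_id"][rec.user_id] > self.cost_alert_usd:
            self.alerted_users.add(rec.user_id)
        return rec
```

In use, you would call `Observer.record(...)` once per model call, right after the response returns, and have a separate job drain `eval_queue` and page on `alerted_users`. Keeping the cost calculation inside `record` means every logged row already carries its cost, so later aggregation needs no join against a price table.
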

Tools to know: Langfuse, LangSmith, Helicone, Phoenix (Arize). Each offers a free tier or an open-source option.