The five-layer model, with a minimal sketch of each layer after the list:
- Input filtering. Strip or reject obvious injection patterns before the prompt ever reaches the model; a classifier like LlamaGuard 3 or Azure AI Content Safety catches what simple pattern matching misses.
- Privilege separation. Tools that touch sensitive data require human-in-the-loop confirmation, not just an LLM decision.
- Output filtering. Scan responses for sensitive patterns (credit card numbers, SSNs, internal hostnames) before they leave your service.
- Sandboxed tools. Run model-generated code in Firecracker, e2b, or Modal sandboxes, never on your prod box.
- Audit logging. Every tool call gets logged, hooked up to alerting, and periodically reviewed.
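
A minimal sketch of the input-filtering layer, assuming Python. The regex patterns and the `(allowed, reason)` return shape are illustrative assumptions; a real deployment would treat this as a cheap first pass and put a classifier such as LlamaGuard 3 or Azure AI Content Safety behind it rather than rely on regexes alone.

```python
import re

# Hypothetical, deliberately non-exhaustive injection patterns for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.IGNORECASE),
    re.compile(r"you are now (in )?developer mode", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

def prefilter_input(text: str) -> tuple[bool, str | None]:
    """Cheap first pass before the text reaches the model. Returns (allowed, reason)."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            return False, f"matched injection pattern: {pattern.pattern}"
    return True, None
```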
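For privilege separation, one way to enforce the human-in-the-loop rule is a dispatch gate that refuses sensitive tool calls until a reviewer approves. The tool names, the `confirm_with_human` helper, and the console prompt below are stand-ins for whatever approval flow you actually run (Slack, a ticket queue, etc.), not anyone's real API.

```python
from typing import Callable

# Hypothetical tool names: anything that touches sensitive data or money goes here.
SENSITIVE_TOOLS = {"read_customer_pii", "issue_refund", "delete_records"}

def confirm_with_human(tool_name: str, args: dict) -> bool:
    """Blocking approval step; in production this would page a reviewer, not read stdin."""
    answer = input(f"Approve call to {tool_name} with {args}? [y/N] ")
    return answer.strip().lower() == "y"

def dispatch_tool(tool_name: str, args: dict, registry: dict[str, Callable]) -> object:
    # The LLM's decision alone never authorizes a sensitive call.
    if tool_name in SENSITIVE_TOOLS and not confirm_with_human(tool_name, args):
        raise PermissionError(f"human reviewer declined {tool_name}")
    return registry[tool_name](**args)
```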
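Output filtering can start as a redaction pass over the model's response. The patterns and the `.internal.example.com` hostname are placeholders; real detectors need tuning (the credit-card regex here does not validate Luhn checksums, for instance).

```python
import re

# Illustrative detectors only; tune these and add context-aware checks in production.
SENSITIVE_OUTPUT_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "internal_host": re.compile(r"\b[\w-]+\.internal\.example\.com\b"),  # hypothetical domain
}

def redact_output(text: str) -> str:
    """Replace anything matching a sensitive pattern before the response leaves the service."""
    for label, pattern in SENSITIVE_OUTPUT_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```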
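For the sandboxing layer, the useful property to encode in code is that there is no local-execution path at all: everything goes through an off-host backend. The `SandboxBackend` protocol below is an assumed interface, not any vendor's actual SDK (Firecracker, e2b, and Modal each have their own APIs).

```python
from typing import Protocol

class SandboxBackend(Protocol):
    """Anything that runs code off-host: a Firecracker microVM, an e2b or Modal sandbox."""
    def run(self, code: str, timeout_s: int) -> str: ...

def run_generated_code(code: str, backend: SandboxBackend) -> str:
    # Deliberately no local fallback: no eval/exec/subprocess path exists here,
    # so an injected payload can only ever run inside a throwaway sandbox.
    return backend.run(code, timeout_s=30)
```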
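Audit logging is easiest to make uniform with a decorator that turns every tool call into a structured event. The event fields and logger name are assumptions; the design point is that logging happens in a `finally` block, so failed calls are captured alongside successful ones.

```python
import json
import logging
import time
from functools import wraps
from typing import Callable

audit_log = logging.getLogger("tool_audit")

def audited(tool: Callable) -> Callable:
    """Wrap a tool so every invocation is recorded as a structured event."""
    @wraps(tool)
    def wrapper(*args, **kwargs):
        event = {"tool": tool.__name__, "kwargs": kwargs, "ts": time.time()}
        try:
            result = tool(*args, **kwargs)
            event["status"] = "ok"
            return result
        except Exception as exc:
            event["status"] = f"error: {exc}"
            raise
        finally:
            # Ship this to your SIEM / alerting pipeline rather than a local file in production.
            audit_log.info(json.dumps(event, default=str))
    return wrapper
```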