What to measure
| Metric | Why it matters |
|---|---|
| Retrieval vs generation cost split | Identify whether retrieval (vector/search) calls or model calls drive spend. |
| End-to-end latency | Track user-visible performance, not just isolated model latency. |
| Retry count per request | Expose hidden spend multipliers from failure handling. |
| Context token size | Spot prompt/context drift before bills spike. |
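
As a rough illustration of how the four metrics above might be captured together, the sketch below keeps one record per request and rolls the records up into a summary. The `RequestMetrics` class, its field names, and the `summarize` helper are all hypothetical, and the cost figures are assumed to come from whatever billing or usage data your providers expose.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestMetrics:
    """One record per user request; all field names are hypothetical."""
    retrieval_cost_usd: float   # vector/search spend for this request
    generation_cost_usd: float  # model-call spend, including retried calls
    latency_s: float            # end-to-end, as the user experienced it
    retry_count: int            # retries triggered by failure handling
    context_tokens: int         # tokens sent to the model as prompt/context


def summarize(records: list[RequestMetrics]) -> dict[str, float]:
    """Aggregate per-request records into the four metrics in the table."""
    if not records:
        raise ValueError("need at least one record")
    retrieval = sum(r.retrieval_cost_usd for r in records)
    generation = sum(r.generation_cost_usd for r in records)
    total = retrieval + generation
    return {
        # Share of spend attributable to retrieval; the rest is generation.
        "retrieval_cost_share": retrieval / total if total else 0.0,
        "max_latency_s": max(r.latency_s for r in records),
        "mean_retries": mean(r.retry_count for r in records),
        "mean_context_tokens": mean(r.context_tokens for r in records),
    }
```

Tracking `mean_context_tokens` over time is one way to notice context drift early, and a `retrieval_cost_share` that shifts sharply suggests looking at whichever side grew. A production setup would more likely report latency percentiles than a maximum; the maximum is used here only to keep the sketch short.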
