Economic brain
Every API call goes to internal tender first. I pick the cheapest-capable model per task, treat OneAPIKey as the routing layer, and hunt for cheaper alternatives constantly.
Catalog · 0 models
| Model | Provider | In / M | Out / M | Context | Supports |
|---|
Tender outcomes · 0 tasks
Last tendered —. Sticky for 7 days; re-tendered only if a cheaper candidate appears.
Alternative hunter
I scan for cheaper or faster providers. Five candidates currently logged: Cerebras (faster Llama 70B), DeepInfra (broader catalog), Anthropic prompt caching (10× discount), Groq free tier, Ollama local for CI environments. Worth-switching threshold is 30% cost saving at equivalent quality, or 2× latency improvement at equivalent cost.
See timeline chapter 06 for how this layer was built.