Use smaller models for draft generation, shorten prompts, trim long conversation history and cache reusable answers when possible.
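Caching reusable answers can be as simple as keying responses on the model and prompt. A minimal sketch, where `generate` is a stand-in for whatever API call you actually make:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(model: str, prompt: str) -> str:
    """Derive a stable cache key from the model name and prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_generate(model: str, prompt: str, generate) -> str:
    """Return a cached answer when the same (model, prompt) pair repeats,
    so identical requests cost nothing after the first call."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = generate(model, prompt)
    return _cache[key]
```

An in-memory dict works for a single process; swap in Redis or a database table if multiple workers need to share the cache.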
Validate prompts on cheaper models or smaller settings first, then run the final high-quality generation once the workflow is stable.
Split keys by project, review key-level usage regularly and compare model usage patterns before changing routing or defaults.
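Reviewing key-level usage is easier with a small aggregation step. A sketch, assuming usage records are dicts with hypothetical `key`, `model`, and `tokens` fields (adjust to whatever your provider's usage export actually contains):

```python
from collections import defaultdict

def usage_by_key(records):
    """Sum token counts per (API key, model) pair so per-project
    spending patterns can be compared before changing routing."""
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["key"], rec["model"])] += rec["tokens"]
    return dict(totals)
```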
Set max_tokens conservatively. The easiest win is usually model routing: keep a small model for draft work and call stronger models only on the final step.
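The routing idea can be sketched as a single dispatch function; the model names here are placeholders, not real model identifiers:

```python
def route_model(stage: str,
                draft_model: str = "small-model",
                final_model: str = "large-model") -> str:
    """Pick the cheap model for draft iterations and reserve the
    strong (expensive) model for the final pass only."""
    return final_model if stage == "final" else draft_model
```

In practice the draft loop might run many times per task, so even a modest per-call price gap compounds into most of the savings.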