Coinbase Halves AI Costs Using Smarter Routing, Not Developer Restrictions
Coinbase CEO Brian Armstrong said this week that the company halved its AI spending even as token usage grew exponentially, and did so without restricting engineers' access. The company used five tactics, among them defaulting to cheaper open-weight models such as GLM 5.2 and Kimi 2.7 and routing each prompt to a model matched to the task's complexity. A key driver was raising the prompt-cache hit rate from 5% to 60%, which Armstrong described as the highest-leverage change. Engineers remain free to override the defaults and choose more capable models when needed, but their spending is tracked and is expected to deliver proportional impact. The approach signals a broader enterprise shift toward cost-efficient open-weight models, which could put revenue pressure on providers like Anthropic and OpenAI.
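To see why the cache hit rate can matter so much, here is a minimal Python sketch of the arithmetic. It assumes that a cached input token is billed at a fixed fraction of the normal input price. The 10% ratio, the function name, and the constant are hypothetical and chosen for illustration; they are not Coinbase's figures or any provider's published pricing. Only the 5% and 60% hit rates come from the summary above.

```python
def relative_input_cost(hit_rate: float, cached_price_ratio: float) -> float:
    """Return input-token cost relative to a no-cache baseline of 1.0.

    hit_rate: fraction of input tokens served from the prompt cache (0 to 1).
    cached_price_ratio: price of a cached token divided by the price of an
        uncached token (0 to 1). This is an assumed, illustrative parameter.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= cached_price_ratio <= 1.0:
        raise ValueError("cached_price_ratio must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


# ASSUMPTION: cached tokens cost 10% of the normal input price.
CACHED_PRICE_RATIO = 0.10

before = relative_input_cost(0.05, CACHED_PRICE_RATIO)  # 5% hit rate
after = relative_input_cost(0.60, CACHED_PRICE_RATIO)   # 60% hit rate

print(f"relative input cost at 5% hits:  {before:.3f}")
print(f"relative input cost at 60% hits: {after:.3f}")
print(f"reduction in input cost:         {1 - after / before:.1%}")
```

Under that assumed discount, the input-token bill falls by roughly half on its own. The sketch ignores output tokens, cache-write charges, and model choice, so it illustrates the mechanism and does not reconstruct how Coinbase reached its overall savings.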
This is an AI-generated summary. ShortSingh links to the original source for the complete article.