LLM API Costs Quietly Shifted in 2026 as OpenAI, Anthropic, and Google Made 14 Pricing Changes
Between January and June 2026, OpenAI, Anthropic, and Google collectively made 14 pricing changes across their model lineups, often without directly notifying developers. OpenAI's retirement of GPT-4 Turbo silently rerouted API calls to GPT-4o, which generates 30–40% more output tokens per prompt, raising per-call costs even though the per-token rate dropped. Anthropic's Claude Sonnet 4 carried the same headline input price as its predecessor but introduced default extended thinking, causing some prompts to cost up to three times more due to additional thinking token charges. Google kept Gemini 2.5 Flash's base input price unchanged but added a context surcharge that doubles the rate for prompts exceeding 128K tokens, catching teams doing long-document RAG off guard. A 2026 a16z survey found that 71% of companies using LLM APIs do not track spending at the individual call level, meaning cumulative cost drift often goes unnoticed until a monthly bill arrives.
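The arithmetic behind these three effects can be sketched in a few lines of Python. This is a rough illustration, not any vendor's billing logic. The `Pricing` class, the `call_cost` function, and every rate and token count below are hypothetical values chosen to mirror the percentages in the summary. The sketch also assumes that thinking tokens are billed at the output rate and that a long-context surcharge applies to the whole prompt once it crosses the threshold. Both assumptions should be checked against each provider's current pricing page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Hypothetical rates in USD per million tokens (not real vendor prices)."""
    input_per_mtok: float
    output_per_mtok: float
    # Prompts longer than this many tokens pay the multiplier on input (assumed).
    long_context_threshold: Optional[int] = None
    long_context_multiplier: float = 1.0


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
              thinking_tokens: int = 0) -> float:
    """Estimate the cost of a single call in USD.

    Assumptions: thinking tokens are billed at the output rate, and the
    surcharge applies to the entire prompt once it exceeds the threshold.
    """
    input_rate = pricing.input_per_mtok
    if (pricing.long_context_threshold is not None
            and input_tokens > pricing.long_context_threshold):
        input_rate *= pricing.long_context_multiplier
    billed_output = output_tokens + thinking_tokens
    total = input_tokens * input_rate + billed_output * pricing.output_per_mtok
    return total / 1_000_000


# 1) Lower per-token rate, but 35% more output tokens per prompt.
old_model = Pricing(input_per_mtok=2.0, output_per_mtok=10.0)
new_model = Pricing(input_per_mtok=2.0, output_per_mtok=9.0)
old_cost = call_cost(old_model, input_tokens=1_000, output_tokens=1_000)
new_cost = call_cost(new_model, input_tokens=1_000, output_tokens=1_350)

# 2) Same headline prices, but thinking tokens are added to the bill.
thinking_model = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
plain_cost = call_cost(thinking_model, 1_000, 500)
thinking_cost = call_cost(thinking_model, 1_000, 500, thinking_tokens=1_400)

# 3) Same base input price, doubled above a 128K-token prompt.
flash_like = Pricing(input_per_mtok=0.30, output_per_mtok=2.50,
                     long_context_threshold=128_000,
                     long_context_multiplier=2.0)
under_cost = call_cost(flash_like, 128_000, 0)
over_cost = call_cost(flash_like, 128_001, 0)

print(f"rerouted call:  {old_cost:.5f} -> {new_cost:.5f} USD")
print(f"thinking call:  {plain_cost:.5f} -> {thinking_cost:.5f} USD")
print(f"long prompt:    {under_cost:.5f} -> {over_cost:.5f} USD")
```

Logging the output of a function like this next to each request's reported token usage is one way to get the per-call tracking that the survey says most teams lack.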
This is an AI-generated summary. ShortSingh links to the original source for the complete article.