Multi-agent AI fleets use 15x more tokens — here is how to govern costs properly
Multi-agent AI systems consume roughly 15 times the tokens of a single chat session, according to Anthropic's own analysis, making cost governance a critical engineering concern. Prompt-level instructions asking agents to 'be mindful of budget' are ineffective because they rely on model judgment rather than deterministic enforcement. Effective cost control requires two distinct layers: a hard counter in the system harness that mechanically enforces spending limits, and model-level judgment that decides whether a task warrants any spending at all. A 'novelty gate' approach ensures that routine tasks such as simple edits or known-fact lookups never reach paid APIs, eliminating the majority of unnecessary spend before it occurs. The recommended architecture assigns tiered spending policies per qualifying call, enforced by the harness, while the agent retains responsibility for classifying task complexity and flagging any reliability downgrades when fallbacks are used.
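The two-layer split described above can be sketched in Python. This is a minimal illustration, not the original article's implementation. Every name in it (`Tier`, `TIER_CAPS`, `Harness`, `CallResult`, `BudgetExceeded`) and every token cap is a hypothetical placeholder. The sketch assumes the agent has already classified the task into a tier, that `paid_call` accepts a token cap and returns the text together with the tokens it actually used, and that a fallback costs nothing against the budget.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Task complexity as classified by the agent (model-level judgment)."""
    ROUTINE = "routine"    # simple edits, known-fact lookups
    STANDARD = "standard"
    COMPLEX = "complex"


# Hypothetical per-call token caps for the tiers that qualify for paid calls.
TIER_CAPS = {Tier.STANDARD: 2_000, Tier.COMPLEX: 10_000}


class BudgetExceeded(Exception):
    """Raised by the harness when a call would break the spending limit."""


@dataclass
class CallResult:
    text: str
    tokens_used: int
    downgraded: bool = False  # True when a fallback replaced the paid call


class Harness:
    """Deterministic enforcement layer: it counts tokens, it does not judge."""

    def __init__(self, total_budget: int):
        self.remaining = total_budget

    def run(self, tier, paid_call, local_call, fallback_call=None):
        # Novelty gate: routine work never reaches the paid API.
        if tier is Tier.ROUTINE:
            return CallResult(local_call(), 0)

        cap = TIER_CAPS[tier]
        # Hard counter: refuse any call whose cap exceeds what is left.
        if cap > self.remaining:
            if fallback_call is None:
                raise BudgetExceeded(
                    f"cap {cap} exceeds remaining budget {self.remaining}"
                )
            # Assumed free fallback; the flag lets the agent report the downgrade.
            return CallResult(fallback_call(), 0, downgraded=True)

        text, used = paid_call(cap)
        self.remaining -= used
        return CallResult(text, used)
```

Under these assumptions the harness never asks the model whether spending is acceptable. It only compares numbers, while the `downgraded` flag gives the agent what it needs to tell the user that a less reliable path was taken.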
This is an AI-generated summary. ShortSingh links to the original source for the complete article.