Benchmarking Claude's Five Effort Levels Reveals Cost Surprises in Agentic Tasks
A developer tested Anthropic's Claude models across five effort settings (low, medium, high, xhigh, and max), measuring token usage, latency, and output quality on three real-world tasks. For simple classification, quality was identical at every effort level while token consumption rose up to eightfold at max, making higher settings wasteful. Code generation quality improved up to the 'high' setting and then plateaued, with xhigh and max adding cost without meaningful gains. Most surprisingly, the multi-step audit task, the agentic workload of the three, consumed fewer total tokens at xhigh than at medium, because better upfront planning reduced wasted turns and dead ends. The findings suggest effort should be tuned per task type, with agentic workflows often being both cheaper and more accurate at xhigh than at lower settings.
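One way to read "tuned per task type" is as a small lookup that picks an effort level before a request is sent. The sketch below is illustrative only: the task-type labels, the `pick_effort` helper, and the `medium` fallback are assumptions made for this example, not details from the original article, and the mapping simply mirrors the three findings summarized above.

```python
# The five effort settings named in the summary.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical task-type labels mapped to the setting the summary suggests:
# classification gained nothing from higher effort, code generation
# plateaued at "high", and the agentic audit task did best at "xhigh".
EFFORT_BY_TASK = {
    "classification": "low",
    "code_generation": "high",
    "agentic": "xhigh",
}


def pick_effort(task_type: str, default: str = "medium") -> str:
    """Return an effort level for a task type, falling back to `default`.

    The fallback value is an assumption; the summary does not say what to
    use for task types outside the three that were benchmarked.
    """
    effort = EFFORT_BY_TASK.get(task_type, default)
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return effort
```

Because the summary reports results for only three tasks, a mapping like this is best treated as a starting point to re-measure on one's own workloads rather than a fixed rule.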
This is an AI-generated summary. ShortSingh links to the original source for the complete article.