Benchmarking Claude's Five Effort Levels Reveals Cost Surprises in Agentic Tasks


A developer tested Anthropic's Claude models across five effort settings (low, medium, high, xhigh, and max), measuring token usage, latency, and output quality on three real-world tasks: simple classification, code generation, and a multi-step audit. For classification, quality was identical at every effort level while token consumption rose up to eightfold at max, so the higher settings only added cost. Code generation quality improved up to the 'high' setting and then plateaued, with xhigh and max adding cost without meaningful gains. Most surprisingly, the multi-step audit task consumed fewer total tokens at xhigh than at medium, because better upfront planning reduced wasted turns and dead ends. The findings suggest effort should be tuned per task type, with agentic workflows often being both cheaper and more accurate at xhigh than at lower settings.
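
The per-task tuning suggested above could be captured in a small lookup, sketched here in Python. This is only an illustration of the summary's conclusions, not code from the original article: the names `TaskType`, `EFFORT_BY_TASK`, and `pick_effort` are hypothetical, and the sketch deliberately makes no API calls, since the summary does not say how the effort setting is passed to the model.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical task categories, mirroring the three tasks in the summary."""
    CLASSIFICATION = "classification"
    CODE_GENERATION = "code_generation"
    AGENTIC = "agentic"


# Assumed mapping, derived only from the summary's reported findings:
# - classification: quality was flat across levels, so use the cheapest
# - code generation: quality plateaued at "high"
# - agentic (multi-step audit): "xhigh" used fewer total tokens than "medium"
EFFORT_BY_TASK = {
    TaskType.CLASSIFICATION: "low",
    TaskType.CODE_GENERATION: "high",
    TaskType.AGENTIC: "xhigh",
}


def pick_effort(task, default="medium"):
    """Return the suggested effort level for a task, or a default if unknown."""
    return EFFORT_BY_TASK.get(task, default)


if __name__ == "__main__":
    for task in TaskType:
        print(f"{task.value}: {pick_effort(task)}")
```

These defaults reflect one developer's benchmark on three tasks, so a real workload would likely need its own measurements before settling on a mapping.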

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

