Dual cheap-model agreement cuts frontier-model costs to near zero on unverifiable tasks
An AI gateway team found that its system escalated 100% of no-test prompts to expensive frontier models, because without a unit test it had no way to verify a cheap model's answer. The team's fix is an agreement gate: two independent low-cost models answer the same query, and if their answers agree, the response is served without escalation; if they disagree, the query goes to a frontier model. In testing across 160 queries in four task categories, including custom adversarial traps, the two cheap models agreed about 76% of the time and never agreed on a wrong answer. After the gate was deployed in production, frontier escalation on no-test prompts dropped sharply, and roughly 91% of requests are now served by the cheap tier. Blended cost fell to approximately $0.002 per request, extending the economics of verifiable tasks to the much larger class of open-ended queries.
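The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the summary does not say how agreement is decided, so the `normalize` comparison (case, whitespace, and trailing punctuation ignored) is an assumption, and the names `Model`, `GateResult`, and `agreement_gate` are hypothetical. Each model is treated as a plain function from prompt to answer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a model is any function that maps a prompt to an answer.
Model = Callable[[str], str]


def normalize(answer: str) -> str:
    """Assumed agreement criterion: lowercase, collapse whitespace,
    and drop trailing sentence punctuation before comparing."""
    return " ".join(answer.lower().split()).rstrip(".!?")


@dataclass
class GateResult:
    answer: str
    tier: str  # "cheap" if the gate passed, "frontier" if it escalated


def agreement_gate(prompt: str, cheap_a: Model, cheap_b: Model, frontier: Model) -> GateResult:
    """Ask two independent cheap models; serve their answer only if they agree."""
    a = cheap_a(prompt)
    b = cheap_b(prompt)
    if normalize(a) == normalize(b):
        # Both cheap models agree: serve the first answer, no escalation.
        return GateResult(answer=a, tier="cheap")
    # Disagreement: fall back to the frontier model.
    return GateResult(answer=frontier(prompt), tier="frontier")
```

In use, the three models would be wrappers around real API clients. Exact matching after normalization suits short factual answers; for longer free-form output, a looser comparison would likely be needed, which the summary does not describe.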
This is an AI-generated summary. ShortSingh links to the original source for the complete article.