Orchestrator Choice, Not Model Size, Drives Local LLM Agent Performance on RTX 3090

A developer benchmarked five open-weight language models across 17 coding and general-agent tasks on a single RTX 3090 GPU, comparing two orchestration frameworks: opencode and a custom LangGraph ReAct agent. The results showed that GLM-4.5-Air (106B parameters) scored 0% task adherence under opencode but jumped to 93% when driven by the LangGraph agent using native tool-calling, highlighting the orchestrator as the critical variable. Qwen3-Coder 30B-A3B was the top overall performer, achieving 100% tool adherence under both frameworks due to its agentic fine-tuning, while also being the most energy-efficient at roughly 0.0005 BGN (Bulgarian lev) per correctly solved task. Models that failed every task still consumed 10 to 30 times more electricity than the top performer, underscoring that energy cost per correct output is a meaningful metric for home lab setups. The benchmark, including methodology and per-watt cost tracking via an open-source tool, has been published with reproducible code.
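As a rough illustration of the "energy cost per correct output" metric, the following is a minimal sketch, not the benchmark's actual tool. It assumes the metric is total electricity cost for a run divided by the number of correctly solved tasks. The `RunResult` type, the function names, and every number in the usage lines are hypothetical placeholders, not figures from the benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    """One model/orchestrator run over a task suite (hypothetical structure)."""
    tasks_total: int
    tasks_correct: int
    energy_wh: float  # energy drawn during the whole run, in watt-hours


def success_rate(run: RunResult) -> float:
    """Fraction of tasks solved correctly; 0.0 for an empty suite."""
    return run.tasks_correct / run.tasks_total if run.tasks_total else 0.0


def cost_per_correct_task(run: RunResult, price_per_kwh: float) -> Optional[float]:
    """Electricity cost divided by correctly solved tasks.

    Returns None when no task was solved: the run still consumed energy,
    but the per-correct-task cost is undefined.
    """
    if run.tasks_correct == 0:
        return None
    total_cost = (run.energy_wh / 1000.0) * price_per_kwh  # Wh -> kWh
    return total_cost / run.tasks_correct


# Illustrative placeholder values only (assumed, not measured).
good_run = RunResult(tasks_total=17, tasks_correct=17, energy_wh=50.0)
failed_run = RunResult(tasks_total=17, tasks_correct=0, energy_wh=900.0)

for name, run in [("good", good_run), ("failed", failed_run)]:
    cost = cost_per_correct_task(run, price_per_kwh=0.30)
    print(name, f"{success_rate(run):.0%}", cost)
```

Returning `None` for a run with zero correct tasks keeps the failure case explicit: such a run has a real electricity bill but no meaningful per-task cost, which is why total energy has to be reported alongside this metric.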
This is an AI-generated summary. ShortSingh links to the original source for the complete article.