Orchestrator Choice, Not Model Size, Drives Local LLM Agent Performance on RTX 3090

A developer benchmarked five open-weight language models across 17 coding and general-agent tasks on a single RTX 3090 GPU, comparing two orchestration frameworks: opencode and a custom LangGraph ReAct agent. GLM-4.5-Air (106B parameters) scored 0% task adherence under opencode but 93% when driven by the LangGraph agent using native tool-calling, which points to the orchestrator as the critical variable. Qwen3-Coder 30B-A3B was the top overall performer, reaching 100% tool adherence under both frameworks thanks to its agentic fine-tuning, and it was also the most energy-efficient at roughly 0.0005 BGN per correctly solved task. Models that failed every task still consumed 10 to 30 times more electricity than the top performer, which makes energy cost per correct output a meaningful metric for home-lab setups. The benchmark, including its methodology and energy-cost tracking via an open-source tool, has been published with reproducible code.
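
The cost-per-correct-task metric is simple arithmetic: measured energy times the electricity tariff, divided by the number of tasks solved. Below is a minimal Python sketch, assuming energy is logged in watt-hours per run and priced at a flat BGN/kWh tariff. The `RunResult` record, the function names, and the tariff parameter are hypothetical and are not taken from the benchmark's published tool.

```python
import math
from dataclasses import dataclass


@dataclass
class RunResult:
    """One model's pass over the task suite (hypothetical record layout)."""
    model: str
    energy_wh: float   # measured energy for the whole run, in watt-hours
    tasks_solved: int  # tasks judged correct


def cost_per_correct_task(run: RunResult, tariff_bgn_per_kwh: float) -> float:
    """Electricity cost in BGN divided by the number of correctly solved tasks.

    A run with zero correct tasks has no finite cost per correct task,
    so math.inf is returned instead of dividing by zero.
    """
    cost_bgn = (run.energy_wh / 1000.0) * tariff_bgn_per_kwh  # Wh -> kWh -> BGN
    if run.tasks_solved == 0:
        return math.inf
    return cost_bgn / run.tasks_solved


def rank_by_efficiency(runs: list[RunResult], tariff_bgn_per_kwh: float) -> list[tuple[str, float]]:
    """Sort models from cheapest to most expensive per correct task."""
    scored = [(r.model, cost_per_correct_task(r, tariff_bgn_per_kwh)) for r in runs]
    return sorted(scored, key=lambda pair: pair[1])
```

Returning infinity rather than zero for a model that solved nothing keeps it at the bottom of any ranking, which matches the point that failed runs still cost electricity without producing anything.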

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

How Belac Media Builds Safe, Auditable Social Publishing Workflows for Clients

Australian agency Belac Media has developed a structured approach to social media automation that prioritises client safety over publishing volume. The system uses three content modes — draft, queue, and auto — to match the level of human review to the reputational risk of each post. Platform integrations are chosen deliberately: reliable APIs are used directly, schedulers handle compatible social channels, and browser automation is reserved only for platforms that block API access. Every publishing action generates a receipt logging the source, platform URL, publish state, and timestamp to prevent duplicates and maintain accountability. The core principle is that automation should eliminate repetitive admin tasks while preserving human judgement where it genuinely matters.
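
The three content modes and the publish receipt can be modelled with a few small types. The following Python sketch is one possible approach: the mode names and the receipt fields (source, platform URL, publish state, timestamp) come from the summary. The extra `platform` field used for the duplicate check, the state names, and the `PublishLog` class are hypothetical choices, not Belac Media's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Mode(Enum):
    DRAFT = "draft"  # a human writes or approves before anything is scheduled
    QUEUE = "queue"  # scheduled, with a human review window before it goes live
    AUTO = "auto"    # published without per-post review; reserve for low-risk posts


# Hypothetical mapping from content mode to the publish state stored on the receipt.
STATE_FOR_MODE = {Mode.DRAFT: "drafted", Mode.QUEUE: "queued", Mode.AUTO: "published"}


@dataclass(frozen=True)
class Receipt:
    source: str         # where the content came from, e.g. a CMS entry ID
    platform: str       # platform name, used for the duplicate check (assumption)
    platform_url: str   # URL of the draft, scheduled item, or live post
    publish_state: str  # "drafted", "queued", or "published"
    timestamp: datetime


class PublishLog:
    """In-memory receipt log keyed on (source, platform, state) to block repeat actions."""

    def __init__(self) -> None:
        self.receipts: list[Receipt] = []
        self._seen: set[tuple[str, str, str]] = set()

    def is_duplicate(self, source: str, platform: str, mode: Mode) -> bool:
        """Check before acting: has this source already reached this state on this platform?"""
        return (source, platform, STATE_FOR_MODE[mode]) in self._seen

    def record(self, source: str, platform: str, platform_url: str, mode: Mode) -> Receipt:
        """Log a receipt after the publish action succeeds."""
        state = STATE_FOR_MODE[mode]
        receipt = Receipt(source, platform, platform_url, state, datetime.now(timezone.utc))
        self._seen.add((source, platform, state))
        self.receipts.append(receipt)
        return receipt
```

Keying on the state as well as the source and platform means a post can move from drafted to queued to published, but the same transition cannot be repeated twice on one platform.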

Read more at DEV Community

Programming · DEV Community

Seven Common Ways AI Agents Fail in Production and How to Fix Them

AI agents deployed in production environments consistently exhibit a set of recurring failure patterns that often go undetected by standard observability tools. Common issues include tool-call loops where agents repeat identical actions without making progress, silent context degradation as the model's memory window fills with stale data, and cost overruns caused by task-to-model mismatches. These failures are difficult to catch because they rarely trigger explicit errors, instead manifesting as gradual quality decline or runaway token consumption. Engineers are advised to track information gain, context pressure, and cost acceleration as proactive signals, and to implement automated interventions such as context compression, circuit-breakers, and mid-session model escalation.
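
Here is a minimal sketch of one of the interventions mentioned: a circuit-breaker that trips when an agent issues the same tool call (same name and arguments) several times in a row. The threshold, the JSON fingerprinting, and the class name are assumptions for illustration. The summary does not say how the article's detectors are built.

```python
import json
from collections import deque


class ToolLoopBreaker:
    """Trips when the last `threshold` tool calls are identical (same name and arguments)."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        self._recent: deque[str] = deque(maxlen=threshold)  # sliding window of fingerprints
        self.tripped = False

    @staticmethod
    def _fingerprint(tool_name: str, args: dict) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} fingerprint identically
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def observe(self, tool_name: str, args: dict) -> bool:
        """Record one tool call; return True once the breaker has tripped."""
        self._recent.append(self._fingerprint(tool_name, args))
        if len(self._recent) == self.threshold and len(set(self._recent)) == 1:
            self.tripped = True
        return self.tripped

    def reset(self) -> None:
        """Clear state, e.g. after compressing context or escalating to another model."""
        self._recent.clear()
        self.tripped = False
```

What to do on a trip (stop, compress context, or escalate to a stronger model) is a policy decision left outside this sketch.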

Read more at DEV Community

Serve Speed Barely Affects Match Wins; Placement Consistency Is the Real Edge

An analysis of 487 ATP matches from 2023–2024 found that serve speed above 115 mph has virtually no correlation with match victories once consistency is accounted for. Players averaging over 123 mph on first serves won only 58.2% of matches, barely more than those averaging 106 mph at 56.1%. A server with a slower average but higher first-serve percentage outperformed harder hitters in head-to-head comparisons. The study, which examined over 22,000 service games, found that placement variance on break points predicted match winners 7.3 times more accurately than raw velocity. The findings challenge the conventional broadcast narrative that equates faster serves with dominant serving performance.

Read more at DEV Community

How to Write DESIGN.md Files That AI Agents Can Actually Follow

A structured DESIGN.md file helps AI agents apply design systems correctly by explaining intent and rules rather than just listing token values. For example, instead of stating a color hex code, effective prose explains that a color is reserved for the single most important action on screen. Large language models parse markdown with high fidelity, making well-written prose an efficient channel for communicating design rationale. Key sections include an overview, color roles, typography, layout, and a Do's and Don'ts list that sets hard guardrails against common mistakes. The core principle is that tokens tell an agent what a value is, but only prose tells it how and when to use that value.
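
A small lint step can confirm that a DESIGN.md contains the sections named above before the file is handed to an agent. The exact heading titles below are assumptions based on the summary's list, and the check only recognises ATX-style (`#`) markdown headings.

```python
import re

# Hypothetical required headings, based on the sections the article lists.
REQUIRED_SECTIONS = ("Overview", "Colors", "Typography", "Layout", "Do's and Don'ts")

# ATX markdown headings: 1-6 '#' characters, spaces or tabs, then the title text.
HEADING = re.compile(r"^#{1,6}[ \t]+(.*\S)[ \t]*$", re.MULTILINE)


def missing_sections(markdown: str, required: tuple[str, ...] = REQUIRED_SECTIONS) -> list[str]:
    """Return the required section titles that do not appear as headings (case-insensitive)."""
    found = {m.group(1).lower() for m in HEADING.finditer(markdown)}
    return [title for title in required if title.lower() not in found]
```

This only checks structure. Whether each section explains intent ("reserved for the single most important action") rather than just listing values still needs a human review.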

Read more at DEV Community