Developer builds auditable AI cost-modeling pipeline to find cheapest quality-adjusted LLM

Frustrated by inaccurate online advice, a developer behind the Hermes Agent framework built an automated pipeline to answer the real cost questions that AI agent builders face. The system uses research agents to pull live, cited token prices and benchmarks, then runs every calculation through an exact-rational math kernel to avoid floating-point errors and LLM-generated arithmetic mistakes. Tested across eight cost scenarios, the pipeline ranked open-weight models by blended cost divided by agentic quality score, with DeepSeek V3.2 via OpenRouter emerging as the top value at roughly $1.49 per quality unit. DeepSeek V4 Flash on Fireworks was flagged as a potentially cheaper alternative pending further quality testing. The full methodology and dataset have been published in a public repository so the results can be independently reproduced.
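As a rough illustration of the "blended cost divided by quality score" ranking with exact rational arithmetic, the sketch below uses Python's standard `fractions.Fraction`. It is not the author's kernel: the function names, the input/output weighting used to blend prices, and every figure in `MODELS` are assumptions made up for this example.

```python
from fractions import Fraction


def blended_cost(input_price, output_price, input_share):
    """Blended USD price per million tokens as an exact rational.

    The weighting by input share is an assumption for this sketch.
    """
    share = Fraction(input_share)
    return Fraction(input_price) * share + Fraction(output_price) * (1 - share)


def cost_per_quality_unit(input_price, output_price, input_share, quality):
    """Blended cost divided by a quality score, with no float rounding."""
    return blended_cost(input_price, output_price, input_share) / Fraction(quality)


def rank_by_value(models, input_share):
    """Return (name, cost per quality unit) pairs, cheapest first."""
    scored = [
        (name, cost_per_quality_unit(p_in, p_out, input_share, quality))
        for name, (p_in, p_out, quality) in models.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical prices (USD per million tokens) and quality scores.
# These are NOT figures from the article or from any provider.
MODELS = {
    "model-a": ("0.30", "0.90", "1/2"),
    "model-b": ("0.10", "0.50", "1/4"),
}

if __name__ == "__main__":
    for name, value in rank_by_value(MODELS, input_share="3/4"):
        print(name, value, float(value))
```

Passing prices as strings keeps them exact: `Fraction("0.30")` is stored as 3/10, whereas `Fraction(0.30)` would inherit the binary rounding of the float literal.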

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Why 'Order from Digi-Key' Fails at Scale: A Production Supply Chain Guide

A supply chain strategy guide for electronics producers running 500 to 50,000 units per year warns that relying on authorized distributors like Digi-Key is a prototyping approach, not a viable production strategy. The 2020–2023 global IC shortage, which pushed lead times to 52 weeks, exposed how unprepared many manufacturers were when supply chains had not been deliberately designed for resilience. The guide recommends a multi-source bill-of-materials from day one, with primary, secondary, and alternate-part sources documented for every critical component, alongside buffer stock targets. Volume pricing data illustrates that switching from Digi-Key to franchise distributors like Arrow or Avnet, or buying manufacturer-direct, can reduce unit costs by 30–60% at quantities above 1,000 units. The guide also cautions against sourcing ICs from spot markets or brokers due to well-documented counterfeit risks affecting common chips such as STM32 and ESP32.
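One way to picture the multi-source bill-of-materials idea is a small record per component plus a check that flags critical parts with undocumented sources or thin buffer stock. This is only a sketch: the field names, the eight-week buffer threshold, and the part and supplier names are hypothetical and do not come from the guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BomLine:
    """One BOM entry with its documented sourcing options (hypothetical schema)."""
    part_number: str
    critical: bool
    primary: Optional[str] = None         # primary supplier
    secondary: Optional[str] = None       # second supplier for the same part
    alternate_part: Optional[str] = None  # drop-in replacement part number
    buffer_weeks: int = 0                 # weeks of stock held on hand


def sourcing_gaps(bom, min_buffer_weeks=8):
    """Map each critical part number to the list of things it is missing."""
    gaps = {}
    for line in bom:
        if not line.critical:
            continue
        missing = []
        if line.primary is None:
            missing.append("primary source")
        if line.secondary is None:
            missing.append("secondary source")
        if line.alternate_part is None:
            missing.append("alternate part")
        if line.buffer_weeks < min_buffer_weeks:
            missing.append("buffer stock")
        if missing:
            gaps[line.part_number] = missing
    return gaps


# Illustrative data only; part and supplier names are invented.
EXAMPLE_BOM = [
    BomLine("MCU-EXAMPLE-1", True, "Supplier A", "Supplier B", "MCU-EXAMPLE-2", 12),
    BomLine("REG-EXAMPLE-1", True, "Supplier A", None, None, 2),
    BomLine("RES-EXAMPLE-1", False, "Supplier C"),
]

if __name__ == "__main__":
    for part, missing in sourcing_gaps(EXAMPLE_BOM).items():
        print(part, "is missing:", ", ".join(missing))
```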

Read more at DEV Community

Developer Documents Common Shielded Token Contract Errors on Midnight Blockchain

A developer who spent months building shielded liquidity DeFi protocols on the Midnight blockchain has published a detailed breakdown of critical errors encountered during the process. The post focuses on mistakes made while writing contracts in Midnight's Compact language, particularly around shielded fungible tokens used in lending protocols, liquidity pools, and decentralized exchanges. Midnight's shielded token model relies on a protocol called Zswap, which differs significantly from EVM-based environments like Solidity, causing developers to carry over incorrect assumptions. Key mechanics explained include how the proof server, wallet, and circuit interact to authorize token transfers using zero-knowledge proofs without revealing user UTXOs. The guide aims to help other builders avoid circuit failures and proof server errors by clarifying correct patterns for handling ShieldedCoinInfo and transaction balancing.

Read more at DEV Community

Python Web Scraping in 2026: New Libraries and Tactics to Beat Anti-Bot Systems

Web scraping with Python looks significantly different in 2026, because websites now deploy more sophisticated anti-bot measures, including advanced CAPTCHAs, fingerprinting, and aggressive IP blocking. Modern scrapers rely on tools like Playwright for JavaScript-heavy sites and httpx with selectolax for static pages, replacing older solutions like Selenium. Developers are advised to randomize browser fingerprints, rotate residential proxies, and use adaptive rate limiting that slows requests when blocks are detected. Checking for hidden or public APIs before scraping is recommended as a best practice to reduce technical overhead. Legal compliance remains essential, with guidance to respect robots.txt files and limit collection to publicly available data.
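The adaptive rate limiting mentioned above can be sketched with the standard library alone. The class below is an assumption-laden illustration, not code from the article: the class name, the set of status codes treated as blocks, and the backoff and recovery factors are all hypothetical choices. The caller is expected to make the request with whatever HTTP client it uses and report the status code back.

```python
import random

# Status codes treated as "we are being blocked"; an assumption for this sketch.
BLOCK_STATUSES = frozenset({403, 429, 503})


class AdaptiveRateLimiter:
    """Grow the delay between requests after blocks, shrink it after successes."""

    def __init__(self, base_delay=1.0, max_delay=60.0, backoff=2.0, recovery=0.9):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.backoff = backoff    # multiplier applied after a block
        self.recovery = recovery  # multiplier applied after a success
        self.delay = base_delay

    def record(self, status_code):
        """Update the delay from the latest response status and return it."""
        if status_code in BLOCK_STATUSES:
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            self.delay = max(self.delay * self.recovery, self.base_delay)
        return self.delay

    def next_wait(self):
        """Delay to sleep before the next request, jittered by +/-50%."""
        return self.delay * random.uniform(0.5, 1.5)


if __name__ == "__main__":
    limiter = AdaptiveRateLimiter()
    # Simulated response statuses instead of real network calls.
    for status in (200, 429, 429, 200, 200):
        limiter.record(status)
        print(status, round(limiter.delay, 2), round(limiter.next_wait(), 2))
```

In a real scraper the loop would call `time.sleep(limiter.next_wait())` before each request. The jitter keeps request timing from forming a fixed, easily fingerprinted interval.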

Read more at DEV Community

Developer finds 10 false positive bugs in his own VS Code security scanner

A developer building a VS Code extension to detect leaked secrets, PII, and security vulnerabilities discovered 10 significant flaws after deliberately auditing his own tool for incorrect findings. Several bugs stemmed from overly broad regex patterns, such as flagging any 16-digit number as a credit card or any variable containing 'log' as a logging risk, regardless of actual context. Other issues were false negatives, including a private key detector that missed the widely used PKCS#8 format and an IPv6 pattern that failed to recognize compressed address notation. Fixes involved tightening pattern scope, adding validation logic like Luhn checksum checks, and anchoring detections to relevant context such as actual log-call shapes or nearby label keywords. The developer shared the root causes in detail, arguing that understanding how pattern-matching security tools fail is more valuable than simply noting that bugs were fixed.
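The kinds of fixes described can be illustrated in a few lines of Python. This is a sketch of the general techniques, not the extension's actual code, and the function and pattern names are hypothetical: a 16-digit match is only reported if it passes the Luhn checksum, the private-key pattern accepts the PKCS#8 PEM headers as well as algorithm-specific ones, and IPv6 candidates are validated with the standard `ipaddress` module so that compressed notation is recognized.

```python
import ipaddress
import re

# A broad candidate pattern; on its own it would flag any 16-digit number.
CARD_CANDIDATE = re.compile(r"\b\d{16}\b")

# Matches "BEGIN PRIVATE KEY" and "BEGIN ENCRYPTED PRIVATE KEY" (PKCS#8)
# as well as algorithm-specific headers such as "BEGIN RSA PRIVATE KEY".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def luhn_valid(number):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text):
    """Report only the 16-digit candidates that pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


def has_private_key(text):
    """True if the text contains a PEM private-key header."""
    return PRIVATE_KEY_HEADER.search(text) is not None


def is_ipv6(token):
    """Validate an IPv6 candidate, including compressed forms such as ::1."""
    try:
        ipaddress.IPv6Address(token)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    sample = "order 1234567890123456 paid with 4111111111111111"
    print(find_card_numbers(sample))
    print(has_private_key("-----BEGIN PRIVATE KEY-----"))
    print(is_ipv6("2001:db8::1"))
```

The Luhn check does not prove a number is a real card, only that it is structurally plausible, so it reduces false positives rather than eliminating them.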

Read more at DEV Community