Developer runs 10-day experiment coding entirely with tiny local AI models


A software developer spent ten days testing whether small local language models, specifically Gemma 4 2B running on a Jetson Orin Nano, could replace cloud-based AI coding tools like Claude Code. The experiment revealed that roughly 60% of early failures came from the harness discarding correct code because of broken indentation, not from actual model errors; fixing this raised task scores from 64 to 76 out of 100. The developer found that small models perform far better on bounded, slot-filling tasks than on open-ended planning, with deterministic control flow handling the overall logic. A self-review step, in which the model judges its own output, worsened results at this model size, suggesting that such patterns require a minimum capability threshold. The findings support an emerging view that small models underperform in agentic coding tasks mainly because of thin harness design rather than fundamental model limitations.
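
The indentation failure mode described above can be illustrated with a minimal sketch. It assumes the harness receives model output as a string and checks it with Python's `ast` module; the helper name `parse_model_code` is hypothetical and is not taken from the original article.

```python
import ast
import textwrap


def parse_model_code(raw: str):
    """Hypothetical harness helper: return (code, error) for a model reply."""
    # Normalize tabs and strip the common leading indentation first, so that
    # otherwise valid code is not rejected only because it arrived indented.
    candidate = textwrap.dedent(raw.expandtabs(4)).strip("\n")
    try:
        ast.parse(candidate)
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return None, f"{type(exc).__name__}: {exc.msg}"
    return candidate, None


# A reply that is correct but uniformly indented, as a small model might emit.
snippet = "    def add(a, b):\n        return a + b\n"
code, error = parse_model_code(snippet)
print(code if error is None else error)
```

Parsing `snippet` directly would raise an `IndentationError`; after normalization the same code is accepted, which is the kind of harness-side fix the summary credits for the score improvement.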

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


EU Accessibility Act forces Polish e-commerce to ditch 'div-soup' HTML by June 2025

The European Accessibility Act (EAA), set to apply to most Polish online stores from June 2025, requires compliance with WCAG 2.1 digital accessibility standards. Shops built with non-semantic HTML structures risk being inaccessible to screen readers used by blind or visually impaired customers. Failure to comply could expose businesses to legal penalties and discrimination claims under Polish consumer protection law. Experts advise developers to replace div-heavy code with native semantic HTML5 elements, arguing it is more cost-effective than patching existing code with ARIA attributes. Proper semantic markup not only ensures regulatory compliance but also broadens the potential customer base for e-commerce businesses.
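
As a rough illustration of how div-heavy markup might be spotted, the sketch below counts generic and semantic elements with Python's standard `html.parser`. The class name `MarkupAudit`, the tag list, and the "clickable div" heuristic are assumptions made for this example; it is not a WCAG conformance check.

```python
from html.parser import HTMLParser

# Assumed sample of native semantic elements; not an exhaustive list.
SEMANTIC_TAGS = {"header", "nav", "main", "article", "section",
                 "aside", "footer", "button"}


class MarkupAudit(HTMLParser):
    """Hypothetical audit: tally generic vs. semantic start tags."""

    def __init__(self):
        super().__init__()
        self.generic = 0
        self.semantic = 0
        self.clickable_divs = 0

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag in ("div", "span"):
            self.generic += 1
            # A div acting as a button without a role is a common red flag.
            if "onclick" in names and "role" not in names:
                self.clickable_divs += 1
        elif tag in SEMANTIC_TAGS:
            self.semantic += 1


audit = MarkupAudit()
audit.feed('<div class="nav"><div onclick="buy()">Buy</div></div>'
           '<main><button>Buy</button></main>')
print(audit.generic, audit.semantic, audit.clickable_divs)
```

A high count of clickable divs suggests places where a native `<button>` or `<a>` element would likely serve screen-reader users better than an ARIA patch.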

Read more at DEV Community

Programming · DEV Community

Google SRE Book Distilled: A Cheat Sheet for Engineers Running Production Systems

A developer on DEV Community has published a cheat sheet summarising the Google Site Reliability Engineering (SRE) Book, a widely respected free resource on operating large-scale systems. The book, influential at companies like Netflix, Spotify, and LinkedIn, argues that operations should be treated as an engineering discipline driven by automation and objective measurement rather than manual work. Core concepts covered include error budgets, service level objectives, toil elimination, and sustainable on-call practices. The cheat sheet presents each chapter's key takeaway in a compact table format designed for quick reference rather than deep reading. It is intended for software engineers, architects, and platform engineers who want a fast overview of the book's foundational principles.
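
The error-budget idea can be made concrete with a small worked calculation. This is a sketch under common assumptions (an availability SLO expressed as a fraction, a 30-day window); the function names are illustrative and do not come from the cheat sheet or the book.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = total_events * (1 - slo)
    if allowed_failures == 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    actual_failures = total_events - good_events
    return 1 - actual_failures / allowed_failures


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
# 500 failed requests out of 1,000,000 spends about half of that budget.
print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

Tracking the remaining fraction gives teams an objective signal for when to slow releases, which is the measurement-driven approach the summary describes.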

Read more at DEV Community

Programming · DEV Community

Apple launches official Safari MCP server in Technology Preview 247

Apple released its own Safari MCP server on July 1, 2026, bundled with Safari Technology Preview 247 and built by the WebKit team. The tool allows MCP-compatible AI agents such as Claude and Cursor to connect to a Safari window and perform tasks like DOM inspection, console reading, network request capture, and screenshots. It runs entirely locally via Safari's built-in WebDriver binary, with no data sent to Apple's servers. However, the server operates in an isolated automation session, meaning it has no access to a user's existing logins, cookies, or open tabs. A developer who maintains an open-source alternative called safari-mcp noted the tools serve different purposes: Apple's is designed for clean-room debugging, while theirs enables agents to interact with an already authenticated, real Safari session.

Read more at DEV Community

Programming · DEV Community

RAG Systems Need 15 Pre-Embedding Steps, Not Just a PDF Upload

Building a production-ready Retrieval-Augmented Generation (RAG) system involves far more than uploading a document and generating embeddings. A technical walkthrough on DEV Community outlines 15 critical document ingestion steps that engineers must complete before embeddings are created. These steps include file hashing, PDF parsing, text cleaning, chunking, deduplication, versioning, and incremental ingestion, among others. Skipping any step can cause the system to return incorrect answers silently, with no obvious indication of failure. The guide emphasizes hashing file content rather than filenames to reliably detect duplicate or updated documents and avoid unnecessary reprocessing costs.
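
Hashing content rather than filenames can be sketched as follows. This is a minimal example that assumes SHA-256 and an in-memory set of seen hashes; the helper names `content_hash` and `needs_ingestion` are hypothetical, and the guide itself may use a different hash function or storage layer.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_ingestion(path: Path, seen_hashes: set[str]) -> bool:
    """True only if this exact content has not been processed before."""
    file_hash = content_hash(path)
    if file_hash in seen_hashes:
        return False
    seen_hashes.add(file_hash)
    return True


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "report.pdf"
    renamed = Path(tmp) / "report_final.pdf"
    original.write_bytes(b"quarterly numbers v1")
    renamed.write_bytes(b"quarterly numbers v1")  # same bytes, new name

    seen: set[str] = set()
    first = needs_ingestion(original, seen)       # new content
    duplicate = needs_ingestion(renamed, seen)    # renamed copy is skipped
    original.write_bytes(b"quarterly numbers v2")
    updated = needs_ingestion(original, seen)     # same name, changed bytes

print(first, duplicate, updated)
```

A filename-based check would get both of the later cases wrong: it would reprocess the renamed copy and miss the in-place update.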

Read more at DEV Community