Engineer runs 10-day experiment coding entirely on tiny local AI models
A software developer spent ten days testing whether small local AI models, specifically a 2-billion-parameter Gemma model running on a Jetson Orin Nano, could replace cloud-based coding assistants like Claude Code. The experiment revealed that roughly 60% of early failures came from the harness discarding correct code because of broken indentation, not from any incapacity in the model itself. Fixing that single parsing issue raised the benchmark score from 64 to 76 out of 100. The developer also found that small models perform far better on bounded, slot-filling tasks than on open-ended planning, and that self-review loops, in which the model judges its own output, actually degraded performance at this scale. The findings suggest that thin tooling around small models, rather than the models themselves, is often the primary bottleneck in agentic coding tasks.
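The summary does not say how the harness was fixed, so the following is only an illustrative sketch of the general idea: try a few cheap indentation repairs on model output before rejecting it. It assumes the generated code is Python, and the function names `parses` and `repair_indentation` are hypothetical, not taken from the original experiment.

```python
import ast
import textwrap
from typing import Optional


def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError and TabError are subclasses
        return False


def repair_indentation(source: str) -> Optional[str]:
    """Hypothetical harness step: attempt indentation repairs before discarding.

    Returns source that parses, or None if no repair worked.
    """
    candidates = [
        source,                                 # already fine: keep as-is
        textwrap.dedent(source),                # strip a uniform leading indent
        textwrap.dedent(source.expandtabs(4)),  # normalize tabs, then dedent
    ]
    for candidate in candidates:
        if parses(candidate):
            return candidate
    return None


# A model reply whose code is correct but uniformly over-indented.
reply = "    def add(a, b):\n        return a + b\n"
fixed = repair_indentation(reply)
```

A harness that only calls `parses(reply)` would reject this output, while the repair step recovers it. Output with a real syntax error still returns `None`, so the check stays strict about genuine model mistakes.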
This is an AI-generated summary. ShortSingh links to the original source for the complete article.
