Developer runs 10-day experiment coding entirely with tiny local AI models
A software developer spent ten days testing whether small local language models, specifically Gemma 4 2B running on a Jetson Orin Nano, could replace cloud-based AI coding tools like Claude Code. The experiment revealed that roughly 60% of early failures were not model errors: the harness was discarding correct code because its indentation was broken. Fixing this raised task scores from 64 to 76 out of 100. The developer found that small models perform far better on bounded, slot-filling tasks than on open-ended planning, with deterministic control flow handling the overall logic. A self-review step, in which the model judges its own output, made results worse at this model size, suggesting that such patterns require a minimum capability threshold. The findings support an emerging view that small models underperform in agentic coding tasks mainly because of thin harness design rather than fundamental model limitations.
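The indentation failure described above can be illustrated with a minimal sketch. The summary does not say how the developer's harness was actually fixed, so everything here is an assumption for illustration: the helper names `normalize_indentation` and `accept_model_code` are hypothetical, and the repair strategy (expand tabs, strip the common leading whitespace, then re-check that the code parses) is just one plausible way a harness could avoid rejecting otherwise correct output.

```python
import ast
import textwrap
from typing import Optional


def normalize_indentation(snippet: str) -> str:
    """Expand tabs, drop blank edge lines, and strip common leading whitespace."""
    lines = snippet.expandtabs(4).splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return textwrap.dedent("\n".join(lines))


def accept_model_code(snippet: str) -> Optional[str]:
    """Return code that parses, or None if it fails even after the repair.

    Hypothetical harness step: try the raw model output first, then the
    indentation-normalized version, instead of discarding on the first error.
    """
    for candidate in (snippet, normalize_indentation(snippet)):
        try:
            ast.parse(candidate)  # IndentationError is a subclass of SyntaxError
            return candidate
        except SyntaxError:
            continue
    return None


if __name__ == "__main__":
    # Correct logic, but every line carries four stray leading spaces.
    raw = "    def add(a, b):\n        return a + b\n"
    print(accept_model_code(raw))
```

A strict harness that called `ast.parse` on the raw string alone would reject this output even though the function body is correct, which is the kind of failure the experiment attributes to the harness rather than the model.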
This is an AI-generated summary. ShortSingh links to the original source for the complete article.