30 Billion Tokens Later: 12 Failure Modes Found in AI Coding Agents


Developers running hundreds of production AI coding agent sessions identified 12 distinct failure classes, including scope creep, fake-passing tests, context bloat, and secret exposure. Each failure mode is specific and repeatable, unlike the catch-all label 'hallucination', and calls for a targeted fix rather than a simple retry. The team found that most failures can be detected before the next attempt runs, which prompted a shift toward pre-execution enforcement as the primary defense. This insight shaped the development of MartinLoop, an agent governance tool that runs budget preflights, enforces file scope, and routes approval-sensitive changes before execution begins. A recent real-world session on the team's own codebase produced 13 commits and 9 new features across 3 repositories for $9.60, under a $16 budget cap.
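As a rough illustration of what pre-execution enforcement could look like, the sketch below checks a budget cap, a file scope, and an approval list before an attempt is allowed to run. It is not MartinLoop's implementation: the `Preflight` class, its field names, and the glob patterns are all hypothetical, and only the $16 cap and $9.60 spend come from the summary.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Preflight:
    """Hypothetical pre-execution policy; names are illustrative only."""
    budget_cap_usd: float        # hard spending cap for the whole session
    allowed_globs: list[str]     # file patterns the agent may modify
    approval_globs: list[str]    # patterns that require human sign-off

    def check(self, spent_usd: float, est_cost_usd: float, files: list[str]) -> str:
        """Return 'block', 'needs_approval', or 'allow' before the attempt runs."""
        # Budget preflight: refuse if the next attempt could exceed the cap.
        if spent_usd + est_cost_usd > self.budget_cap_usd:
            return "block"
        # File scope: every planned file must match at least one allowed pattern.
        for path in files:
            if not any(fnmatch(path, glob) for glob in self.allowed_globs):
                return "block"
        # Approval routing: in-scope but sensitive files go to a human first.
        if any(fnmatch(path, glob) for path in files for glob in self.approval_globs):
            return "needs_approval"
        return "allow"


policy = Preflight(
    budget_cap_usd=16.0,
    allowed_globs=["src/*"],
    approval_globs=["src/migrations/*"],
)
print(policy.check(spent_usd=9.60, est_cost_usd=1.0, files=["src/app.py"]))
```

The point of the design is that every check runs on the plan, before any tokens are spent or files are written, so a rejected attempt costs nothing.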

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


How to Identify and Handle reCAPTCHA v2, v3, and Enterprise Variants

Google reCAPTCHA comes in four distinct variants (v2 checkbox, v2 invisible, v3 score, and Enterprise), each of which behaves and fails differently. Developers can identify which version a site uses by inspecting the page source and script tags: api.js with a visible checkbox signals v2, a render parameter in the script URL indicates v3, and enterprise.js confirms the Enterprise variant. The v2 variants can present puzzle-based challenges, while v3 shows no visible widget at all and instead assigns a reputation score between 0.0 and 1.0 based on IP, fingerprint, and behavior. Enterprise mirrors v2 or v3 mechanics but runs under a different JavaScript namespace and may carry additional backend signals. A common automation pitfall is submitting a token meant for one variant to another variant's endpoint, which causes silent rejections, so correct identification is the critical first step.
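The identification rules above can be sketched as a small classifier over the page HTML. This is a heuristic sketch, not a complete detector: the function name and return labels are invented for illustration, it assumes the widget is configured in static markup, and it treats `render=explicit` and `render=onload` as v2 loading options rather than v3 site keys.

```python
import html as html_lib
import re
from urllib.parse import parse_qs, urlparse

# Any <script> tag whose src mentions recaptcha.
SCRIPT_SRC = re.compile(r'<script[^>]+src=["\']([^"\']*recaptcha[^"\']*)["\']', re.I)
INVISIBLE = re.compile(r'data-size=["\']invisible["\']', re.I)


def classify_recaptcha(page_html: str) -> str:
    """Best-effort guess at the reCAPTCHA variant used by a page."""
    for raw_src in SCRIPT_SRC.findall(page_html):
        src = html_lib.unescape(raw_src)  # turn &amp; back into &
        parsed = urlparse(src)
        if parsed.path.endswith("enterprise.js"):
            return "enterprise"
        if parsed.path.endswith("api.js"):
            render = parse_qs(parsed.query).get("render", [""])[0]
            # A site key in ?render= loads the score-based v3 API;
            # "explicit" and "onload" are v2 rendering modes.
            if render and render not in ("explicit", "onload"):
                return "v3"
            if INVISIBLE.search(page_html):
                return "v2-invisible"
            return "v2-checkbox"
    return "none"


page = '<script src="https://www.google.com/recaptcha/api.js?render=SITE_KEY"></script>'
print(classify_recaptcha(page))
```

Pages that attach the widget from JavaScript at runtime would not be classified correctly by static inspection alone.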

Read more at DEV Community

Developer Isolates Self-Hosted AI Agent on Dedicated Network to Prevent Data Exposure

A self-taught developer built a 13-service self-hosted platform on a single Linux VPS, including an autonomous AI agent named Hermes with persistent memory, code execution, and web browsing capabilities. During a security review, the developer discovered the agent shared a Docker network with the rest of the stack, giving it an unintended network path to the database port. Although database credentials were never directly accessible to the agent, the developer applied a least-privilege approach and moved Hermes onto its own isolated Docker network. Only the chat front-end and a private metasearch service were granted access to the agent's network, blocking all other services by default. The developer concluded that hard network boundaries are more reliable than soft in-app approval prompts, which can be bypassed or fall outside the active request path entirely.
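The before-and-after layout can be modeled as a simple reachability check, based on the rule that containers on user-defined Docker networks can talk directly only when they share a network. The service and network names below are hypothetical stand-ins, since the article's actual configuration is not shown in the summary.

```python
def can_reach(networks: dict[str, set[str]], src: str, dst: str) -> bool:
    """True if src and dst are attached to at least one common network."""
    return any(src in members and dst in members for members in networks.values())


# Before the review: the agent sits on the shared network with everything else.
before = {
    "stack_net": {"hermes", "chat-frontend", "metasearch", "postgres"},
}

# After: the agent has its own network; only two services are also attached to it.
after = {
    "stack_net": {"chat-frontend", "metasearch", "postgres"},
    "agent_net": {"hermes", "chat-frontend", "metasearch"},
}

print(can_reach(before, "hermes", "postgres"))  # agent had a path to the database
print(can_reach(after, "hermes", "postgres"))   # path removed by isolation
```

This models direct connectivity only. A service attached to both networks could still relay traffic if it were compromised or misconfigured, so multi-homed services deserve their own review.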

Read more at DEV Community

How to Build a Cross-Platform Face Recognition System That Resists Spoofing

A software engineer has detailed the construction of a cross-platform, offline-first face recognition pipeline designed to prevent spoofing attacks, such as using printed photos to fool biometric systems. The system runs lightweight AI models locally via ONNX, making it suitable for mobile devices and tablets without relying on cloud connectivity. It incorporates a dedicated anti-spoofing layer that analyzes a cropped face image through a two-class model, requiring a spoof score of 0.1 or lower to confirm a live subject. Face embeddings are generated using FaceNet and indexed with HNSW for identity matching in under one millisecond. Built with .NET MAUI, the pipeline shares core processing logic across Android, Windows, and cloud environments, targeting use cases such as employee clock-ins and secure access control.
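The liveness gate and identity match described above can be sketched in a few lines. Several details here are assumptions rather than the author's design: the spoof score is taken to be the softmax probability of the spoof class, the logit order is (live, spoof), the 0.7 similarity cutoff is arbitrary, and a brute-force cosine scan stands in for the HNSW index. Only the 0.1 spoof threshold comes from the summary.

```python
import math

SPOOF_THRESHOLD = 0.1  # from the summary: score must be 0.1 or lower to pass


def spoof_probability(logits):
    """Softmax probability of the spoof class; assumes order (live, spoof)."""
    live, spoof = logits
    peak = max(live, spoof)  # subtract the max for numerical stability
    e_live, e_spoof = math.exp(live - peak), math.exp(spoof - peak)
    return e_spoof / (e_live + e_spoof)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(logits, embedding, gallery, min_similarity=0.7):
    """Return the best-matching enrolled name, or None if spoofed or unknown."""
    if spoof_probability(logits) > SPOOF_THRESHOLD:
        return None  # reject before any identity matching happens
    best_name, best_sim = None, min_similarity
    for name, reference in gallery.items():  # linear scan in place of HNSW
        sim = cosine(embedding, reference)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}  # toy 2-D embeddings
print(identify((4.0, -4.0), [0.9, 0.1], gallery))
```

Running the spoof check first means a printed photo is rejected without ever being compared against enrolled identities.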

Read more at DEV Community

Afriex SDK Enables Developers to Build Full Cross-Border Remittance Apps

A technical guide published on DEV Community walks developers through building a remittance application using the Afriex Business API and its official SDK. The tutorial covers the end-to-end payment flow, including fetching live exchange rates, registering senders and recipients, attaching bank accounts or mobile wallets, and initiating transfers. Developers begin by creating a business account at business.afriex.com, generating an API key with the required permissions, and installing the SDK via npm. The SDK includes built-in retry logic for handling transient API failures and supports webhook signature verification for secure event handling. The guide recommends starting in a staging environment before switching to production to avoid processing real transactions during development.
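To show what built-in retry logic for transient failures generally involves, here is a generic exponential-backoff sketch. It does not use or reproduce the Afriex SDK, which is an npm package whose internals are not described in the summary; `TransientError`, `call_with_retry`, and all parameter values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def call_with_retry(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


calls = {"count": 0}


def flaky_rate_lookup():
    """Stand-in for an API call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary upstream failure")
    return "ok"


delays = []
result = call_with_retry(flaky_rate_lookup, sleep=delays.append)
print(result, len(delays))
```

Retries are safest on read-only calls such as rate lookups. For an operation that moves money, a retry should only be attempted if the API offers a way to make the request idempotent, since otherwise a timed-out request that actually succeeded could be sent twice.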

Read more at DEV Community