Human Oversight of AI Often Fails: Approval Rates High, Error Catch Rates as Low as 9%


A technical analysis published on DEV Community challenges the assumption that human-in-the-loop processes reliably improve AI safety. Research on AI coding agents found that requiring human approval of the agent's plan reduced the rate of harmful actions from roughly 90% to 60–74%, yet humans successfully intervened on bad actions only 9–26% of the time across all oversight strategies tested. The analysis attributes the gap to two factors: automation bias, in which people over-trust system suggestions and scrutinize them less over time, and the "rubber stamp" pattern, in which time-pressured reviewers skim and approve agent proposals without genuinely evaluating them. It argues that human oversight provides real safety value only when the consequences are high and a reviewer can realistically detect and correct the problem before harm occurs. Effective oversight must therefore be deliberately engineered, with clear evidence, reversible actions, and adequate review time, rather than assumed to follow from placing a human in an approval workflow.

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


Only 22.6% of top domains enforce DMARC, leaving email open to spoofing

A DNS analysis of 50,000 widely linked domains conducted by MailTester Ninja in mid-2026 reveals significant gaps in email security configuration. While 79.9% of domains publish MX records and 75.8% have SPF records, only 64% have adopted DMARC policies. More critically, just 22.6% of all domains enforce DMARC at the strictest level (p=reject), meaning that roughly two-thirds of the domains that publish a DMARC policy stop short of rejecting mail that fails it. Google Workspace, Microsoft 365, and self-hosted solutions collectively account for over 83% of mail infrastructure among the sampled domains. The researchers have released the dataset under a CC BY 4.0 license with a live dashboard, arguing that deliverability decisions should be grounded in data rather than assumption.
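The difference between publishing a DMARC policy and enforcing it comes down to the p tag in the TXT record at _dmarc.<domain>, which can be none, quarantine, or reject. As a minimal sketch, assuming the record text has already been fetched, the TypeScript below classifies a record by that tag. The names parseDmarc and DmarcSummary are hypothetical and are not taken from the MailTester Ninja dataset or its tooling.

```typescript
// Illustrative sketch only: classify a DMARC TXT record by its policy tag.
// parseDmarc and DmarcSummary are hypothetical names, not part of the study.
type DmarcPolicy = "none" | "quarantine" | "reject";

interface DmarcSummary {
  policy: DmarcPolicy | null; // null when the record is missing or malformed
  enforcedAtReject: boolean;  // true only for p=reject
}

function parseDmarc(txt: string | null): DmarcSummary {
  const missing: DmarcSummary = { policy: null, enforcedAtReject: false };
  if (txt === null) return missing;

  // Split "v=DMARC1; p=reject; rua=..." into tag/value pairs.
  const tags: Record<string, string | undefined> = {};
  for (const part of txt.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    tags[part.slice(0, eq).trim().toLowerCase()] = part.slice(eq + 1).trim();
  }
  if (tags["v"] !== "DMARC1") return missing;

  const p = (tags["p"] ?? "").toLowerCase();
  if (p === "none" || p === "quarantine" || p === "reject") {
    return { policy: p, enforcedAtReject: p === "reject" };
  }
  return missing;
}
```

Under this sketch, a domain publishing "v=DMARC1; p=none" counts as having adopted DMARC but not as enforcing it at the reject level, which is the gap the 64% and 22.6% figures describe.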

Read more at DEV Community

Programming · DEV Community

Developer Warns Against Giving AI Agents Unrestricted Filesystem Access via MCP

A developer has raised concerns about the risks of giving AI coding agents unrestricted access to filesystems through the Model Context Protocol (MCP). MCP servers are increasingly being used with popular AI coding tools such as Cursor, Claude Code, and VS Code. The author argues that allowing AI agents to interact directly with the filesystem without safeguards poses significant risks. In response, the developer proposes building a safer MCP implementation, referred to as SafeMCP, to add protective guardrails. The piece is aimed at developers who use AI-assisted coding tools and want to avoid unintended filesystem modifications.
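The summary does not describe how SafeMCP implements its guardrails. One common kind of filesystem guardrail, sketched here purely as an illustration, is to confine every requested path to an allowed root directory. The function resolveInsideRoot is a hypothetical name, the sketch assumes POSIX-style paths and an already-normalized absolute root, and it does not resolve symlinks, which a real implementation would also need to handle.

```typescript
// Hypothetical guardrail sketch; NOT SafeMCP's actual code.
// Resolves a requested POSIX-style path against an allowed root and returns
// the normalized absolute path, or null if it would land outside the root.
// Assumes `root` is an absolute, already-normalized path such as "/work/project".
function resolveInsideRoot(root: string, requested: string): string | null {
  const rootSegs = root.split("/").filter((s) => s !== "" && s !== ".");

  // Absolute requests are resolved from "/", relative ones from the root.
  const stack: string[] = requested.charAt(0) === "/" ? [] : rootSegs.slice();
  for (const seg of requested.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      stack.pop(); // popping an empty stack simply stays at "/"
    } else {
      stack.push(seg);
    }
  }

  // Compare segment by segment so "/work/project-evil" is not mistaken
  // for a path inside "/work/project".
  const inside = rootSegs.every((seg, i) => stack[i] === seg);
  return inside ? "/" + stack.join("/") : null;
}
```

A tool handler built this way would refuse any request for which the function returns null, rather than passing the agent's path straight to the filesystem.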

Read more at DEV Community

Programming · DEV Community

How an Independent AI Evaluator Ran a Silent 3-Month POC Without a Single Test

An independent evaluator identified only as P was hired by mid-sized industrial IoT firm FirmCore to assess two AI monitoring vendors, MonitorAI and SentryWave, during a simultaneous proof-of-concept trial. Both vendors pitched high fault-coverage claims — 99.3% and 99.7% respectively — but P declined to ask either company any technical questions or reveal which metrics would be tracked. P secured read-only replica access to FirmCore's production environment after a week-long security review, setting up an independent data pipeline to observe real system behavior passively. Rather than engaging vendors directly, P chose to let live operational data accumulate over the full three-month POC window before drawing any conclusions. The approach reflects a broader pattern in P's prior work, which exposed an internal AI moderation system with only 38% accuracy and a payment gateway that could approve illegal transactions despite formal verification claims.

Read more at DEV Community

Programming · DEV Community

Job-Apply Tool Fixes Bug That Marked Successful Applications as Failed

An automated job application platform discovered a bug in which a network error occurring milliseconds after a successful form submission was incorrectly flagging real applications as failed. Unlike most auto-apply tools that stop after clicking submit, this platform reads the confirmation response from applicant tracking systems to verify whether an application actually landed. The flaw caused at least one confirmed Greenhouse form submission to be stamped as failed, a false negative on the most critical status signal. Developers patched the issue in submitter.ts by introducing a gate called submitClickIssued, which prevents any post-click transport error from triggering a hard failure status. Affected submissions now resolve to a requires_human_review state with a prompt to manually confirm, rather than silently misreporting the outcome.
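The gate described above can be sketched roughly as follows. Only submitClickIssued and requires_human_review come from the summary. The SubmitSteps interface, the submitApplication function, and the other status names are hypothetical stand-ins, and the platform's actual submitter.ts may be structured quite differently.

```typescript
// Sketch only: everything except submitClickIssued and requires_human_review
// is a hypothetical name, not taken from the platform's submitter.ts.
type SubmitStatus = "submitted" | "failed" | "requires_human_review";

interface SubmitSteps {
  fillForm(): Promise<void>;
  clickSubmit(): Promise<void>;
  readConfirmation(): Promise<boolean>; // true when the ATS confirms receipt
}

async function submitApplication(steps: SubmitSteps): Promise<SubmitStatus> {
  let submitClickIssued = false;
  try {
    await steps.fillForm();
    await steps.clickSubmit();
    submitClickIssued = true; // from here on, the application may have landed

    // Assumption: an unconfirmed submission is also escalated, not failed.
    const confirmed = await steps.readConfirmation();
    return confirmed ? "submitted" : "requires_human_review";
  } catch {
    // Before the click, an error is a genuine failure. After it, the outcome
    // is unknown, so escalate to a human instead of reporting a hard failure.
    return submitClickIssued ? "requires_human_review" : "failed";
  }
}
```

The design choice is that a single boolean splits errors into two classes: those that prove nothing was sent, and those that leave the outcome ambiguous and therefore should not be reported as a failure.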

Read more at DEV Community