Contract AI Agent Failed Three Times, Exposing Gaps Between Validation and Real Accuracy

An enterprise legal team deployed a contract-extraction AI agent that initially showed a 97% schema validation success rate, then failed three times in distinct ways after rollout. The first failure showed that schema validation confirms an output's structure, not its content: a renewal clause formatted as a table produced a date that was off by two years and still passed validation. The second exposed a retry paradox: the system filled missing fields with plausible but incorrect model-generated defaults, silently producing wrong outputs until the legal team flagged them weeks later. The third occurred when contracts from a newly acquired subsidiary, a source unseen during development, caused extraction accuracy to drop from 94% to 61%, illustrating the problem of distribution shift. The team concluded that an 'operator-ready' agent must handle unexpected real-world inputs reliably, not just perform well on a controlled test set.
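The gap between structural validation and content correctness can be sketched in a few lines. This is a minimal illustration, not the team's actual pipeline: the `ContractRecord` shape, the function names, and the sample values are all hypothetical, chosen only to mirror the two-year date error described above.

```typescript
// Hypothetical shape of an extracted record; field names are illustrative only.
interface ContractRecord {
  renewalDate: string | null; // ISO date string, e.g. "2026-03-01"
  counterparty: string | null;
}

// Structural check: verifies types and date format, nothing about correctness.
function passesSchema(record: ContractRecord): boolean {
  const isoDate = /^\d{4}-\d{2}-\d{2}$/;
  return (
    typeof record.counterparty === "string" &&
    typeof record.renewalDate === "string" &&
    isoDate.test(record.renewalDate)
  );
}

// Content check: compares extracted values with a human-labeled reference.
function mismatchedFields(
  extracted: ContractRecord,
  reference: ContractRecord
): string[] {
  const fields: (keyof ContractRecord)[] = ["renewalDate", "counterparty"];
  return fields.filter((field) => extracted[field] !== reference[field]);
}

// Made-up sample data: the extracted date is two years off but well-formed.
const reference: ContractRecord = { renewalDate: "2026-03-01", counterparty: "Acme Ltd" };
const extracted: ContractRecord = { renewalDate: "2028-03-01", counterparty: "Acme Ltd" };

console.log(passesSchema(extracted));                // true: the structure is valid
console.log(mismatchedFields(extracted, reference)); // lists "renewalDate" as wrong
```

A content check of this kind needs labeled reference data, which is why it tends to run on a sampled audit set rather than on every document.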

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Dory Adds Snowflake as Native Data Source for AI-Assisted SQL Workflows

Dory, an AI-native SQL workspace, has added Snowflake as a first-class data source, enabling teams to query, explore, and collaborate on Snowflake data directly within the platform. The integration supports core database workflows including running SQL via a console, browsing schemas and tables, previewing data, and inspecting columns. Users can authenticate via password or key-pair methods, with key-pair authentication recommended for production environments as it avoids storing plain passwords. Snowflake joins Dory's existing lineup of native database connections, participating in the same workspace, query, and schema-browsing flows as other supported sources. The update is aimed at analysts, engineers, operators, and AI agents who need a unified environment to work against shared database context.

Read more at DEV Community

Developer Submits Live Open Source PR Using AI Agent at AWS Event in Brasília

At AWS Community Day Brasília last Saturday, a developer ran a live workshop demonstrating AI-assisted open source contribution using an AI coding agent called Kiro. The chosen project was ScanAPI, a Brazilian Python tool for API testing that was the first Brazilian project to receive GitHub sponsorship. After finding that top candidate issues already had open pull requests, the presenter used Kiro to identify issue #916, a Docker Hub image hash dependency bug introduced in release 2.13 that left the project's build broken and vulnerable. Kiro's autonomous mode completed the full contribution workflow in roughly five minutes, from forking the repository to submitting Pull Request #1001 for maintainer review. The presenter emphasized that while AI can dramatically speed up the contribution process, technical understanding and human oversight remain essential responsibilities.

Read more at DEV Community

WordPress Pre-Launch Checklist: Key Steps Before Handing Off a Site to Clients

A web developer recounts how a corporate WordPress site launch went wrong after hardcoded local development URLs caused broken buttons and missing images in production. The incident was logged as a formal company entry, highlighting how even minor oversights at handoff can permanently damage client trust. Unlike standard launch checklists that focus on functionality, a proper handoff checklist should anticipate what the client might discover that the team missed. Key areas to audit include test posts, unused or watermarked media, temporary pages, and critical settings like search engine visibility. The article provides manual, tool-free methods for each check to help developers deliver clean, professional sites to clients.

Read more at DEV Community

TypeScript Cannot Auto-Infer Async Generator Input Types, Developers Must Annotate

TypeScript's async generator model relies on three type parameters: T for yielded values, TReturn for the final return value, and TNext for values sent back via .next(). The article argues that the compiler cannot infer TNext automatically because yield expressions are bidirectional: a generator yields values outward while callers send values back through the same yield point, which the article describes as an inference deadlock. Without explicit annotations, the article says, the generator's type falls back to AsyncGenerator<any, any, undefined>, leaving type safety gaps at critical boundaries in streaming, pagination, or event-processing code. Unlike regular functions, whose parameter types are declared in the signature, async generators receive their 'inputs' through an external iteration protocol, which breaks standard inference patterns. The article's recommendation is therefore to annotate all three type parameters explicitly whenever a generator consumes input through yield.
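A minimal sketch of the explicit annotation described above, using a pagination case. The `paginate` and `demo` functions are hypothetical and do not come from the original article; they only show where each of the three type parameters appears.

```typescript
// Hypothetical paginator: yields pages of numbers (T), returns the page
// count (TReturn), and accepts an optional page-size override sent back
// through next() (TNext).
async function* paginate(
  items: number[],
  defaultSize: number
): AsyncGenerator<number[], number, number | undefined> {
  let offset = 0;
  let pages = 0;
  let size = defaultSize;
  while (offset < items.length) {
    const page = items.slice(offset, offset + size);
    offset += page.length;
    pages += 1;
    // The yield expression evaluates to whatever the caller passes to next().
    const requested: number | undefined = yield page;
    if (requested !== undefined && requested > 0) size = requested;
  }
  return pages;
}

async function demo(): Promise<{ pages: number[][]; total: number }> {
  const gen = paginate([1, 2, 3, 4, 5], 2);
  const pages: number[][] = [];
  let step = await gen.next(); // the first next() starts the generator
  while (!step.done) {
    pages.push(step.value); // narrowed to number[] while done is false
    step = await gen.next(3); // request pages of 3 from here on
  }
  return { pages, total: step.value }; // narrowed to number once done is true
}
```

With the annotation in place, passing a value of the wrong type to `gen.next()`, such as a string, is rejected at compile time.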

Read more at DEV Community