Developer uses four AI agents to audit his own code-auditing website

A developer running turva.dev, a code-auditing service, turned four Claude-powered AI agents on his own public codebase to test whether it could withstand the scrutiny it promises clients. The agents reviewed roughly 5,400 lines of source code, an MCP server, and public repository documentation, returning 91 findings ranging from a mislabeled key algorithm to a legal page misclassifying the business type. Four findings were flagged HIGH severity, but not all survived verification: one stemmed from a misread scanner scale and another from a cached fetch returning stale version data, while the fourth was a genuine mismatch between a no-logging promise and an active observability setting. The developer fixed that discrepancy by disabling platform observability rather than quietly rewording the documentation, so the original claim remains true. The exercise highlighted that automated scanners, which scored the site 100/100 before and after most fixes, cannot detect inconsistencies between what a site advertises and what its deployed code and configuration actually do.

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Agentic AI Explained: Why Governance Belongs Here, Not in Functional AI

Agentic AI refers to systems built around AI models that can take actions, call tools, trigger processes, and affect the external world, which makes it fundamentally different from purely Functional AI. According to the article, Agentic AI is the first system type that intersects all three authority layers: regulated, ethical, and human legitimacy frameworks. Despite appearances, these systems do not possess intent, self-awareness, or moral reasoning; they execute learned patterns within wrappers that simulate agency. When deployed in specific fields such as medicine, law, or finance, they become domain agents, though this does not grant them real understanding or intent. Governance challenges around Agentic AI centre on authorisation, accountability, and constraint-setting, issues that belong to political and institutional authority, not ethics alone.

Read more at DEV Community

Langfuse v4 Brings Updated API for Tracing RAG Pipelines and AI Agents

A developer tutorial published on DEV Community walks through adding observability to RAG and AI agent workflows using Langfuse v4, released in March 2026. Langfuse is an open-source tool that records execution time, input/output data, API costs, and latency for each step in an AI pipeline. The guide notes that Langfuse v4 introduced significant API changes, deprecating previously used methods such as langfuse_context and update_current_trace in favour of a revised interface. Developers can instrument their code by applying the @observe() decorator to Python functions, enabling automatic tracing with minimal changes. Langfuse offers a free cloud tier at cloud.langfuse.com as well as a self-hosted deployment option, making it accessible for individual developers and teams alike.
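To make the decorator idea concrete, here is a minimal sketch of what decorator-based tracing does in general. It deliberately does not use the Langfuse SDK: `observe_sketch`, `TRACE_LOG`, `retrieve`, and `answer` are hypothetical names invented for illustration, and the real @observe() decorator may record different fields and sends data to a Langfuse backend rather than an in-memory list.

```python
import functools
import time

# Hypothetical in-memory store; a real tracing backend would receive these records.
TRACE_LOG = []


def observe_sketch(func):
    """Stand-in tracing decorator: records name, inputs, output, and latency."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE_LOG.append({
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper


@observe_sketch
def retrieve(query):
    # Placeholder retrieval step (assumption: a real pipeline would query a vector store)
    return [f"document about {query}"]


@observe_sketch
def answer(query):
    # Placeholder generation step that calls the traced retrieval step
    docs = retrieve(query)
    return f"Answer based on {len(docs)} document(s)"
```

Calling `answer("pgvector")` appends one record for `retrieve` and then one for `answer`, because the inner call finishes first. This sketch records nothing when a function raises; a production tracer would normally capture errors as well.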

Read more at DEV Community

Developer builds Linux container from scratch to speed up AI agent sandboxing

A developer is building ForkCage, an open-source Linux container project written in C++ that uses process forking to provide fast, isolated sandboxes for AI agents without the overhead of cold-starting new environments each time. The project relies on raw Linux syscalls and is primarily a learning exercise in understanding how containers work at a low level. Development revealed three notable bugs, including a deadlock caused by reading stdout and stderr sequentially rather than concurrently, which was resolved by draining both pipes simultaneously using separate threads. A second issue arose when chrooting into a fake root filesystem failed because dynamically linked binaries require shared libraries and a dynamic linker that were absent from the jail directory. The developer is continuing to extend the project and has shared the source code publicly on GitHub.
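The stdout/stderr deadlock described above can be sketched as follows. ForkCage itself is written in C++ with raw syscalls, so this Python version is only an analogy for the same fix, not the project's code; `run_and_drain` and `CHILD` are hypothetical names. The child writes a large amount to stderr before stdout, so a parent that read stdout to completion first could block forever once the stderr pipe buffer filled.

```python
import subprocess
import sys
import threading


def run_and_drain(cmd):
    """Run cmd and read stdout and stderr concurrently to avoid a pipe deadlock."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    captured = {}

    def drain(name, pipe):
        # Each thread reads one pipe to EOF, so neither pipe buffer can stay full
        captured[name] = pipe.read()
        pipe.close()

    threads = [
        threading.Thread(target=drain, args=("stdout", proc.stdout)),
        threading.Thread(target=drain, args=("stderr", proc.stderr)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    returncode = proc.wait()
    return returncode, captured["stdout"], captured["stderr"]


# Hypothetical child that writes more than a typical pipe buffer to both streams
CHILD = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('e' * 200000); sys.stdout.write('o' * 200000)",
]
```

In everyday Python code, `Popen.communicate()` handles both pipes for you; the explicit threads here just mirror the approach the summary describes.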

Read more at DEV Community

Dev Tutorial: How to Automate RAG System Quality Evaluation Using Evals

A new developer tutorial introduces 'Evals', a method for automatically measuring the quality of Retrieval-Augmented Generation (RAG) system responses instead of relying on manual review. The approach involves building an evaluation dataset of questions, expected answer keywords, and reference documents to benchmark system performance. RAG quality is assessed across three dimensions: faithfulness (no hallucinations), answer relevancy, and context recall (retrieval accuracy). The tutorial provides sample Python code using pgvector, Google Gemini embeddings, and PostgreSQL to run automated scoring. Supporting scripts for dataset definition, RAG evaluation, agent evaluation, and report generation are included in the project structure.
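As a rough sketch of how such a dataset can drive automated scoring, the example below checks expected keywords and context recall for one test case. It is not the tutorial's code: `EVAL_SET`, `keyword_score`, `context_recall`, `evaluate`, and `fake_rag` are hypothetical names, the document IDs are invented, and the stand-in pipeline replaces the pgvector and Gemini calls the tutorial uses.

```python
# Hypothetical evaluation dataset: question, expected keywords, reference document IDs
EVAL_SET = [
    {
        "question": "Which extension adds vector search to PostgreSQL?",
        "expected_keywords": ["pgvector", "extension"],
        "reference_docs": {"doc-1", "doc-2"},
    },
]


def keyword_score(answer, expected_keywords):
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def context_recall(retrieved_ids, reference_ids):
    """Fraction of reference documents that the retriever actually returned."""
    return len(set(retrieved_ids) & set(reference_ids)) / len(reference_ids)


def evaluate(rag_fn, dataset):
    """Run rag_fn on each case; rag_fn must return (answer, retrieved_ids)."""
    rows = []
    for case in dataset:
        answer, retrieved = rag_fn(case["question"])
        rows.append({
            "question": case["question"],
            "keyword_score": keyword_score(answer, case["expected_keywords"]),
            "context_recall": context_recall(retrieved, case["reference_docs"]),
        })
    return rows


def fake_rag(question):
    # Stand-in for a real RAG pipeline (assumption: returns answer text and doc IDs)
    return "The pgvector extension adds vector search.", ["doc-1", "doc-3"]
```

Running `evaluate(fake_rag, EVAL_SET)` gives a keyword score of 1.0 and a context recall of 0.5 for the single case, since only one of the two reference documents was retrieved. Faithfulness and answer relevancy usually need a model or human judge and are not covered by this keyword-based sketch.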

Read more at DEV Community