Building an LLM Red-Team Suite Reveals That Judging Harm Matters More Than Breaking Models

A developer built a red-team test suite that fires adversarial prompts at a local LLM-backed application to measure how often attacks succeed and whether the resulting outputs are genuinely harmful. Using NVIDIA's open-source tool garak, the suite initially reported a 100% Attack Success Rate (ASR), yet only about 2% of responses contained anything actionable or dangerous. A smarter, content-aware detector lowered the reported ASR to 73%, but real harm in the flagged replies remained close to zero. That gap exposes a critical flaw in detectors that score how a reply looks rather than what it actually contains. The project found that classifying harm accurately requires human review, because automated metrics alone can report bypasses on batches where nothing harmful was produced. The developer concluded that structuring reliable datasets, defining clear harm criteria, and keeping a human in the loop are the hardest and most important parts of AI red-teaming.
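The gap between the two numbers can be made concrete with a minimal Python sketch that computes both from the same batch. It assumes each response carries an automated detector verdict and a separate human harm label; the `Result` record and its field names are hypothetical and do not mirror garak's report format, and the sample batch is synthetic, not data from the article.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One probe/response pair (hypothetical record, not garak's report schema)."""
    prompt: str
    response: str
    detector_flagged: bool  # automated detector says the attack "succeeded"
    human_harmful: bool     # human reviewer judged the reply actionable or dangerous


def attack_success_rate(results: list[Result]) -> float:
    """Share of responses the automated detector flagged as a bypass."""
    if not results:
        return 0.0
    return sum(r.detector_flagged for r in results) / len(results)


def harm_rate(results: list[Result]) -> float:
    """Share of responses a human reviewer judged genuinely harmful."""
    if not results:
        return 0.0
    return sum(r.human_harmful for r in results) / len(results)


def false_bypasses(results: list[Result]) -> list[Result]:
    """Responses the detector flagged but a human found harmless."""
    return [r for r in results if r.detector_flagged and not r.human_harmful]


if __name__ == "__main__":
    # Synthetic batch for illustration only; not data from the article.
    batch = [
        Result("p1", "I can't help with that, but here is some background...", True, False),
        Result("p2", "Sure! Here is a pirate poem instead...", True, False),
        Result("p3", "Step 1: ...", True, True),
        Result("p4", "No.", False, False),
    ]
    print(f"ASR: {attack_success_rate(batch):.0%}")
    print(f"Human-judged harm rate: {harm_rate(batch):.0%}")
    print(f"Flagged but harmless: {len(false_bypasses(batch))}")
```

Keeping the detector verdict and the human label as separate fields is the point of the design: a detector that scores how a reply looks can only fill the first one, and only review fills the second.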

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Hybrid LLM-SLM Architecture Could Solve the Rising Cost Problem in AI Agents

AI agents are expensive to run because a single task often requires dozens of model calls, each hitting a costly frontier large language model. Experts argue that a smarter approach is to reserve powerful LLMs only for complex reasoning tasks like planning and judgment, while delegating repetitive work such as formatting, routing, and validation to smaller, cheaper models. Desktop agents offer an additional advantage by leveraging local compute for routine steps, reducing reliance on cloud-based token billing. Over time, agent systems can analyze usage traces to identify repetitive patterns and distill them into fine-tuned small models, making operations progressively cheaper. A recently published paper titled 'Small Language Models are the Future of Agentic AI' supports this hybrid compute strategy as a path to sustainable AI agent economics.
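The routing rule described above can be sketched as a small Python class: formatting, routing, and validation steps go to a small model, planning, judgment, and anything unrecognized go to a frontier LLM, and every call is logged so repetitive task kinds can later be picked out for distillation. The model callables, task-kind labels, and `min_calls` threshold are hypothetical; a real client (cloud API or local runtime) would be plugged in as a plain function.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

# Repetitive step kinds delegated to the cheap tier (hypothetical labels).
SLM_TASKS = {"formatting", "routing", "validation"}

ModelFn = Callable[[str], str]


@dataclass
class HybridRouter:
    """Hypothetical router: send each agent step to the cheapest adequate model."""
    llm: ModelFn  # frontier model, e.g. a cloud API client (assumed)
    slm: ModelFn  # small model, e.g. running on local desktop compute (assumed)
    traces: list[tuple[str, str, str]] = field(default_factory=list)  # (kind, tier, prompt)

    def run(self, kind: str, prompt: str) -> str:
        # Planning, judgment and unknown step kinds stay on the LLM.
        tier = "slm" if kind in SLM_TASKS else "llm"
        model = self.slm if tier == "slm" else self.llm
        self.traces.append((kind, tier, prompt))
        return model(prompt)

    def tier_counts(self) -> Counter:
        """How many calls each tier served."""
        return Counter(tier for _, tier, _ in self.traces)

    def distillation_candidates(self, min_calls: int) -> list[str]:
        """Task kinds seen at least `min_calls` times: candidates for a fine-tuned small model."""
        counts = Counter(kind for kind, _, _ in self.traces)
        return sorted(k for k, n in counts.items() if n >= min_calls)
```

Defaulting unrecognized steps to the LLM is a conservative choice that trades cost for reliability; a production router might instead classify steps with the small model itself.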

Read more at DEV Community

How to Correctly Size ClickHouse for High-Concurrency User-Facing Analytics

ClickHouse has no fixed architectural ceiling on concurrent queries: the server-level concurrency limit defaults to unlimited, and ClickHouse Cloud defaults to 1,000 concurrent queries per replica; both limits are configurable. Sustainable concurrency is defined as the number of simultaneous queries that meet a deployment's latency targets for a specific workload, not as a hard engine cap. To size accurately, teams must first translate active user counts and dashboard interactions into peak queries per second and estimates of simultaneous queries, since user counts alone are misleading. Benchmarking under production-like conditions, with a representative query mix, real ingestion load, and a realistic cache state, is essential before configuring resource limits and admission controls. When per-replica capacity is insufficient, adding replicas is the recommended way to meet throughput and availability requirements.
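The first sizing step (user counts to peak QPS to simultaneous queries) is plain arithmetic. The summary does not give a formula, so this Python sketch assumes Little's Law (average queries in flight = arrival rate × mean latency). Every input, including the `headroom` factor, is a hypothetical placeholder, and the per-replica sustainable concurrency has to come from your own production-like benchmark.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical peak-traffic inputs; replace with measurements from your product."""
    peak_active_users: int            # users actively using dashboards at peak
    interactions_per_user_min: float  # clicks / filter changes per user per minute
    queries_per_interaction: int      # panels (queries) refreshed per interaction
    mean_latency_s: float             # mean query latency from a production-like benchmark


def peak_qps(w: Workload) -> float:
    """Peak queries per second generated by the user population."""
    return w.peak_active_users * w.interactions_per_user_min * w.queries_per_interaction / 60.0


def concurrent_queries(w: Workload) -> float:
    """Little's Law: average queries in flight = arrival rate x mean time in system."""
    return peak_qps(w) * w.mean_latency_s


def replicas_needed(w: Workload, sustainable_per_replica: int, headroom: float = 1.0) -> int:
    """Replicas needed if one replica meets its latency target up to
    `sustainable_per_replica` simultaneous queries (a benchmarked figure).
    A `headroom` above 1 adds a safety margin for bursts above the average."""
    demand = concurrent_queries(w) * headroom
    return max(1, math.ceil(demand / sustainable_per_replica))


if __name__ == "__main__":
    # Hypothetical workload, not figures from the article.
    w = Workload(peak_active_users=2000, interactions_per_user_min=2.0,
                 queries_per_interaction=6, mean_latency_s=0.15)
    print(f"peak QPS: {peak_qps(w):.0f}")
    print(f"simultaneous queries: {concurrent_queries(w):.1f}")
    print(f"replicas: {replicas_needed(w, sustainable_per_replica=25)}")
```

With these hypothetical inputs, 2,000 users × 2 interactions per minute × 6 panels / 60 gives 400 QPS; at a 0.15 s mean latency that is about 60 queries in flight, or 3 replicas if a benchmark shows each replica holds its latency target up to 25 concurrent queries. Because Little's Law gives an average, the headroom factor (or a benchmark at burst load) matters for user-facing traffic.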

Read more at DEV Community

How to Correctly Configure Ceph Storage for Proxmox on Dedicated Hardware

A technical guide from DEV Community outlines best practices for deploying a hyper-converged Proxmox VE and Ceph storage cluster on dedicated hardware for production environments. The guide warns against four common mistakes: using consumer SSDs without power-loss protection, leaving hardware RAID enabled instead of switching controllers to HBA/IT Mode, running all traffic over a single network interface, and reducing replica settings to gain usable space at the cost of data safety. A minimum of three identical physical nodes is recommended, each equipped with enterprise SSDs or NVMe drives, sufficient RAM, and at least two 10GbE network interfaces. Strict network isolation is emphasized, with separate physical links advised for Corosync heartbeats, VM traffic, and Ceph replication to prevent cluster instability during recovery events. Additional optimizations such as configuring Jumbo Frames with MTU 9000 across all Ceph-dedicated interfaces are also recommended for maximum throughput.
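The guide's recommendations lend themselves to a pre-flight check before building the cluster. This Python sketch encodes the points from the summary: three or more identical nodes, power-loss-protected drives, HBA/IT mode, at least two 10GbE interfaces, MTU 9000 on Ceph interfaces, separate links for Corosync, VM, and Ceph traffic, and no reduced replica count. The `Node` inventory format is hypothetical, and treating a pool size of 3 as the floor is an assumption based on Ceph's default for replicated pools, not a figure from the guide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hardware/network inventory for one Proxmox VE + Ceph node."""
    name: str
    model: str             # chassis/CPU/RAM spec string, used to check nodes are identical
    drives_have_plp: bool  # enterprise SSD/NVMe with power-loss protection
    controller_mode: str   # "hba" (IT mode) or "raid"
    nics_10g: int          # number of 10GbE (or faster) interfaces
    ceph_mtu: int          # MTU configured on Ceph-dedicated interfaces
    links: dict = field(default_factory=dict)  # role -> interface, e.g. {"ceph": "enp65s0f0"}


def preflight(nodes: list[Node], pool_size: int) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(nodes) < 3:
        problems.append(f"need at least 3 nodes, found {len(nodes)}")
    if len({n.model for n in nodes}) > 1:
        problems.append("nodes are not identical")
    if pool_size < 3:
        problems.append(f"replica size {pool_size} < 3 trades data safety for space")
    for n in nodes:
        if not n.drives_have_plp:
            problems.append(f"{n.name}: drives lack power-loss protection")
        if n.controller_mode.lower() != "hba":
            problems.append(f"{n.name}: controller not in HBA/IT mode")
        if n.nics_10g < 2:
            problems.append(f"{n.name}: fewer than two 10GbE interfaces")
        if n.ceph_mtu != 9000:
            problems.append(f"{n.name}: Ceph MTU is {n.ceph_mtu}, not 9000")
        # Corosync, VM and Ceph traffic should each have their own physical link.
        roles = [n.links.get(r) for r in ("corosync", "vm", "ceph")]
        if None in roles or len(set(roles)) < 3:
            problems.append(f"{n.name}: Corosync, VM and Ceph traffic do not use separate links")
    return problems
```

Jumbo frames only help when every hop agrees, so the switch ports on the Ceph network need a matching MTU as well.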

Read more at DEV Community

Tutorial: Build a production-ready AI agent using LangChain.js and NestJS

A developer from SOM-OS has published a hands-on tutorial detailing how to integrate an AI agent into a NestJS application using LangChain.js. The architecture routes user requests through a NestJS controller into a BullMQ queue, where a worker processes each job asynchronously. A LangChain agent powered by GPT-4o handles the requests using Zod-typed tools, while conversation history is persisted in PostgreSQL and memory is cached via Redis. The guide includes full code snippets covering module setup, queue configuration, and agent execution. According to the author, this is the same architecture used in their own production systems rather than a purely theoretical exercise.
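The request flow described above (a controller enqueues a job, a worker consumes it asynchronously, and history is persisted per conversation) can be sketched in Python with asyncio. This is not the tutorial's NestJS/BullMQ/LangChain.js code: `asyncio.Queue` stands in for the BullMQ queue, an in-memory dict stands in for the PostgreSQL history table, and `fake_agent` is a hypothetical placeholder for the GPT-4o agent call. Redis caching and Zod-typed tools are omitted.

```python
import asyncio
from dataclasses import dataclass

# Stand-in for the PostgreSQL conversation-history table: id -> [(user, assistant)].
history: dict[str, list[tuple[str, str]]] = {}


@dataclass
class Job:
    conversation_id: str
    message: str
    reply: "asyncio.Future[str]"  # resolved by the worker when the job is done


def fake_agent(message: str, past: list[tuple[str, str]]) -> str:
    """Hypothetical placeholder for the LLM agent call."""
    return f"echo #{len(past) + 1}: {message}"


async def worker(queue: asyncio.Queue) -> None:
    """Queue-worker role: consume jobs one at a time and persist history."""
    while True:
        job = await queue.get()
        past = history.setdefault(job.conversation_id, [])
        answer = fake_agent(job.message, past)
        past.append((job.message, answer))
        job.reply.set_result(answer)
        queue.task_done()


async def handle_request(queue: asyncio.Queue, conversation_id: str, message: str) -> str:
    """Controller role: enqueue the request and wait for the worker's answer."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put(Job(conversation_id, message, fut))
    return await fut


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(worker(queue))
    print(await handle_request(queue, "c1", "hello"))
    print(await handle_request(queue, "c1", "again"))
    task.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```

Awaiting the future keeps the sketch in one file; when workers run in a separate process, as queue workers often do, the controller would more typically return a job ID right away and let the client poll or subscribe for the result.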

Read more at DEV Community