skUnit lets .NET developers test AI agents using behavior, not exact text

Testing AI agents is difficult because correct responses can vary widely in wording, making traditional assertion-based unit tests unreliable. Developer Mehran Davoudi built skUnit, an open-source testing framework for .NET, to address this by verifying agent behavior through semantic conditions rather than exact string matches. The framework uses Markdown files to describe conversation scenarios and evaluates whether responses satisfy intent-based assertions, such as confirming no food was suggested to an angry user. A demo project called Moody Chef, which recommends food based on a user's mood, illustrates two architectural approaches and serves as a practical walkthrough for the framework. skUnit also supports running each test scenario multiple times to reduce false positives caused by non-deterministic model outputs.
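
The idea of asserting on behavior and repeating a scenario can be illustrated with a minimal sketch. It does not use skUnit or its API: the agent, the condition, and every name below (`moody_chef`, `suggests_no_food`, `run_scenario`, `min_pass_rate`) are hypothetical stand-ins, and the keyword check is only a placeholder for a condition that a real framework would evaluate semantically.

```python
import random

# Hypothetical food vocabulary used by the placeholder condition below.
FOODS = {"pizza", "soup", "salad", "pasta"}


def moody_chef(user_message: str, rng: random.Random) -> str:
    """Hypothetical stand-in agent whose wording varies between runs."""
    if "angry" in user_message.lower():
        return rng.choice([
            "I'm sorry you're upset. Want to talk about it first?",
            "That sounds frustrating. Take a moment; I'm here when you're ready.",
        ])
    return rng.choice([
        "How about a warm bowl of soup?",
        "A fresh salad might be nice today.",
    ])


def suggests_no_food(response: str) -> bool:
    """Placeholder for an intent-based condition.

    A keyword lookup stands in for what would really be a semantic
    judgment; it checks the behavior, not the exact wording.
    """
    words = {w.strip(".,!?;").lower() for w in response.split()}
    return not (words & FOODS)


def run_scenario(message, condition, runs=5, min_pass_rate=0.8, seed=0):
    """Run one scenario several times and require a minimum pass rate."""
    rng = random.Random(seed)
    passed = sum(condition(moody_chef(message, rng)) for _ in range(runs))
    return passed / runs >= min_pass_rate


if __name__ == "__main__":
    print(run_scenario("I am so angry right now", suggests_no_food))
    print(run_scenario("I feel happy today", suggests_no_food))
```

An exact-string assertion would fail whenever the agent rephrases its answer; here both angry-user replies pass because neither suggests food, while the repeated runs guard against a single unrepresentative response deciding the result.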

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.
