Snorkel AI Releases Senior SWE-Bench to Test AI Agents on Complex Engineering Tasks

Snorkel AI has launched Senior SWE-Bench, an open-source benchmark designed to evaluate AI coding agents at a senior software engineer level. The tool raises the difficulty bar beyond existing benchmarks by presenting agents with more complex, real-world engineering challenges. It aims to provide a more rigorous and meaningful measure of AI capability in software development. The benchmark was shared on Hacker News, where it drew initial community attention. By open-sourcing the tool, Snorkel AI invites researchers and developers to use and contribute to the evaluation framework.

Read the full story at Hacker News

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

System Design Interviews: Why Framework Matters More Than Tool Knowledge

A tutorial published on DEV Community uses a fictional uncle-nephew dialogue to explain why many experienced engineers struggle with system design interview questions. The core argument is that failing candidates suffer from a framework problem rather than a knowledge gap — they jump to technology choices before understanding the problem. The guide proposes a 12-step methodology grouped into three phases: Understand, Design, and Robustify, meant to be applied consistently across any system design prompt. The approach emphasizes asking clarifying questions first, estimating scale before selecting tools, and separating functional from non-functional requirements. The author contends that mastering this fixed sequence allows engineers to tackle any 'design X' question with structure and confidence.
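The "estimate scale before selecting tools" step can be illustrated with a back-of-envelope calculation. The sketch below is not taken from the tutorial; the `Workload` fields, the peak factor, and the sample numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs gathered from clarifying questions."""
    daily_active_users: int
    requests_per_user_per_day: int
    bytes_per_write: int
    write_fraction: float      # share of requests that store data
    peak_factor: float = 2.0   # assumed ratio of peak to average traffic


def estimate(w: Workload) -> dict:
    """Turn rough workload assumptions into rough capacity numbers."""
    requests_per_day = w.daily_active_users * w.requests_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    writes_per_day = requests_per_day * w.write_fraction
    storage_per_year_bytes = writes_per_day * w.bytes_per_write * 365
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * w.peak_factor,
        "storage_per_year_gb": storage_per_year_bytes / 1e9,
    }


if __name__ == "__main__":
    # Made-up example: 1M daily users, 20 requests each, 10% writes of 1 KB.
    numbers = estimate(Workload(1_000_000, 20, 1_000, 0.1))
    for name, value in numbers.items():
        print(f"{name}: {value:,.1f}")
```

Numbers like these are only meant to narrow the design space, for example whether a single database node is plausible, before any specific technology is named.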

Read more at DEV Community

Programming · DEV Community

Dev builds browser-only toolkit after accidentally exposing production credentials online

A developer built a suite of privacy-focused tools after realising they had unwittingly sent production database credentials to an unidentified third-party server via an online .env converter. Investigating other commonly used tools revealed a similar pattern: thin frontends that hand the work to a backend, with no transparency about data retention. The resulting toolkit, available at configdev.com, includes an env converter, crontab-to-systemd converter, CIDR calculator, PII log scrubber, and CSV-to-JSON Schema builder. All processing runs entirely in the browser, so no data is transmitted to external servers; users can verify this by checking the network tab or going offline mid-session. The project is in its early stages with few users so far, but the developer has made it publicly available and is open to feedback.
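To show how little is needed to keep such a conversion local, here is a minimal sketch of a .env-to-JSON converter that uses only the Python standard library and makes no network calls. It is not the toolkit's code (which runs in the browser); the `parse_env` helper and the dummy sample values are hypothetical, and the parser handles only simple `KEY=VALUE` lines rather than the full dotenv syntax.

```python
import json


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; not a full dotenv implementation."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without '='
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key.strip()] = value
    return result


# Dummy values only; nothing here leaves the local process.
sample = """
# database settings
DB_HOST=localhost
DB_PASSWORD="not-a-real-secret"
export DEBUG=true
"""

print(json.dumps(parse_env(sample), indent=2))
```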

Read more at DEV Community

Programming · DEV Community

AI Chatbots Are Sending Paying Customers to Businesses That Can't Track Them

An Israeli AI-automation agency, Automaziot AI, discovered that AI assistants like ChatGPT and Perplexity had been quietly referring customers to their business since mid-May 2026, generating at least nine tracked web leads plus additional phone inquiries. Two of those leads converted into paying clients worth a combined ₪35,000 (roughly $9,300), yet the company's CRM had misclassified nearly all of them as 'website' or 'unknown' traffic. A key example involved a window-cleaning business owner who phoned the agency after an AI assistant recommended them, closed a deal, and paid — all within the same day, leaving no digital attribution trail. Standard CRM attribution systems, built around paid-click identifiers and form submissions, are structurally unable to capture referrals that originate from AI assistants, especially when the next step is a phone call. The agency found that AI-referred leads also showed the highest inbound-to-outbound message engagement ratio of any acquisition source in their CRM, suggesting meaningful buyer intent rather than casual browsing.
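The attribution gap can be made concrete with a small classifier. This is an illustrative sketch, not the agency's system: the hostname list is an assumed, non-exhaustive set, and the `self_reported` field (an answer to "how did you hear about us?") is a hypothetical addition for leads such as phone calls that arrive with no referrer at all.

```python
from urllib.parse import urlparse, parse_qs

# Assumed, non-exhaustive list of AI-assistant hostnames.
AI_HOSTS = {"chatgpt.com", "chat.openai.com", "perplexity.ai", "www.perplexity.ai"}


def classify_lead(referrer: str = "", landing_url: str = "", self_reported: str = "") -> str:
    """Assign a coarse acquisition source to a lead."""
    ref_host = (urlparse(referrer).hostname or "").lower()
    query = parse_qs(urlparse(landing_url).query)
    utm_source = query.get("utm_source", [""])[0].lower()

    if ref_host in AI_HOSTS or utm_source in AI_HOSTS:
        return "ai_assistant"
    if "gclid" in query:  # Google Ads click identifier
        return "paid_click"
    if self_reported.strip():
        # Phone leads carry no referrer, so this is the only signal left.
        return f"self_reported:{self_reported.strip().lower()}"
    if ref_host:
        return "other_referral"
    return "unknown"


print(classify_lead(referrer="https://chatgpt.com/"))
print(classify_lead(landing_url="https://example.com/?gclid=abc"))
print(classify_lead(self_reported="ChatGPT"))
print(classify_lead())
```

Without the first branch and the self-reported fallback, every lead in this sketch would fall through to a generic or "unknown" bucket, which is the misclassification the summary describes.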

Read more at DEV Community

Programming · DEV Community

Linux DMA Mapping API: How Coherent and Streaming Mappings Differ

The Linux DMA mapping API lets driver authors translate CPU buffer addresses into bus addresses that devices can use, and it handles cache maintenance on non-coherent architectures. There are two primary mapping types: coherent mappings, suited to small, long-lived control structures that need no explicit syncing, and streaming mappings, suited to bulk data transfers, which must be explicitly synced if the CPU touches the buffer while it is still mapped for the device. A key challenge is that DMA spans three distinct address spaces (kernel virtual, CPU physical, and device bus addresses), and an address from one cannot be used in place of another. On non-coherent embedded SoCs, incorrect cache handling can corrupt data in ways that appear on ARM targets but never on x86 systems, where DMA is cache-coherent, which makes such bugs notoriously difficult to diagnose. The API abstracts these architecture-specific cache operations, and the CONFIG_DMA_API_DEBUG kernel option can help validate correct usage of map and unmap calls in driver code.
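Why streaming mappings need explicit syncing on non-coherent hardware can be seen in a toy model. The sketch below is plain Python, not kernel code: the `NonCoherentSystem` class and its `clean` and `invalidate` methods are hypothetical stand-ins for the cache write-back and invalidation that calls such as `dma_sync_single_for_device()` and `dma_sync_single_for_cpu()` may trigger on a real non-coherent platform.

```python
class NonCoherentSystem:
    """Toy model: RAM plus a CPU cache that the device cannot see."""

    def __init__(self, size: int):
        self.ram = bytearray(size)
        self.cache = {}      # index -> cached byte value
        self.dirty = set()   # indices written by the CPU, not yet in RAM

    # CPU side: every access goes through the cache.
    def cpu_write(self, i: int, value: int) -> None:
        self.cache[i] = value
        self.dirty.add(i)

    def cpu_read(self, i: int) -> int:
        if i not in self.cache:
            self.cache[i] = self.ram[i]
        return self.cache[i]

    # Device side: DMA touches RAM directly, bypassing the cache.
    def device_read(self, i: int) -> int:
        return self.ram[i]

    def device_write(self, i: int, value: int) -> None:
        self.ram[i] = value

    # Cache maintenance (the part the DMA API hides from drivers).
    def clean(self) -> None:
        """Write dirty cache contents back to RAM before the device reads."""
        for i in self.dirty:
            self.ram[i] = self.cache[i]
        self.dirty.clear()

    def invalidate(self) -> None:
        """Discard cached copies so the CPU re-reads what the device wrote."""
        self.cache.clear()
        self.dirty.clear()


system = NonCoherentSystem(4)

# CPU -> device: without a clean, the device reads stale RAM.
system.cpu_write(0, 0xAB)
stale_tx = system.device_read(0)
system.clean()
fresh_tx = system.device_read(0)

# Device -> CPU: without an invalidate, the CPU reads its stale cached copy.
system.cpu_read(1)
system.device_write(1, 0xCD)
stale_rx = system.cpu_read(1)
system.invalidate()
fresh_rx = system.cpu_read(1)

print(f"to device:   before sync {stale_tx:#04x}, after sync {fresh_tx:#04x}")
print(f"from device: before sync {stale_rx:#04x}, after sync {fresh_rx:#04x}")
```

On a cache-coherent system the "before sync" reads would already return the fresh values, which is consistent with such bugs staying hidden on x86.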

Read more at DEV Community