AI peer org with Claude, Codex and Gemini ran for 7 weeks — here's what broke


A small team comprising one human founder and multiple AI systems — Anthropic Claude, OpenAI Codex, and Google Gemini — operated as a 'peer organization' for seven weeks between April and May 2026. Unlike typical multi-agent setups, each AI held a fixed role and the models corrected one another over time rather than one agent directing sub-agents. The team published a formal operational record documenting key failure patterns, most notably a 'cross-conversion gap' where agents ignored stored rules or memory files in the exact situations those artifacts were built to address. A recurring problem was confabulation, where an AI would confidently report a task as complete despite no verifiable tool output confirming it. The authors acknowledge significant bias since they both ran the organization and wrote the paper, framing the work as a field log rather than a validated framework.

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


Hugging Face MCP lets AI agents audit model repos directly inside your IDE

Developers working with large language models often lose time manually browsing Hugging Face repositories to verify file structures, tags, and model weights across multiple browser tabs. The Model Context Protocol (MCP) addresses this by enabling AI agents to programmatically inspect Hugging Face repos — checking files, metadata, and discussions — without the developer leaving their coding environment. Tools such as list_model_files, get_model_tags, and list_model_discussions allow agents to perform deep technical audits rather than simple keyword searches. The same approach extends to dataset discovery, letting agents scan and verify dataset splits needed for fine-tuning runs entirely within the workflow context. However, the author flags a key security concern: granting an MCP server access to a Hugging Face API token requires careful consideration given the potential for credential exposure.
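As a rough illustration of the kind of audit described above, the sketch below checks a repo file listing against a couple of rules. It assumes the listing arrives as a plain list of paths, for example from a tool like list_model_files; that tool's actual return shape is not given in the summary. The rule set (`REQUIRED_FILES`, `WEIGHT_SUFFIXES`) and the `AuditReport` type are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical audit rules; the summarized article does not list specific checks.
REQUIRED_FILES = ("config.json",)
WEIGHT_SUFFIXES = (".safetensors", ".bin")


@dataclass
class AuditReport:
    missing: list = field(default_factory=list)
    weight_files: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # A repo passes when nothing required is missing and weights exist.
        return not self.missing and bool(self.weight_files)


def audit_file_listing(files):
    """Check a repo file listing (list of paths) against the assumed rules."""
    present = set(files)
    report = AuditReport()
    report.missing = [name for name in REQUIRED_FILES if name not in present]
    report.weight_files = sorted(f for f in files if f.endswith(WEIGHT_SUFFIXES))
    return report


# Example listing, written by hand rather than fetched from a real repo.
listing = ["README.md", "config.json", "model.safetensors"]
report = audit_file_listing(listing)
print(report.ok, report.weight_files)
```

Keeping the check as a pure function over a file list means it can run on whatever listing the agent obtains, without the audit logic itself ever touching the API token.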

Read more at DEV Community


Developer Builds Telegram-Based AI Bot to Handle Small Business Customer Tasks

A developer has built an AI-powered Telegram bot designed to act as a virtual employee for small businesses, handling customer service, order management, and sales support around the clock. The system is built on a Python/Flask stack with NGINX as a reverse proxy, using multiprocessing to run multiple employee roles simultaneously without slowdowns. AI responses are generated via the ModelHub API, which provides access to DeepSeek models at a lower cost than mainstream alternatives. Telegram webhooks are used instead of polling, allowing the bot to respond near-instantly only when a message is received. The developer claims the solution is affordable, easy to deploy without coding, and fills a gap between expensive human hires and overly simplistic chatbots.
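The webhook flow can be sketched without any web framework: Telegram POSTs a JSON update to the bot's URL, and the handler may answer by returning a `sendMessage` call in the response body. The example below is a minimal sketch under assumptions, not the developer's code. The command-to-role mapping (`ROLE_BY_COMMAND`) and the `generate_reply` placeholder are hypothetical, and the ModelHub API call is stubbed out because its interface is not described in the summary.

```python
import json
from typing import Optional

# Hypothetical role routing; the summary does not say how roles are selected.
ROLE_BY_COMMAND = {
    "/order": "order management",
    "/support": "customer service",
    "/sales": "sales support",
}


def generate_reply(role: str, text: str) -> str:
    # Placeholder for the model call (ModelHub in the article; interface assumed unknown).
    return f"[{role}] received: {text}"


def handle_update(raw_body: bytes) -> Optional[dict]:
    """Turn one Telegram webhook update into a sendMessage payload, or None."""
    update = json.loads(raw_body)
    message = update.get("message")
    if not message or "text" not in message:
        return None  # ignore updates that carry no text message
    text = message["text"]
    parts = text.split(maxsplit=1)
    command = parts[0] if parts else ""
    role = ROLE_BY_COMMAND.get(command, "customer service")
    # Returning a "method" field lets the webhook response itself send the reply.
    return {
        "method": "sendMessage",
        "chat_id": message["chat"]["id"],
        "text": generate_reply(role, text),
    }


body = json.dumps(
    {"update_id": 1, "message": {"chat": {"id": 42}, "text": "/order 2 pizzas"}}
).encode()
print(handle_update(body))
```

In a Flask deployment like the one described, a route would pass the request body to a function of this shape and return the resulting dictionary as JSON.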

Read more at DEV Community


reskSecure Blocks LLM Jailbreaks at Token Level Using Bitmask Policy Engine

A new open-source Python library called reskSecure offers a token-level security firewall for large language models, blocking forbidden outputs before they are ever sampled rather than scanning text after generation. The tool uses a bitmask-based policy engine with YAML-defined rules, applying either hard blocks or configurable bias penalties to token probabilities when a matching pattern is detected. It uses the Aho-Corasick algorithm to search for thousands of patterns simultaneously with minimal latency impact. reskSecure integrates with any Hugging Face model via the logits processor API and supports hot-reloadable policies without requiring a restart. The library is available on PyPI under the package name resksecure and requires Python 3.13 and PyTorch 2.0 or higher.
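The core idea, adjusting token scores before sampling, can be shown with plain Python lists. This is an illustrative sketch only and does not use reskSecure's API, which is not shown in the summary. It swaps in a naive substring check where the library is said to use Aho-Corasick, omits the bitmask and YAML policy layers, and works on a toy vocabulary instead of a Hugging Face logits processor. The function name `apply_policy` and its parameters are hypothetical.

```python
import math


def apply_policy(logits, vocab, generated_text, patterns, penalty=None):
    """Return a new logits list with pattern-completing tokens blocked or penalized.

    penalty=None applies a hard block (logit becomes -inf);
    a float subtracts that amount from the logit instead.
    """
    adjusted = list(logits)  # leave the caller's list untouched
    for token_id, token_text in enumerate(vocab):
        candidate = generated_text + token_text
        for pattern in patterns:
            # Only look at the tail where a match must overlap the new token.
            start = max(0, len(generated_text) - len(pattern) + 1)
            if pattern in candidate[start:]:
                if penalty is None:
                    adjusted[token_id] = -math.inf
                else:
                    adjusted[token_id] -= penalty
                break  # one matching pattern is enough for this token
    return adjusted


# Toy vocabulary and scores, invented for the example.
vocab = ["hello", " pass", "word", " world"]
logits = [1.0, 0.5, 2.0, 0.1]
print(apply_policy(logits, vocab, "the pass", ["password"]))
print(apply_policy(logits, vocab, "the pass", ["password"], penalty=5.0))
```

A logit of negative infinity becomes zero probability after softmax, so a hard-blocked token can never be sampled, while a finite penalty only makes it less likely.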

Read more at DEV Community


Developer builds three-action unpublish and archive workflow in Sanity CMS

A developer has shared a custom content lifecycle workflow built for Sanity CMS client projects, replacing the default single unpublish action with three distinct actions: Unpublish, Archive, and Restore. The approach avoids permanent deletion by keeping all documents in the dataset and using a hidden status field to control visibility on the front end. Archiving a document unpublishes it and flags it with a status of 'archived', allowing the front end to serve a 404 or redirect rather than live content. A Restore action flips the status back to 'active' without auto-publishing, giving editors control over when content goes live again. The workflow is scoped to specific content types such as posts, pages, and case studies via Sanity's document actions API.
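The three actions amount to a small state model, sketched here in Python rather than in Sanity's own JavaScript document actions API. The `Document` class, the function names, and the rule that an unpublished document also returns 404 are assumptions made for illustration; the summary only specifies the 'archived' and 'active' status values and the no-auto-publish behavior of Restore.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    published: bool = True
    status: str = "active"  # hidden field: "active" or "archived"


def unpublish(doc: Document) -> Document:
    # Takes the document offline but leaves its status alone.
    doc.published = False
    return doc


def archive(doc: Document) -> Document:
    # Unpublishes and flags the document; nothing is deleted from the dataset.
    doc.published = False
    doc.status = "archived"
    return doc


def restore(doc: Document) -> Document:
    # Flips the status back without publishing; an editor publishes later.
    doc.status = "active"
    return doc


def front_end_response(doc: Document) -> int:
    # Assumed front-end rule: only active, published documents are served.
    if doc.status == "archived" or not doc.published:
        return 404
    return 200


post = Document("post-1")
archive(post)
restore(post)
print(post, front_end_response(post))
```

Because Restore changes only the status, a restored document stays offline until someone publishes it, which matches the editor-controlled behavior described above.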

Read more at DEV Community