Open-Source Tool SafetyDrift Detects AI Agent Data Leaks Missed by Standard Guardrails
A developer named Abhishek has released SafetyDrift, an open-source security tool designed to detect multi-step data exfiltration attacks carried out by AI agents. Unlike existing guardrails such as Guardrails AI or Lakera Guard, which evaluate each tool call in isolation, SafetyDrift tracks cumulative risk across an entire session using Markov chain analysis. The tool monitors three dimensions (data exposure, tool escalation, and reversibility) to predict the probability of a safety violation within the next five steps. It is based on a March 2026 research paper (arXiv:2603.27148) and supports major AI frameworks including LangChain, AutoGen, and CrewAI, as well as MCP-compatible agents like Claude Code and Cursor. In benchmark testing across 200 synthetic traces, SafetyDrift reported an F1 score of 100%, with zero false positives and zero successful attacks.
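To illustrate the general idea of predicting a violation "within the next five steps" with a Markov chain, here is a minimal Python sketch. It is not SafetyDrift's code or API: the state names, the transition probabilities, and the function `violation_probability` are all hypothetical, and the summary does not say how the tool maps its three dimensions onto states. The sketch treats "violation" as an absorbing state, so the probability mass sitting there after k steps equals the chance of having hit it at any point within those k steps.

```python
# Hypothetical coarse risk states (an assumption for illustration only).
STATES = ["safe", "elevated", "critical", "violation"]

# Assumed one-step transition probabilities; each row sums to 1.
# "violation" is absorbing: once reached, the session stays there.
TRANSITIONS = [
    [0.90, 0.10, 0.00, 0.00],  # from safe
    [0.10, 0.70, 0.20, 0.00],  # from elevated
    [0.00, 0.10, 0.60, 0.30],  # from critical
    [0.00, 0.00, 0.00, 1.00],  # from violation
]


def violation_probability(start: str, steps: int = 5) -> float:
    """Probability of reaching 'violation' within `steps` steps from `start`."""
    n = len(STATES)
    # Start with all probability mass on the current state.
    dist = [0.0] * n
    dist[STATES.index(start)] = 1.0
    # Push the distribution forward one step at a time.
    for _ in range(steps):
        dist = [
            sum(dist[i] * TRANSITIONS[i][j] for i in range(n))
            for j in range(n)
        ]
    return dist[STATES.index("violation")]


if __name__ == "__main__":
    for state in STATES:
        print(state, round(violation_probability(state), 4))
```

A session-level monitor built this way could compare the returned probability against a threshold after every tool call, which is how a sequence of individually harmless calls could still raise an alert. The threshold and the way a real tool would estimate the transition matrix are left open here.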
This is an AI-generated summary. ShortSingh links to the original source for the complete article.
