reskSecure Blocks LLM Jailbreaks at Token Level Using Bitmask Policy Engine


A new open-source Python library called reskSecure offers a token-level security firewall for large language models, blocking forbidden outputs before they are ever sampled rather than scanning text after generation. The tool uses a bitmask-based policy engine with YAML-defined rules, applying either hard blocks or configurable bias penalties to token probabilities when a matching pattern is detected. It leverages the Aho-Corasick algorithm to simultaneously search thousands of patterns with minimal latency impact. reskSecure integrates with any HuggingFace model via the logits processor API and supports hot-reloadable policies without requiring a restart. The library is available on PyPI under the package name resksecure and requires Python 3.13 and PyTorch 2.0 or higher.
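The summary does not show reskSecure's actual API, so the following is only a minimal sketch of the general idea: each token carries a bitmask of the rules it matched, and the logits are adjusted before sampling. Every name here (`BLOCK_SECRETS`, `PENALIZE_PROFANITY`, `POLICY`, `apply_policy`) is hypothetical, the per-token flags are assumed to have been computed already by a pattern matcher, and plain Python lists stand in for tensors.

```python
import math

# Hypothetical rule flags, one bit per rule (not reskSecure's real names).
BLOCK_SECRETS = 0b01
PENALIZE_PROFANITY = 0b10

# Hypothetical policy table: None means hard block, a float is a bias penalty.
POLICY = {BLOCK_SECRETS: None, PENALIZE_PROFANITY: -5.0}


def apply_policy(logits, token_flags, active_mask):
    """Return a copy of logits with blocked tokens at -inf and penalized tokens biased."""
    out = list(logits)
    for token_id, flags in enumerate(token_flags):
        hit = flags & active_mask  # keep only the rules that are currently enabled
        if not hit:
            continue
        for flag, penalty in POLICY.items():
            if hit & flag:
                if penalty is None:
                    out[token_id] = -math.inf  # hard block: can never be sampled
                    break
                out[token_id] += penalty  # soft rule: lower the token's score
    return out


def softmax(logits):
    """Convert logits to probabilities; -inf logits get probability 0."""
    peak = max(x for x in logits if x != -math.inf)
    exps = [0.0 if x == -math.inf else math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: token 1 matched a blocked pattern, token 2 matched a penalized one.
logits = [2.0, 1.0, 0.5, 0.0]
token_flags = [0, BLOCK_SECRETS, PENALIZE_PROFANITY, 0]
probs = softmax(apply_policy(logits, token_flags, BLOCK_SECRETS | PENALIZE_PROFANITY))
```

Because the adjustment happens on the logits, a hard-blocked token ends up with probability zero instead of being generated and filtered afterwards. In a real integration the same logic would presumably sit inside a logits processor operating on tensors.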

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


Why US Dev Tools Decline Your Card and How to Fix It

Developers outside the US frequently face card declines when subscribing to tools like Cursor, GitHub Copilot, or Vercel, even when their card has sufficient funds. These rejections are typically risk-scoring decisions by payment processors like Stripe, triggered by issues such as billing address mismatches, country-based BIN blocking, or prepaid card filters. The first step to resolving this is ensuring the billing address on file exactly matches what the issuing bank holds, and enabling international and online transactions in the banking app. Cards designed for global online spending, such as Wise or Revolut, tend to perform better with US-based SaaS platforms due to their favorable BIN profiles. For those holding stablecoins, crypto-funded Visa cards from providers like Gnosis Pay or RedotPay offer an alternative by supplying a standard Visa BIN without foreign exchange markups on USD billing.

Read more at DEV Community

Developer Releases Claude Skill to Automate Agent Prompt Loops in One Command

A developer has published an open-source Claude Code skill called 'loop-engineering' that automates the process of repeatedly prompting an AI coding agent toward a defined goal. Instead of manually guiding the agent step by step, users specify a goal and a verifiable stop condition once, after which the system handles task discovery, execution, and verification autonomously. The skill scaffolds a structured setup inside a repository, including a shared state file and two separate agents: a maker that performs the work and a verifier that checks it independently, so the maker never approves its own output. Built-in safeguards require human confirmation before any irreversible action such as merging, deploying, or deleting. The skill is available on GitHub and can be installed globally across projects or scoped to a single repository.
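The control flow described above can be pictured roughly as follows. This is a sketch under assumptions, not the skill's real implementation: `LoopState`, `run_loop`, and the callback names are hypothetical, and the agents are represented by plain Python callables.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Hypothetical stand-in for the shared state file."""
    goal: str
    done: list = field(default_factory=list)
    log: list = field(default_factory=list)


# Actions that need a human's approval before they run.
IRREVERSIBLE = {"merge", "deploy", "delete"}


def run_loop(state, discover, make, verify, stop, confirm, max_iters=20):
    """Repeat discover -> make -> verify until stop(state) holds or iterations run out."""
    for _ in range(max_iters):
        if stop(state):
            return True
        task = discover(state)
        if task is None:  # nothing left to try
            break
        if task["action"] in IRREVERSIBLE and not confirm(task):
            state.log.append(("skipped", task["name"]))
            continue
        result = make(task)  # the maker does the work...
        if verify(task, result):  # ...and a separate verifier decides whether it counts
            state.done.append(task["name"])
        else:
            state.log.append(("rejected", task["name"]))
    return stop(state)
```

Keeping `make` and `verify` as separate callables mirrors the rule that the maker never approves its own output, and the `confirm` callback is where a human decision would be requested.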

Read more at DEV Community

Developer Builds Open-Source Python Error Monitor Ravn After Sentry Pricing Frustration

A developer created Ravn, a lightweight Python error monitoring tool, after exhausting Sentry's free tier of 5,000 monthly events in just three days on a personal Flask project. The next Sentry pricing tier at $26 per month for 50,000 events felt excessive for a non-revenue side project, prompting the decision to build an alternative. Ravn captures unhandled exceptions, groups similar errors, and includes AI-powered root cause analysis, requiring only two lines of code to set up. The tool is built on FastAPI, PostgreSQL with pgvector, and Redis on the backend, with a React frontend, and the Python SDK is available on PyPI as open-source software. A live demo is accessible at app.getravn.com/demo without any signup or payment details required.

Read more at DEV Community

How to Safely Serve User-Generated HTML Using a Cookieless-Origin Sandbox Pattern

A developer behind ShareMyPage, a platform that hosts LLM-generated HTML pages, has detailed a security architecture for safely rendering arbitrary user-supplied HTML in browsers. The approach combines three layers: serving untrusted content from a separate, cookieless domain to enforce origin isolation, applying an iframe sandbox attribute without allow-same-origin so that scripts run in an opaque (null) origin, and using short-lived signed JWTs as access-control tokens instead of session cookies. Because the content origin never sets or receives session cookies, even a failure in origin isolation leaves no credentials to steal. Access control and damage containment are handled as two distinct problems: signed URLs answer who may view a page, while origin isolation limits what any malicious code can do. The pattern is applicable beyond ShareMyPage to use cases such as email renderers, no-code builders, and AI artifact viewers.
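To make the signed-token layer concrete, here is a minimal sketch of issuing and checking a short-lived HS256-style token with only the Python standard library. It is not ShareMyPage's code: the function names, the `sub` and `exp` claim layout, and the five-minute default lifetime are assumptions for illustration, and a production system would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def sign_token(page_id, secret, ttl_seconds=300, now=None):
    """Issue a token that grants access to one page until it expires."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": page_id, "exp": now + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_signature(secret, signing_input)}"


def verify_token(token, page_id, secret, now=None):
    """Return True only for an untampered, unexpired token issued for page_id."""
    now = int(time.time()) if now is None else now
    try:
        header, payload, sig = token.split(".")
        expected = _signature(secret, f"{header}.{payload}")
        if not hmac.compare_digest(sig, expected):  # constant-time comparison
            return False
        claims = json.loads(_b64url_decode(payload))
    except (ValueError, TypeError):
        return False
    return claims.get("sub") == page_id and now < claims.get("exp", 0)
```

A token like this would travel in the URL rather than in a cookie, so the content origin never has to hold a session. The expiry limits how long a leaked link remains useful.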

Read more at DEV Community