Prompt Cache Placement Can Cut AI Agent Token Costs by Up to 80%


Research highlighted by LangChain and Focused Labs shows that the ordering of content within an AI agent's prompt has major consequences for cost and performance. Prompt caching works by matching stable prefixes, so any volatile element (such as a timestamp, session ID, or request metadata) placed near the top of a prompt can break cache hits entirely. LangChain's Deep Agents evaluation found that provider-aware prompt caching reduces average token costs by 49% to 80% when implemented correctly. The core principle is that stable content like system instructions, tool schemas, and static policies must appear before dynamic content like user input, retrieved snippets, or tool outputs. Development decisions that each look harmless on their own, such as prepending a request ID or reordering a tool registry, can collectively destroy cache efficiency and silently inflate inference costs over time.
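The ordering principle can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the instruction text, the tool schemas, and the function names are hypothetical, and hashing a fixed-length prefix is only a stand-in for how a provider might match cached prefixes, not any provider's actual mechanism.

```python
import hashlib
import json

# Hypothetical stable content: identical on every request.
SYSTEM_INSTRUCTIONS = "You are a support agent. Follow the refund policy."
TOOL_SCHEMAS = [
    {"name": "search_orders", "parameters": {"order_id": "string"}},
    {"name": "issue_refund", "parameters": {"order_id": "string", "amount": "number"}},
]


def stable_prefix() -> str:
    # Sort tools by name and serialize with sorted keys so the prefix is
    # byte-identical no matter how the registry happens to be ordered.
    tools = sorted(TOOL_SCHEMAS, key=lambda tool: tool["name"])
    return SYSTEM_INSTRUCTIONS + "\n" + json.dumps(tools, sort_keys=True)


def build_prompt(user_input: str, request_id: str) -> str:
    # Cache-friendly: volatile values come after the stable prefix.
    return f"{stable_prefix()}\nrequest_id={request_id}\nuser: {user_input}"


def build_prompt_cache_hostile(user_input: str, request_id: str) -> str:
    # Anti-pattern: a volatile value at the top changes the prefix on every call.
    return f"request_id={request_id}\n{stable_prefix()}\nuser: {user_input}"


def prefix_fingerprint(prompt: str, length: int) -> str:
    # Hash the first `length` characters as a stand-in for prefix matching.
    return hashlib.sha256(prompt[:length].encode("utf-8")).hexdigest()


if __name__ == "__main__":
    n = len(stable_prefix())
    good = [build_prompt("Where is my order?", "req-001"),
            build_prompt("Cancel my plan", "req-002")]
    bad = [build_prompt_cache_hostile("Where is my order?", "req-001"),
           build_prompt_cache_hostile("Cancel my plan", "req-002")]
    print("stable-first prefixes match:",
          prefix_fingerprint(good[0], n) == prefix_fingerprint(good[1], n))
    print("volatile-first prefixes match:",
          prefix_fingerprint(bad[0], n) == prefix_fingerprint(bad[1], n))
```

With the stable content first, two requests with different request IDs and user input share the same leading characters. With the request ID first, the prefixes diverge almost immediately, which is the situation the summary describes as breaking cache hits.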

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


How One Dev Fixed Months of Wrong-Language Emails Using Stripe's Currency Field

A SaaS product sent Japanese-language emails to English-speaking overseas customers for months without anyone noticing, because all four Stripe-triggered email types were hardcoded to Japanese. The development team evaluated three approaches to detecting the right language inside Stripe webhooks: storing a language column in the database, querying the Stripe API, or inferring language from the currency field already present in the webhook payload. They chose currency-based inference because it requires no database migration, no extra API calls, and automatically applies the correct language to both new and existing users. A simple helper function maps USD to English and defaults all other currencies to Japanese, with room to expand for EUR or GBP markets later. The team also encountered a subtle PHP mb_language configuration trap during implementation that nearly undermined the fix.
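The currency-to-language helper described above might look roughly like the following. This is a sketch in Python rather than the PHP the original project uses, and the function name, the mapping table, and the language codes are hypothetical. The only behavior taken from the summary is that USD maps to English and every other currency defaults to Japanese.

```python
from typing import Optional

# Hypothetical mapping table. The summary only states USD -> English,
# with Japanese as the default; EUR or GBP entries could be added later.
CURRENCY_TO_LANGUAGE = {"usd": "en"}
DEFAULT_LANGUAGE = "ja"


def language_for_currency(currency: Optional[str]) -> str:
    # A missing or empty currency falls back to the default language.
    if not currency:
        return DEFAULT_LANGUAGE
    # Normalize case and whitespace so "USD" and "usd" behave the same.
    return CURRENCY_TO_LANGUAGE.get(currency.strip().lower(), DEFAULT_LANGUAGE)


if __name__ == "__main__":
    for code in ("usd", "USD", "jpy", None):
        print(code, "->", language_for_currency(code))
```

Keeping the mapping in a single table means supporting another market is a one-line change, and the explicit default keeps existing Japanese customers on their current emails.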

Read more at DEV Community

PDF Toolkit API lets developers merge, split, and watermark PDFs via HTTP calls

A lightweight HTTP API called PDF Toolkit has been introduced to handle common PDF operations without requiring native libraries like Ghostscript or pdftk. The API exposes six endpoints covering merging, splitting, rotating, watermarking, metadata extraction, and image-to-PDF conversion. Developers can integrate it using simple cURL commands or Node.js code by sending POST requests with file attachments. The service is available on RapidAPI with a free tier offering 100 requests per day and no credit card required to start. It is aimed at developers working in serverless or restricted hosting environments who want a single integration instead of managing multiple PDF libraries.

Read more at DEV Community

Mistral and open-source MinerU race to make PDFs readable for AI

French AI company Mistral launched an updated hosted document-reading service on June 25, 2026, claiming state-of-the-art accuracy in converting complex PDFs into clean, structured text. Around the same time, the open-source project MinerU gained significant traction on GitHub by offering a self-hosted, free alternative that processes PDFs and office files into AI-ready formats. Both tools tackle document intelligence, the process of extracting properly ordered, structured text from scanned contracts, multi-column papers, and table-heavy invoices that standard text extraction cannot handle. The quality of this conversion matters because AI systems built on top of poorly parsed documents will produce unreliable outputs, with errors occurring invisibly before any language model is even involved. The two tools represent a broader industry tension between convenient, paid cloud services and free, privacy-preserving tools that organisations run on their own infrastructure.

Read more at DEV Community

The Hidden Risk in Codebases: Behavior With No Documented Proof

As software systems age, a dangerous gap grows between how code actually behaves and what the repository can formally prove about that behavior. Critical logic, such as fraud rules, retry handling, or edge-case workarounds, often exists only in a developer's memory, an old Slack thread, or a long-forgotten pull request comment. Tests help, but they only validate what someone remembered to assert, leaving many real user-facing behaviors entirely unprotected. The rise of AI coding tools has sharpened this risk, as agents can silently simplify or remove undocumented logic while tests continue to pass. The author argues that missing behavioral evidence should be treated as a warning signal, and that code reviews must ask not just whether code looks correct, but what behavior it claims to preserve and where the proof lives.

Read more at DEV Community