Why Standard HTTP Error Handling Fails for LLM APIs


Backend engineers typically handle HTTP errors using generic retry logic with exponential backoff, but this approach breaks down when applied to large language model APIs. LLM providers reuse standard status codes like 429 and 500, yet the underlying causes vary widely, from temporary rate limits and model overload to hard quota exhaustion and billing issues, and each requires a different response. Blindly retrying timed-out LLM requests can duplicate side effects in agent workflows, inflate token costs, and degrade user experience rather than improve reliability. Operations such as tool-calling agents, streaming chats, and structured output generation each carry different retry risks that a one-size-fits-all handler cannot address. Developers are advised to build retry logic that accounts for the specific LLM operation type and the precise error category, not just the HTTP status code.
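
As a rough illustration of that advice, the sketch below decides whether to retry based on both an error category and the operation type. It is a minimal sketch, not the original article's code: the category names, the `error_code` strings, and the `classify` and `decide` helpers are all hypothetical, and real providers report these conditions in their own ways.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    RATE_LIMIT = "rate_limit"        # temporary: retry after a delay
    OVERLOADED = "overloaded"        # temporary: retry with backoff
    QUOTA_EXHAUSTED = "quota"        # hard limit: retrying will not help
    BILLING = "billing"              # needs human action
    TIMEOUT = "timeout"              # outcome unknown
    OTHER = "other"


class Operation(Enum):
    CHAT_STREAM = "chat_stream"
    STRUCTURED_OUTPUT = "structured_output"
    TOOL_CALLING_AGENT = "tool_calling_agent"


@dataclass
class Decision:
    retry: bool
    delay_seconds: float = 0.0
    reason: str = ""


def classify(status: Optional[int], error_code: str = "") -> ErrorCategory:
    """Map a status code plus a provider error code to a category.

    The error_code values are hypothetical placeholders; status=None
    stands for a client-side timeout with no response at all.
    """
    if status is None:
        return ErrorCategory.TIMEOUT
    if status == 429:
        if error_code == "insufficient_quota":
            return ErrorCategory.QUOTA_EXHAUSTED
        if error_code == "billing_inactive":
            return ErrorCategory.BILLING
        return ErrorCategory.RATE_LIMIT
    if status >= 500:
        return ErrorCategory.OVERLOADED
    return ErrorCategory.OTHER


def decide(op: Operation, category: ErrorCategory, attempt: int,
           max_attempts: int = 3) -> Decision:
    """Decide on a retry; attempt is the number of attempts already made."""
    if category in (ErrorCategory.QUOTA_EXHAUSTED, ErrorCategory.BILLING,
                    ErrorCategory.OTHER):
        return Decision(False, reason="not recoverable by retrying")
    if attempt >= max_attempts:
        return Decision(False, reason="attempt limit reached")
    if category is ErrorCategory.TIMEOUT and op is Operation.TOOL_CALLING_AGENT:
        # The request may have run; retrying could repeat tool side effects.
        return Decision(False, reason="timeout with possible side effects")
    # Exponential backoff, capped at 30 seconds.
    return Decision(True, delay_seconds=min(2.0 ** attempt, 30.0),
                    reason="transient error")
```

Under these assumptions, a 429 caused by quota exhaustion is never retried, while a 429 caused by a temporary rate limit is retried with a growing delay.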

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


EU Accessibility Act forces Polish e-commerce to ditch 'div-soup' HTML by June 2025

The European Accessibility Act (EAA), set to apply to most Polish online stores from June 2025, requires compliance with WCAG 2.1 digital accessibility standards. Shops built with non-semantic HTML structures risk being inaccessible to screen readers used by blind or visually impaired customers. Failure to comply could expose businesses to legal penalties and discrimination claims under Polish consumer protection law. Experts advise developers to replace div-heavy code with native semantic HTML5 elements, arguing it is more cost-effective than patching existing code with ARIA attributes. Proper semantic markup not only ensures regulatory compliance but also broadens the potential customer base for e-commerce businesses.

Read more at DEV Community


Google SRE Book Distilled: A Cheat Sheet for Engineers Running Production Systems

A developer on DEV Community has published a cheat sheet summarising the Google Site Reliability Engineering (SRE) Book, a widely respected free resource on operating large-scale systems. The book, influential at companies like Netflix, Spotify, and LinkedIn, argues that operations should be treated as an engineering discipline driven by automation and objective measurement rather than manual work. Core concepts covered include error budgets, service level objectives, toil elimination, and sustainable on-call practices. The cheat sheet presents each chapter's key takeaway in a compact table format designed for quick reference rather than deep reading. It is intended for software engineers, architects, and platform engineers who want a fast overview of the book's foundational principles.
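
To make the error budget idea concrete, here is a small sketch of the usual arithmetic: the budget is the fraction of a time window that a service level objective allows to be unavailable. The function name and the example figures are illustrative assumptions, not taken from the cheat sheet.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the allowed downtime in minutes for an availability SLO.

    slo is a fraction such as 0.999; window_days is the measurement window.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


# A 99.9% SLO over 30 days leaves roughly 43.2 minutes of downtime.
budget = error_budget_minutes(0.999, window_days=30)
```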

Read more at DEV Community


Apple launches official Safari MCP server in Technology Preview 247

Apple released its own Safari MCP server on July 1, 2026, bundled with Safari Technology Preview 247 and built by the WebKit team. The tool allows MCP-compatible AI agents such as Claude and Cursor to connect to a Safari window and perform tasks like DOM inspection, console reading, network request capture, and screenshots. It runs entirely locally via Safari's built-in WebDriver binary, with no data sent to Apple's servers. However, the server operates in an isolated automation session, meaning it has no access to a user's existing logins, cookies, or open tabs. A developer who maintains an open-source alternative called safari-mcp noted the tools serve different purposes: Apple's is designed for clean-room debugging, while theirs enables agents to interact with an already authenticated, real Safari session.

Read more at DEV Community


RAG Systems Need 15 Pre-Embedding Steps, Not Just a PDF Upload

Building a production-ready Retrieval-Augmented Generation (RAG) system involves far more than uploading a document and generating embeddings. A technical walkthrough on DEV Community outlines 15 critical document ingestion steps that engineers must complete before embeddings are created. These steps include file hashing, PDF parsing, text cleaning, chunking, deduplication, versioning, and incremental ingestion, among others. Skipping any step can cause the system to return incorrect answers silently, with no obvious indication of failure. The guide emphasizes hashing file content rather than filenames to reliably detect duplicate or updated documents and avoid unnecessary reprocessing costs.
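
The content-hashing step can be sketched as follows. This is a minimal illustration that assumes SHA-256 and an in-memory dictionary of seen hashes; the guide's actual hash function and storage are not stated in this summary, and the `content_hash` and `needs_ingestion` helpers are hypothetical names.

```python
import hashlib
from typing import Dict


def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_ingestion(path: str, seen: Dict[str, str]) -> bool:
    """Return True if this file's content has not been ingested before.

    seen maps a content hash to the first path it was seen at, so a
    renamed copy is skipped and an edited file is processed again.
    """
    key = content_hash(path)
    if key in seen:
        return False
    seen[key] = path
    return True
```

Because the key is derived from the bytes rather than the filename, a renamed duplicate is skipped while a file whose content has changed is picked up for reprocessing.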

Read more at DEV Community