Developer Builds Go Library for Semantic LLM Caching to Cut Repeated Query Costs


A developer has released an open-source Go library designed to reduce cloud costs from repeated large language model (LLM) queries by combining deterministic hashing with vector similarity search. The tool addresses a common challenge enterprises face when scaling AI proofs-of-concept to production, where identical or near-identical user queries can generate significant API expenses. Key engineering hurdles included designing flexible cache key composition, managing concurrent background processes without memory leaks, and faithfully replaying streamed responses from cache. The library includes configurable options such as system prompt exclusion, async write-back workers, and TTL-based cleanup for both cached states and stream accumulators. Observability tooling via Prometheus metrics and a Grafana dashboard is also included, with the developer noting that the default 0.8 cosine similarity threshold may need tuning depending on real-world traffic patterns.
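The two-tier lookup described above (an exact hash match first, then a vector-similarity fallback) can be sketched in a few lines. This is a minimal illustration, not the library's Go API: the class name `SemanticCache`, the `include_system_prompt` flag, and the bag-of-words `toy_embed` function are all hypothetical stand-ins, and a real deployment would use an embedding model and a vector index. Only the 0.8 cosine threshold comes from the summary.

```python
import hashlib
import math
from collections import Counter


def toy_embed(text):
    # Hypothetical stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8, include_system_prompt=True):
        self.threshold = threshold
        self.include_system_prompt = include_system_prompt
        self.exact = {}    # deterministic hash -> response
        self.entries = []  # (scope, embedding, response) for similarity search

    def _scope(self, system_prompt):
        # When the system prompt is excluded, all prompts share one scope.
        return system_prompt if self.include_system_prompt else ""

    def _key(self, system_prompt, query):
        raw = self._scope(system_prompt) + "\x00" + query
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def put(self, system_prompt, query, response):
        self.exact[self._key(system_prompt, query)] = response
        self.entries.append((self._scope(system_prompt), toy_embed(query), response))

    def get(self, system_prompt, query):
        # Tier 1: exact match on the deterministic hash.
        hit = self.exact.get(self._key(system_prompt, query))
        if hit is not None:
            return hit
        # Tier 2: best cosine match within the same scope, if above threshold.
        scope, vec = self._scope(system_prompt), toy_embed(query)
        best_score, best_response = 0.0, None
        for entry_scope, entry_vec, response in self.entries:
            if entry_scope != scope:
                continue
            score = cosine(vec, entry_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A caller would try `get` before calling the model and `put` the response after a miss. Lowering `threshold` raises the hit rate at the risk of returning an answer to a different question, which is presumably why the developer flags it for tuning.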

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.


Developer Builds Python Terminal Calculator With Loop and Error Handling

A developer shared their experience building a terminal-based calculator in Python around an infinite loop. The project converts input to floating-point numbers and uses an exception handler to prevent crashes from division by zero. A key challenge was Python's strict indentation rules, which caused syntax errors when nested conditionals were not properly aligned. The developer learned that Python requires consistent indentation to define code blocks, with four spaces per level being the usual convention. The project served as a hands-on introduction to core Python concepts including loops, conditionals, and exception handling.
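A loop of this kind might look like the following sketch. The function names `calculate` and `run_calculator`, the `calc> ` prompt, and the `quit` command are illustrative assumptions, not taken from the developer's project; the `read` and `write` parameters exist only so the loop can be exercised without a terminal.

```python
def calculate(left, operator, right):
    # Apply one arithmetic operator to two floats.
    if operator == "+":
        return left + right
    if operator == "-":
        return left - right
    if operator == "*":
        return left * right
    if operator == "/":
        return left / right  # raises ZeroDivisionError when right == 0
    raise ValueError(f"unknown operator: {operator}")


def run_calculator(read=input, write=print):
    # Loop until the user types "q" or "quit".
    while True:
        line = read("calc> ").strip()
        if line.lower() in {"q", "quit"}:
            break
        try:
            left, operator, right = line.split()
            write(calculate(float(left), operator, float(right)))
        except ZeroDivisionError:
            write("Error: division by zero")
        except ValueError as err:
            # Covers malformed input, non-numeric operands, unknown operators.
            write(f"Error: {err}")
```

Calling `run_calculator()` with no arguments starts the interactive loop, expecting input such as `3 * 4` with spaces between the operands and the operator. Every nesting level here uses four spaces, which is the alignment issue the summary describes.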

Read more at DEV Community

Pre-Migration Audit Catches Schema Flaw That Would Have Charged Customers and Lost Data

A readiness audit conducted before a DNS cutover for DiagnosticPro revealed a critical database schema mismatch on the new self-hosted VPS, where twelve columns required by payment and membership workflows were missing from live tables. Had the DNS flip proceeded, every Stripe payment would have been charged successfully but the subsequent webhook handler would have thrown a server error, leaving no purchase record and no diagnostic queued. The failure was described as the worst possible kind — money collected, system broken, no recoverable trace on either side. Rather than applying a one-off manual fix, developers implemented an automated startup migration that compares the live table schema against what the code expects and applies only the missing alterations. The solution ensures any new environment the application boots against will self-correct to the required schema, preventing silent drift in future deployments.
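The startup migration described above, which compares the live schema against what the code expects and applies only the missing alterations, can be sketched as follows. The summary does not say which database DiagnosticPro uses, so SQLite is an assumption here, and the table name, column names, and `ensure_schema` function are hypothetical placeholders.

```python
import sqlite3

# Hypothetical expectations: table -> {column name: SQL type}.
EXPECTED_COLUMNS = {
    "orders": {
        "stripe_payment_id": "TEXT",
        "membership_tier": "TEXT",
    },
}


def ensure_schema(conn, expected=EXPECTED_COLUMNS):
    """Add any expected column that the live table lacks; return what was added."""
    applied = []
    for table, columns in expected.items():
        # PRAGMA table_info yields one row per column; index 1 is the name.
        existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        for name, sql_type in columns.items():
            if name not in existing:
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')
                applied.append(f"{table}.{name}")
    conn.commit()
    return applied
```

Run once at application start, the function is idempotent: a second call finds nothing missing and returns an empty list. This sketch assumes the tables themselves already exist and only adds columns, so it never drops or retypes anything.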

Read more at DEV Community

ChatGPT Plus and Claude Pro block cards based on billing country, not card validity

Users attempting to subscribe to ChatGPT Plus or Claude Pro may face rejections even with valid cards, because OpenAI and Anthropic restrict sales to a list of supported countries. The platforms determine a user's country using the card's issuing bank (BIN), the billing address entered on the payment form, and sometimes IP or account region. Unlike a typical fraud-based card decline, this is a policy restriction that a VPN alone cannot bypass, since the card's BIN still reveals its country of origin. The most effective workaround is to use a card physically issued in a supported country, such as those from Wise, Revolut, or certain crypto-funded Visa cards with BINs registered in eligible regions. Users must also ensure the billing country entered on the payment form matches the card's issuing country, as any mismatch is a common cause of continued failures.

Read more at DEV Community

Figma-to-Code AI Pipeline: Impressive Progress, But Deep Questions Remain

The design-to-code workflow is rapidly consolidating around Figma, with 82.3% of designers using it as their primary UI tool and roughly 60% of high-spending customers already using its AI codegen feature, Figma Make. A new industry standard for design tokens, DTCG, gained stable status in late 2025 and is backed by major players including Adobe, Google, Microsoft, Meta, and Figma itself. However, unguided AI models pointed at real Figma files frequently produce flawed output — hard-coded colours, invented components, and mixed framework conventions — revealing the limits of a naive pipeline. Monday.com's design-to-code implementation demonstrates the approach can work well, though it still requires developer intervention to clean up generated code. Beyond tooling and quality concerns, the author argues the more significant shift is a quieter one: a fundamental change in who performs design-to-code work and who is truly accountable for the decisions AI tools make automatically.

Read more at DEV Community