Developer builds open-source proxy that cut Claude Code token costs by half

A developer discovered that the bulk of Claude Code API costs came not from prompts or responses, but from overhead such as redundant tool schemas and verbose JSON payloads sent with every request. To address this, they built an open-source proxy called Lynkr that strips unnecessary tool definitions, compresses large outputs like grep results, and caches semantically similar queries. The proxy also routes requests by complexity, sending simple questions to free local models and reserving paid cloud APIs only for demanding tasks like architecture reviews or security analysis. In the developer's own sessions, 70–90% of requests were handled locally without hitting a paid backend. The tool is available via npm and works with Claude Code, Cursor, and Codex CLI by overriding the API base URL.
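Two of the ideas described above, dropping unneeded tool schemas and routing by complexity, can be sketched roughly as follows. This is an illustrative sketch only, not Lynkr's actual code: the function names, the keyword list, and the length threshold are all assumptions, and the summary does not say how the proxy really scores complexity.

```python
from typing import Any

# Hypothetical hints for "demanding" tasks; the real proxy's criteria are not described.
COMPLEX_HINTS = ("architecture", "security", "refactor")


def strip_unused_tools(payload: dict[str, Any], needed: set[str]) -> dict[str, Any]:
    """Return a copy of the request payload keeping only tool schemas named in `needed`."""
    trimmed = dict(payload)
    trimmed["tools"] = [t for t in payload.get("tools", []) if t.get("name") in needed]
    return trimmed


def choose_backend(prompt: str, max_local_chars: int = 400) -> str:
    """Send short prompts without complexity hints to a local model, the rest to the cloud."""
    text = prompt.lower()
    if len(text) > max_local_chars or any(hint in text for hint in COMPLEX_HINTS):
        return "cloud"
    return "local"
```

With these assumptions, a short question such as "What does this function return?" would go to the local model, while a prompt asking for a security analysis would be sent to the paid backend.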

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Related stories

Programming · DEV Community

Developer Uses ONNX Runtime and Pyannote 3.0 to Split Two-Speaker Audio Into Segments

A developer has demonstrated how to detect speaker changes in a two-person audio conversation using an ONNX version of the Pyannote Segmentation 3.0 model running on CPU via ONNX Runtime. The experiment uses FFmpeg to decode a roughly 14-second MP3 recording into a 16 kHz mono waveform, which is then processed in 10-second windows to identify where one speaker gives way to another. The pipeline successfully separates six alternating utterances into six individual WAV files while maintaining consistent speaker indexing throughout. Post-processing steps handle silence, brief fluctuations, and potential overlapping speech using probability thresholds and minimum segment duration rules. The author notes this is not a full diarization pipeline, as it relies on the model's internal speaker indexes rather than embedding comparison or clustering across longer recordings.
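The windowing and minimum-duration steps mentioned above might look something like the sketch below. It assumes non-overlapping windows and a 0.3-second minimum segment length; neither detail is stated in the summary, and the function names are hypothetical.

```python
SAMPLE_RATE = 16_000   # 16 kHz mono, as described in the summary
WINDOW_SECONDS = 10    # window length described in the summary


def window_bounds(num_samples: int,
                  sample_rate: int = SAMPLE_RATE,
                  window_seconds: int = WINDOW_SECONDS) -> list[tuple[int, int]]:
    """Split a waveform of `num_samples` into (start, end) sample ranges, one per window."""
    size = sample_rate * window_seconds
    return [(start, min(start + size, num_samples))
            for start in range(0, num_samples, size)]


def drop_short_segments(segments: list[tuple[float, float, int]],
                        min_duration: float = 0.3) -> list[tuple[float, float, int]]:
    """Discard (start_sec, end_sec, speaker) segments shorter than `min_duration` seconds."""
    return [(s, e, spk) for (s, e, spk) in segments if e - s >= min_duration]
```

For a 14-second recording (224,000 samples), `window_bounds` yields one full window and one shorter trailing window.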

Programming · DEV Community

Developer Uses Custom BMAD Skill to Automate Multi-Story Implementation Loop

A developer working with the BMAD AI-assisted development framework identified a key pain point: the repetitive, manual approval steps required during the implementation phase. While exploring alternatives like the Superpowers framework, the developer found BMAD superior for large, complex projects due to its structured planning stages covering brainstorming, PRD, UX, and architecture. To address the automation gap, the developer devised a custom 'skill' that instructs a main agent to spawn independent sub-agents for each development story, running them in a loop until all stories are complete. This approach avoids context bloat by keeping each story implementation in a separate session, and is designed to pause only when a genuine blocker is encountered. The solution eliminates the need for a Python orchestration script and leverages BMAD's own agent capabilities to streamline the end-to-end development workflow.

Programming · DEV Community

Why AI Search Engines Often Ignore Well-Ranked Single-Page Web Apps

Single-page applications built with frameworks like React or Vue may be effectively invisible to AI crawlers, which typically do not execute JavaScript and read only the raw HTML returned by the server. A 2025 study found that around 68 percent of pages cited in AI Overviews did not appear in the top ten organic Google search results, which suggests that ranking well and being cited by AI are now distinct challenges. A web app can therefore hold a top Google ranking yet still be absent from AI-generated answers, or worse, be misrepresented by AI models pulling incomplete data. Traditional SEO guidance, largely aimed at content teams managing blogs, does not address the technical needs of developers building modern web apps for AI discoverability. Developers are being advised to provide cleaner, machine-readable content structures so that AI systems can accurately represent their products in search responses.
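One rough way to see what a crawler that does not run JavaScript would get from a page is to extract the text from the raw HTML alone. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and real crawlers may behave differently.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes from raw HTML, ignoring script and style contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html: str) -> str:
    """Return the text a non-JavaScript reader could see in `html`."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

An empty result for a page's initial HTML response (for example, a bare `<div id="root"></div>` plus a script tag) is a hint that its content only appears after client-side rendering.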

Programming · DEV Community

Why 'Loop Engineering' in AI Works Now but Struggled in 2022–23

Loop engineering — where an AI agent repeatedly generates, tests, and refines its output until a problem is solved — is not a new concept, but earlier language models lacked the consistency and context capacity to make it practical. Older models frequently misunderstood feedback, repeated mistakes, and lost track of prior attempts as limited context windows forced older messages to be dropped. Advances in three key areas have changed this: larger context windows let models retain full iteration histories, improved consistency means models now converge on correct solutions rather than producing varied half-baked answers, and better tool integration allows models to actually run code and read real error outputs. Falling inference costs have also made running multiple back-to-back iterations economically viable, whereas in 2022–23 the compute expense discouraged casual looping. Tools like Claude Code automate this generate–verify–retry cycle, but the core mechanic remains a simple feedback loop that any user can replicate manually with any capable model.
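The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not how Claude Code is implemented: `generate` and `verify` are hypothetical callables standing in for a model call and a test run, and the attempt limit is an arbitrary assumption.

```python
from typing import Callable, Optional

Attempt = tuple[str, Optional[str]]  # (candidate, error message or None)


def refine_until_passing(
    generate: Callable[[Optional[str], list[Attempt]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> tuple[Optional[str], list[Attempt]]:
    """Generate a candidate, verify it, and retry with the error as feedback.

    Returns the first candidate that verifies (or None) plus the full history,
    so every later attempt can see all earlier attempts and their errors.
    """
    history: list[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(feedback, history)
        error = verify(candidate)  # None means the candidate passed
        history.append((candidate, error))
        if error is None:
            return candidate, history
        feedback = error
    return None, history
```

Passing the whole history back to `generate` reflects the point about larger context windows: earlier attempts stay available instead of being dropped.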