Using OpenAI's tiktoken for Claude Token Counts Can Skew Cost Estimates by 20%

·1 views

Developers using OpenAI's tiktoken tokenizer to estimate token counts for Anthropic's Claude models risk cost and context budget errors of 15–20%, and more on code or non-English text. This happens because Claude uses its own tokenizer, which splits text differently than tiktoken, causing systematic undercounting. Anthropic provides a dedicated countTokens API endpoint in its SDK that returns accurate, model-specific token counts before inference is run. Token counts also vary across Claude model versions, meaning cached counts from older models should not be reused when switching versions. The recommended fix is to always call countTokens against the specific Claude model being used, and never apply a blanket multiplier to convert counts between models.

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Discussion (0)

How Idempotency Keys Stop Social Media Automation From Double-Posting

Scheduled social media posts can accidentally publish twice when a network timeout causes a retry before the original request completes, a problem that can embarrass brands and damage agency credibility. This failure stems from an ambiguous timeout state in distributed systems, where the client cannot tell whether a request succeeded, failed, or never arrived. Idempotency keys solve this by assigning a unique identifier to each request, allowing servers to recognize retries and return the original result without re-executing the action. When APIs like X's posting endpoints do not natively support idempotency keys, developers must implement client-side deduplication by durably recording the intent to post before any request is sent. HelperX, which manages scheduled posts across hundreds of accounts, uses this approach to ensure every social action takes effect exactly once regardless of network conditions.

0 comments Read more at DEV Community

ProgrammingDEV Community ·

Kafka Partitioning Strategies Engineers Must Plan Early to Avoid Costly Failures

Kafka partitions define the unit of parallelism and ordering in event streaming systems, yet many engineers overlook partitioning decisions until production problems emerge. The partition count sets a hard ceiling on consumer parallelism, meaning idle consumers and processing bottlenecks can result from under-provisioned topics. Key-based partitioning, which uses a hashed key to route events to a consistent partition, is the recommended default when event ordering for a specific entity matters. However, poorly chosen keys with low cardinality — such as country code or status — can create hot partitions where one consumer is overwhelmed while others sit idle. Keyless round-robin partitioning offers even distribution and higher throughput but sacrifices ordering guarantees, making it suitable only for workloads like logs or metrics where sequence is irrelevant.

0 comments Read more at DEV Community

ProgrammingDEV Community ·

Developer Builds AI Exam Prep App in 8 Months After Frustration with Rote Study Methods

A computer science student built ExamIntelligence, an AI-powered exam preparation app, after growing frustrated with exams that reward pattern recognition over genuine learning. The project began as a rushed, vibe-coded prototype using the Gemini API and Streamlit just days before his preliminary exams. After prelims, he discovered the AI-generated codebase was riddled with errors, prompting a full rebuild from scratch in Neovim with a more disciplined, architecture-first approach. He developed a hybrid AI pipeline, benchmarking multiple PDF parsers and testing local language models before settling on a multimodal solution to handle complex documents. The app, now live at examintelligence.app, aims to parse past papers and mark schemes to help students study more efficiently and free up time for deeper, curiosity-driven learning.

0 comments Read more at DEV Community

ProgrammingDEV Community ·

Developer Builds Self-Healing OS Kernel That Uses a Local LLM to Recompile Itself

A developer has completed a 12-part project to build V.E.L.O.C.I.T.Y.-OS, a bare-metal operating system designed to run entirely within a CPU's L3 cache. The final phase introduces a self-healing loop in which a Ring 0 telemetry system monitors JIT execution speeds using the CPU's Time Stamp Counter. When performance degrades beyond a set threshold, the kernel feeds the affected module's abstract syntax tree and performance logs to a locally running Qwen-Coder-0.5B language model. The model then generates optimized code candidates, sandboxes them for safety, and hot-swaps them into memory without restarting the system. The project also includes a Biosphere P2P registry and a Boot-to-NDA LLM Terminal handover, completing the autonomous self-optimization pipeline.

0 comments Read more at DEV Community

Using OpenAI's tiktoken for Claude Token Counts Can Skew Cost Estimates by 20%

Discussion (0)

Related stories

How Idempotency Keys Stop Social Media Automation From Double-Posting

Kafka Partitioning Strategies Engineers Must Plan Early to Avoid Costly Failures

Developer Builds AI Exam Prep App in 8 Months After Frustration with Rote Study Methods

Developer Builds Self-Healing OS Kernel That Uses a Local LLM to Recompile Itself