JSON-Schema token masks can silently block LLM tool calls, study finds
Researchers have found that the grammar-based token masks used to enforce JSON-Schema output constraints in large language models can inadvertently prevent the model from emitting the tokens it needs to call a tool during decoding. The problem arises because a schema compiles into a mask that can leave function-call token sequences unreachable, so tool invocation fails silently even when the rest of the output is valid. The proposed two-pass inference method addresses this by running a second decoding pass without the mask, which the study reports raised the Tool Invocation Rate from 0% to 100%. According to the study, the fix preserves schema compliance while restoring tool-call functionality, though the extra pass adds latency that may affect real-time applications. The experiments covered open-weight model families only and do not yet extend to closed-source models or complex multi-tool workflows, pointing to the need for more targeted mask designs.
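The failure mode and the two-pass idea can be illustrated with a toy decoder. This is only a sketch under stated assumptions, not the study's implementation: the vocabulary, the `toy_scores` stand-in for a model, the `SCHEMA_NEXT` grammar table, the `get_weather` tool name, and the way `two_pass` combines the two results are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

TOOL_CALL = "<tool_call>"  # hypothetical special token that opens a tool call
EOS = "<eos>"

# Hypothetical stand-in for a model: fixed next-token scores keyed by the last token.
_SCORES: Dict[Optional[str], Dict[str, float]] = {
    None: {TOOL_CALL: 2.0, "{": 1.0, EOS: 0.1},  # the model prefers a tool call
    TOOL_CALL: {"get_weather": 2.0, EOS: 0.1},
    "get_weather": {EOS: 2.0},
    "{": {'"answer"': 2.0},
    '"answer"': {":": 2.0},
    ":": {'"ok"': 2.0},
    '"ok"': {"}": 2.0},
    "}": {EOS: 2.0},
}

def toy_scores(prefix: List[str]) -> Dict[str, float]:
    return _SCORES[prefix[-1] if prefix else None]

# Toy grammar compiled from a schema like {"answer": "ok"}: the allowed next
# tokens after each token. TOOL_CALL appears nowhere, so it is unreachable.
SCHEMA_NEXT: Dict[Optional[str], Set[str]] = {
    None: {"{"},
    "{": {'"answer"'},
    '"answer"': {":"},
    ":": {'"ok"'},
    '"ok"': {"}"},
    "}": {EOS},
}

def schema_mask(prefix: List[str]) -> Set[str]:
    return SCHEMA_NEXT.get(prefix[-1] if prefix else None, set())

Mask = Callable[[List[str]], Set[str]]

def greedy_decode(mask: Optional[Mask] = None, max_steps: int = 16) -> List[str]:
    """Greedy decoding; when a mask is given, disallowed tokens are dropped."""
    out: List[str] = []
    for _ in range(max_steps):
        scores = toy_scores(out)
        if mask is not None:
            allowed = mask(out)
            scores = {t: s for t, s in scores.items() if t in allowed}
        if not scores:
            break
        token = max(scores, key=scores.get)
        if token == EOS:
            break
        out.append(token)
    return out

def two_pass() -> Dict[str, Optional[List[str]]]:
    """Assumed combination: pass 1 is masked, pass 2 is unmasked."""
    constrained = greedy_decode(mask=schema_mask)
    unconstrained = greedy_decode(mask=None)
    tool_call = unconstrained if TOOL_CALL in unconstrained else None
    return {"json": constrained, "tool_call": tool_call}
```

In this toy setup the masked pass produces schema-valid JSON but never emits `<tool_call>`, while the unmasked pass recovers the tool call. Running both passes also roughly doubles the decoding work, which is consistent with the latency cost noted above.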
This is an AI-generated summary. ShortSingh links to the original source for the complete article.