CLI table library splits rare kanji in half due to Unicode surrogate pair bug
A bug in the widely used cli-table3 JavaScript library caused a rare Japanese surname character, 𠮷 (U+20BB7), to be visually broken when displayed in terminal tables. The flaw resided in a one-line optimization that assumed a string was safe to cut by raw index whenever its UTF-16 code-unit length equalled its display width. The character 𠮷 lies outside the Basic Multilingual Plane, so UTF-16 encodes it as a surrogate pair with a code-unit length of 2. It is also an East Asian wide character, so it occupies 2 terminal columns. These two unrelated properties produce the same number, which tricked the shortcut into firing. When the fast path sliced the string by index, it severed the surrogate pair, leaving a lone high surrogate that terminals typically render as a replacement box. Most common CJK characters avoid this branch entirely because their code-unit length (1) and display width (2) differ, so the bug surfaces only on the narrow class of supplementary-plane wide characters.
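The coincidence described above can be reproduced in a few lines of plain JavaScript. The following is a minimal sketch, not the library's actual code: `displayWidth`, `truncateBuggy`, and `truncateByCodePoint` are hypothetical names, and the width table is a deliberately simplified assumption that covers only a few East Asian wide ranges.

```javascript
// Hypothetical, simplified width function (an assumption for illustration,
// not cli-table3's implementation): a few wide ranges count as 2 columns,
// everything else as 1.
function displayWidth(str) {
  let width = 0;
  for (const ch of str) {               // for...of iterates by code point
    const cp = ch.codePointAt(0);
    const wide =
      (cp >= 0x1100 && cp <= 0x115f) || // Hangul Jamo
      (cp >= 0x2e80 && cp <= 0xa4cf) || // CJK radicals through Yi
      (cp >= 0xac00 && cp <= 0xd7a3) || // Hangul syllables
      (cp >= 0xf900 && cp <= 0xfaff) || // CJK compatibility ideographs
      (cp >= 0xff00 && cp <= 0xff60) || // Fullwidth forms
      (cp >= 0x20000 && cp <= 0x3fffd); // Supplementary ideographic planes
    width += wide ? 2 : 1;
  }
  return width;
}

// Safe path: walk code points and stop before exceeding maxWidth columns.
function truncateByCodePoint(str, maxWidth) {
  let out = '';
  let width = 0;
  for (const ch of str) {
    const w = displayWidth(ch);
    if (width + w > maxWidth) break;
    out += ch;
    width += w;
  }
  return out;
}

// Hypothetical reconstruction of the flawed shortcut: if the code-unit
// length equals the display width, assume every code unit is one column
// and cut by raw index.
function truncateBuggy(str, maxWidth) {
  if (str.length === displayWidth(str)) {
    return str.slice(0, maxWidth);      // may split a surrogate pair
  }
  return truncateByCodePoint(str, maxWidth);
}

const rare = '\u{20BB7}';   // 𠮷: length 2 (surrogate pair), width 2 (wide)
const common = '\u5409';    // 吉: length 1, width 2, so the shortcut is skipped

const broken = truncateBuggy('a' + rare, 2); // 'a' plus a lone high surrogate
const safe = truncateByCodePoint('a' + rare, 2); // 'a' only
```

In this sketch the string 'a𠮷' has three code units and three columns, so the shortcut fires and `slice(0, 2)` keeps the high surrogate U+D842 without its partner. The code-point walk drops the whole character instead, since a 2-column glyph cannot fit in the one remaining column.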
This is an AI-generated summary. ShortSingh links to the original source for the complete article.
