Benchmark Compares Four Local Voice Cloning Models Across Five Languages
A developer has published an engineering benchmark of four local voice-cloning models (OmniVoice int8, Chatterbox Multilingual fp16, VoxCPM2 bf16, and Fish Audio S2 Pro fp16) across English, German, Arabic, Spanish, and Mandarin Chinese. The benchmark used reference audio from Google FLEURS and measured speaker similarity, word and character error rates, audio length, and real-time factor. OmniVoice emerged as the strongest overall performer, while VoxCPM2 excelled specifically at Arabic speaker matching. Fish Audio S2 Pro showed high similarity scores for German and Arabic but lagged in processing speed, and Chatterbox Multilingual performed competitively on Arabic and Spanish. The study is an engineering comparison of model behavior within a single local speech stack, not a human perceptual evaluation.
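For readers unfamiliar with the metrics named above, the following is a minimal sketch of how word error rate, character error rate, and real-time factor are commonly computed. It is not the benchmark author's code. The function names are hypothetical, text normalization (casing, punctuation) is left out, and the real-time factor is assumed to follow the common convention of synthesis time divided by audio duration; the summary does not say which convention the benchmark used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance (spaces ignored) divided by reference length.

    Often preferred for languages such as Mandarin Chinese, where
    whitespace does not delimit words.
    """
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def real_time_factor(synthesis_seconds, audio_seconds):
    """Assumed convention: values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds
```

In a setup like this, the hypothesis text would typically come from running a speech recognizer over the cloned audio and comparing its transcript with the prompt text, so the error rates reflect both the cloning model and the recognizer.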
This is an AI-generated summary. ShortSingh links to the original source for the complete article.
