OpenAI Launches Three New Voice API Models Including GPT-5-Class Realtime-2
OpenAI added three new voice models to its API on May 8, 2026: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 is the standout addition, bringing GPT-5-class reasoning to live voice conversations along with a 128K context window, parallel multi-tool calling, and natural interruption handling. GPT-Realtime-Translate translates live speech from more than 70 input languages into 13 output languages, while GPT-Realtime-Whisper streams speech-to-text transcription as the speaker talks rather than waiting for silence detection. Together, the three models target the most common voice pipeline architectures: conversational AI, multilingual communication, and hybrid voice-plus-text workflows. The upgraded reasoning in GPT-Realtime-2 is designed to address longstanding limitations of earlier Realtime API models, which struggled with complex multi-step logic and with maintaining task context across extended sessions.
This is an AI-generated summary. ShortSingh links to the original source for the complete article.