How to transcribe audio and auto-generate podcast chapters using Whisper and GPT cheaply
A full-stack developer shared a cost-efficient method for automatically generating timestamped podcast chapters using OpenAI's Whisper and GPT models. The approach involves three steps: transcribing audio with segment-level timestamps via Whisper's verbose_json format, condensing the transcript before sending it to GPT, and caching the transcription to avoid redundant API calls. A key insight is to trim each segment to its first 120 characters before passing it to GPT-4o-mini, which drastically reduces token usage without sacrificing chapter quality. The developer notes that Whisper handles timing accurately while GPT focuses solely on generating readable titles, keeping each tool within its strength. According to the author, high AI costs are usually the result of poor orchestration and excessive context, not the choice of model itself.
This is an AI-generated summary. ShortSingh links to the original source for the complete article.
Discussion (0)
Log in to join the discussion and vote.
Log in