Developer builds fully on-device video transcription and summary tool for macOS
A macOS developer has detailed the technical pipeline behind Video Notes, a feature of their video player app Reel. The feature generates timestamped transcripts, optional translations, and structured summaries from local video files, with no internet connection or API keys.
The pipeline has four stages. First, libmpv extracts the audio track from a wide range of container formats, including MKV and WebM, which Apple's AVFoundation cannot open natively. It converts that audio to a 16 kHz mono WAV file. Next, the new SpeechAnalyzer API in macOS 26 turns the audio into timestamped transcript segments. An optional translation step then converts the transcript between English and Japanese. Finally, Apple's Foundation Models framework generates a structured summary. This is the only stage that requires Apple Intelligence to be enabled on the device.
The developer shared production-level code and workarounds. They noted that Apple's 2025 APIs are impressive in demos, but shipping them reliably meant handling edge cases such as model downloads, empty audio tracks, and concurrent result processing.
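The summary does not reproduce the developer's code. The Swift sketch below is only a rough illustration of what the "16 kHz mono WAV" hand-off between the extraction and transcription stages looks like on disk. It writes a standard 44-byte RIFF/WAVE header around PCM samples and adds a guard for the empty-audio-track edge case. Several details are assumptions for illustration: the 16-bit sample depth, the helper names `makeWAV` and `hasUsableAudio`, and the `minPeak` threshold. The source states only the sample rate and channel count. In Reel, libmpv performs this conversion, not hand-written code like this.

```swift
import Foundation

// Illustrative sketch only: hand-rolls the format that Reel reportedly gets from libmpv.
// Assumption: 16-bit signed PCM (the source states only "16 kHz mono WAV").

/// Returns the little-endian bytes of a fixed-width integer, as the WAV format requires.
func littleEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    withUnsafeBytes(of: value.littleEndian) { Array($0) }
}

/// Hypothetical helper: wraps mono 16-bit PCM samples in a canonical 44-byte RIFF/WAVE header.
func makeWAV(samples: [Int16], sampleRate: UInt32 = 16_000) -> Data {
    let channels: UInt16 = 1
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * (bitsPerSample / 8)      // bytes per sample frame (2)
    let byteRate = sampleRate * UInt32(blockAlign)        // bytes per second (32,000 at 16 kHz)
    let dataSize = UInt32(samples.count) * UInt32(blockAlign)

    var wav = Data()
    wav.append(contentsOf: "RIFF".utf8)
    wav.append(contentsOf: littleEndianBytes(36 + dataSize))  // size of everything after this field
    wav.append(contentsOf: "WAVE".utf8)

    // "fmt " chunk: describes the PCM layout.
    wav.append(contentsOf: "fmt ".utf8)
    wav.append(contentsOf: littleEndianBytes(UInt32(16)))     // fmt chunk size for plain PCM
    wav.append(contentsOf: littleEndianBytes(UInt16(1)))      // audio format 1 = uncompressed PCM
    wav.append(contentsOf: littleEndianBytes(channels))
    wav.append(contentsOf: littleEndianBytes(sampleRate))
    wav.append(contentsOf: littleEndianBytes(byteRate))
    wav.append(contentsOf: littleEndianBytes(blockAlign))
    wav.append(contentsOf: littleEndianBytes(bitsPerSample))

    // "data" chunk: the raw samples, each stored little-endian.
    wav.append(contentsOf: "data".utf8)
    wav.append(contentsOf: littleEndianBytes(dataSize))
    for sample in samples {
        wav.append(contentsOf: littleEndianBytes(sample))
    }
    return wav
}

/// Hypothetical guard for the "empty audio track" edge case: reject input with no samples,
/// or input whose loudest sample stays below `minPeak` (an arbitrary illustrative threshold).
func hasUsableAudio(_ samples: [Int16], minPeak: UInt16 = 64) -> Bool {
    guard let peak = samples.lazy.map({ $0.magnitude }).max() else { return false }
    return peak >= minPeak
}

// Usage: one second of digital silence at 16 kHz.
let silence = [Int16](repeating: 0, count: 16_000)
print(makeWAV(samples: silence).count)  // 32044 (44-byte header + 32,000 data bytes)
print(hasUsableAudio(silence))          // false, so a caller could skip transcription
```

Under the 16-bit assumption, 16 kHz mono audio costs 32,000 bytes per second. An hour-long video therefore produces roughly 115 MB of WAV data (16,000 × 2 × 3,600 = 115,200,000 bytes), plus the 44-byte header. Checking for usable audio before starting transcription is one plausible way to handle the empty-track case the developer mentions. Their actual workaround is not described in this summary.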
This is an AI-generated summary. ShortSingh links to the original source for the complete article.