ShortSingh
Developer builds fully on-device video transcription and summary tool for macOS

A macOS developer has detailed the technical pipeline behind Video Notes, a feature of their video player app Reel that generates timestamped transcripts, optional translations, and structured summaries from local video files without an internet connection or API keys. The four-stage pipeline first uses libmpv to extract audio from a wide range of video formats, including MKV and WebM, which Apple's AVFoundation cannot handle natively, and converts it to 16 kHz mono WAV files. The SpeechAnalyzer framework, new in macOS 26, then turns the audio into timestamped transcript segments, and an optional translation layer supports English-Japanese translation. A final stage uses Apple's Foundation Models to generate a structured summary; this is the only step that requires Apple Intelligence to be enabled on the device. The developer shared production-level code and workarounds, noting that while Apple's 2025 APIs are impressive in demos, shipping them reliably required handling edge cases such as model downloads, empty audio tracks, and concurrent result processing.
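The summary mentions two details that can be checked mechanically before transcription: the 16 kHz mono WAV target and the empty-audio-track edge case. The sketch below is not the developer's code (the app would be written against Apple's frameworks); it is a minimal Python illustration using only the standard `wave` module. The function names `check_wav` and `make_wav` are hypothetical, and treating "empty" as "zero audio frames" is an assumption.

```python
import io
import wave

TARGET_RATE_HZ = 16_000  # sample rate named in the summary
TARGET_CHANNELS = 1      # mono


def check_wav(data: bytes) -> str:
    """Classify WAV bytes as 'ok', 'wrong_format', or 'empty'.

    Assumption: an "empty audio track" is a WAV with zero frames.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if (wav.getframerate() != TARGET_RATE_HZ
                or wav.getnchannels() != TARGET_CHANNELS):
            return "wrong_format"
        if wav.getnframes() == 0:
            return "empty"
    return "ok"


def make_wav(frames: int, rate: int = TARGET_RATE_HZ,
             channels: int = TARGET_CHANNELS) -> bytes:
    """Build a silent 16-bit PCM WAV in memory (helper for trying the check)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * frames)
    return buf.getvalue()
```

A caller could skip the transcription stage when `check_wav` returns `"empty"` rather than passing a zero-length file to the speech recognizer.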

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Related stories

Programming · Hacker News

AI Data Centers Consuming Far More Water Than Tech Companies Disclose

AI data centers are using significantly more water than major technology companies publicly report, according to a Wall Street Journal investigation. The facilities require large volumes of water primarily for cooling the powerful hardware that runs artificial intelligence workloads. This hidden consumption raises concerns about environmental transparency and resource sustainability in the fast-growing AI industry. The gap between actual water usage and disclosed figures suggests that current corporate reporting practices may be inadequate for assessing the true environmental footprint of AI infrastructure.

Programming · DEV Community

Best UI Kits for Chrome Extensions in 2026: Why Web App Rankings Don't Apply

A developer at ExtensionBooster tested and ranked the top UI kits specifically for Chrome extensions, finding that popular choices like MUI, Chakra, and Mantine are optimised for standard web apps and often fail in extension environments. Chrome's Manifest V3 enforces strict Content Security Policies that block runtime CSS-in-JS libraries such as Emotion and styled-components from rendering correctly. Content scripts injected into third-party pages also face CSS bleed issues, making Shadow DOM compatibility a critical factor when choosing a UI kit. Bundle size is another key concern, as large component libraries can slow down a popup's first paint for end users. The author rebuilt the same extension popup, options page, and content-script overlay across multiple kits to produce a ranking tailored to these four extension-specific constraints.

Programming · DEV Community

Cart Timer Killing Live Payments? The Fix Is Simpler Than You Think

A recurring bug across e-commerce platforms causes customers to lose their orders when a cart reservation timer expires mid-payment, even though their bank transaction succeeds. The timer's original purpose is to manage inventory contention between undecided customers, but that logic becomes irrelevant once a buyer enters the payment flow. Engineers who treat a zero timer as an unconditional release trigger are following the system's rules correctly yet ignoring the real-world outcome for the paying customer. Developers argue that a successful payment authorization should always override an expired hold, since rejecting an already-approved charge forces a refund cycle that damages customer trust without protecting any other buyer. The one true exception is a genuine oversell, where another customer completed the purchase first; it should be handled as an inventory failure with an instant refund, not framed as the customer's error.
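The rule described above can be sketched as a small decision function. This is an illustrative reading of the summary, not code from the original article; the names `Payment`, `Outcome`, and `resolve` are hypothetical, and the choice to keep the hold while a payment is still in progress is an assumption drawn from the statement that the timer stops mattering once the buyer enters the payment flow.

```python
from enum import Enum


class Payment(Enum):
    NONE = "none"                # buyer has not started paying
    IN_PROGRESS = "in_progress"  # buyer is inside the payment flow
    AUTHORIZED = "authorized"    # bank approved the charge


class Outcome(Enum):
    KEEP_HOLD = "keep_hold"
    RELEASE_HOLD = "release_hold"
    CONFIRM_ORDER = "confirm_order"
    REFUND_OVERSELL = "refund_oversell"


def resolve(hold_expired: bool, payment: Payment, stock_available: bool) -> Outcome:
    """Decide what happens to a cart, letting payment state outrank the timer."""
    if payment is Payment.AUTHORIZED:
        # An approved charge overrides an expired hold; only a genuine
        # oversell (no stock left) leads to an immediate refund.
        return Outcome.CONFIRM_ORDER if stock_available else Outcome.REFUND_OVERSELL
    if payment is Payment.IN_PROGRESS:
        # Assumption: the timer is ignored while the buyer is paying.
        return Outcome.KEEP_HOLD
    # No payment activity: the timer does its original job.
    return Outcome.RELEASE_HOLD if hold_expired else Outcome.KEEP_HOLD
```

For example, `resolve(True, Payment.AUTHORIZED, True)` returns `Outcome.CONFIRM_ORDER`, the case the summary says is commonly mishandled.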

Programming · DEV Community

Docker Healthchecks Confirm Process Response, Not Application Health

Docker's HEALTHCHECK instruction periodically runs a command inside a container and marks it healthy or unhealthy based solely on the exit code returned. The mechanism does not read response bodies, parse JSON, or verify whether dependencies like databases or queues are functioning correctly. Most common implementations simply confirm that a server process is accepting TCP connections, which can mask deeper failures such as exhausted connection pools or expired API tokens. When teams equate a green 'healthy' status with full application health, they tend to overlook logs, metrics, and other diagnostic signals. The real risk lies not in what the healthcheck measures, but in the broader assumptions developers and operators build around its limited output.
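Because Docker looks only at the exit code, any inspection of response bodies or dependencies has to happen inside the command itself. The sketch below shows one way a check script might map an HTTP response to an exit code. The JSON shape (`{"dependencies": {"database": "ok", ...}}`) and the function name `exit_code_for` are hypothetical, not part of Docker or of the original article.

```python
import json

HEALTHY, UNHEALTHY = 0, 1  # exit codes Docker treats as healthy / unhealthy


def exit_code_for(status: int, body: str) -> int:
    """Map an HTTP status and body to a healthcheck exit code.

    Assumes a hypothetical health endpoint that reports each dependency
    as "ok" or something else under a "dependencies" key.
    """
    if status != 200:
        return UNHEALTHY
    try:
        report = json.loads(body)
    except json.JSONDecodeError:
        return UNHEALTHY
    if not isinstance(report, dict):
        return UNHEALTHY
    deps = report.get("dependencies")
    if not isinstance(deps, dict) or not deps:
        return UNHEALTHY
    # Healthy only if every reported dependency says "ok".
    return HEALTHY if all(v == "ok" for v in deps.values()) else UNHEALTHY
```

A script run from `HEALTHCHECK CMD` could fetch the endpoint and call `sys.exit(exit_code_for(status, body))`. This narrows the gap the summary describes but does not remove it: the result is still only as trustworthy as what the endpoint chooses to report.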