Silero VAD and ONNX Runtime Detect 12 Speech Segments in 14-Second Audio Clip

A developer used the Silero VAD ONNX model with ONNX Runtime's CPU execution provider to detect speech in a 14.171-second MP3 of a two-speaker conversation. FFmpeg decoded the audio into a 16 kHz mono waveform, which was then fed to the model in 32-millisecond chunks (512 samples) to produce a speech probability for each chunk. With a threshold of 0.5 to open a segment and 0.35 to close it, the system identified 12 distinct speech segments, discarding any shorter than 250 milliseconds. Detection took 0.028 seconds on a Mac Studio, a real-time factor of about 0.002 (0.028 s / 14.171 s). Each segment was saved as a separate 16-bit PCM WAV file, and the full reproducible code is available in the kiarina/labs GitHub repository.
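The two thresholds describe a hysteresis rule. Below is a minimal sketch of that segmentation step, assuming the per-chunk probabilities have already been computed (in the article they come from running the Silero ONNX model through ONNX Runtime; that inference step is omitted, and the probabilities in the example are made up). The names `segments_from_probs`, `Segment`, and `real_time_factor` are hypothetical, not taken from the kiarina/labs code or the silero-vad package, whose own helpers also handle details such as minimum silence duration and speech padding.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000                      # Hz, mono (as in the article)
CHUNK_SAMPLES = 512                       # 512 / 16 000 Hz = 32 ms per chunk
CHUNK_SEC = CHUNK_SAMPLES / SAMPLE_RATE   # 0.032 s


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def segments_from_probs(probs, on_threshold=0.5, off_threshold=0.35,
                        min_speech_sec=0.25):
    """Group per-chunk speech probabilities into segments with hysteresis.

    A segment opens at the first chunk whose probability is >= on_threshold
    and closes at the first later chunk whose probability is < off_threshold.
    Probabilities between the two thresholds keep the current state, which
    stops brief dips from splitting one utterance into several segments.
    Segments shorter than min_speech_sec are discarded.
    """
    segments = []
    start_idx = None  # index of the chunk that opened the current segment
    for i, p in enumerate(probs):
        if start_idx is None:
            if p >= on_threshold:
                start_idx = i
        elif p < off_threshold:
            segments.append(Segment(start_idx * CHUNK_SEC, i * CHUNK_SEC))
            start_idx = None
    if start_idx is not None:  # audio ended while a segment was still open
        segments.append(Segment(start_idx * CHUNK_SEC, len(probs) * CHUNK_SEC))
    return [s for s in segments if s.duration >= min_speech_sec]


def real_time_factor(processing_sec: float, audio_sec: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_sec / audio_sec


if __name__ == "__main__":
    # Made-up probabilities: one ~0.38 s utterance, then a 96 ms blip.
    probs = [0.1] * 5 + [0.8] * 10 + [0.4] * 2 + [0.1] * 3 + [0.9] * 3 + [0.1] * 2
    for seg in segments_from_probs(probs):
        print(f"{seg.start:.3f}s -> {seg.end:.3f}s")
    print(f"RTF: {real_time_factor(0.028, 14.171):.4f}")
```

With these made-up probabilities, the first run of high scores survives as one segment, while the three-chunk (96 ms) blip falls under the 250 ms minimum and is dropped. The helper also reproduces the article's figure: 0.028 / 14.171 is roughly 0.002.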

Read the full story at DEV Community

This is an AI-generated summary. ShortSingh links to the original source for the complete article.

Bilateral AI provenance standard adds agent self-signing to notarized records

A new cryptographic protocol called Bilateral Signature (v0x04) addresses a gap in AI output provenance by requiring an AI agent to sign its own work hash before an independent notary counter-signs it. Previously, the v0x03 standard only proved a hash existed at a given timestamp, but could not confirm the agent actually authored the underlying content. The updated protocol fuses the agent's Ed25519 signature into the notarized record, meaning any forgery would require compromising two separate private keys instead of one. The new version maintains the same 239-byte size, $0.01 cost, and binary layout as its predecessor, with the notary automatically selecting v0x04 when an agent signature is included in the request. All nine existing mainnet records under v0x03 remain valid without migration.
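The agent-signs-then-notary-counter-signs flow can be sketched with Ed25519 keys from the third-party Python `cryptography` package. This is an illustration under stated assumptions: SHA-256 as the work-hash function and the record layout used here (32-byte hash, 8-byte timestamp, two 64-byte signatures, 168 bytes total) are hypothetical and are not the actual 239-byte v0x04 binary format; the function names are invented for this example.

```python
import hashlib
import struct
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def agent_sign(agent_key: Ed25519PrivateKey, content: bytes):
    """Step 1: the agent hashes its own output and signs the hash."""
    work_hash = hashlib.sha256(content).digest()  # assumption: SHA-256
    return work_hash, agent_key.sign(work_hash)


def notarize(notary_key: Ed25519PrivateKey, agent_pub: Ed25519PublicKey,
             work_hash: bytes, agent_sig: bytes, timestamp=None) -> bytes:
    """Step 2: the notary checks the agent signature, then counter-signs.

    Hypothetical layout: hash (32) | timestamp (8) | agent sig (64) | notary sig (64).
    """
    agent_pub.verify(agent_sig, work_hash)  # raises InvalidSignature if forged
    ts = int(time.time()) if timestamp is None else timestamp
    body = work_hash + struct.pack(">Q", ts) + agent_sig
    return body + notary_key.sign(body)  # notary signs over the agent's signature too


def verify_record(record: bytes, agent_pub: Ed25519PublicKey,
                  notary_pub: Ed25519PublicKey) -> bool:
    """A record is valid only if BOTH signatures check out."""
    body, notary_sig = record[:-64], record[-64:]
    work_hash, agent_sig = body[:32], body[40:104]
    try:
        notary_pub.verify(notary_sig, body)
        agent_pub.verify(agent_sig, work_hash)
    except InvalidSignature:
        return False
    return True


if __name__ == "__main__":
    agent, notary = Ed25519PrivateKey.generate(), Ed25519PrivateKey.generate()
    h, sig = agent_sign(agent, b"model output")
    record = notarize(notary, agent.public_key(), h, sig)
    print(len(record), verify_record(record, agent.public_key(), notary.public_key()))
```

Because the notary signs over the agent's signature as well as the hash, replacing the agent signature invalidates the counter-signature, so in this sketch a forger would need both private keys, mirroring the two-key property the summary describes.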

Read more at DEV Community

React Explained: Virtual DOM, Components, and State Through a Mall Analogy

A DEV Community article uses a shopping mall metaphor to explain how React works internally, mapping core concepts like components, props, state, and the Virtual DOM to familiar real-world equivalents. Before React, every website update required directly manipulating the live DOM, much like rearranging a shop floor in front of customers: a slow and error-prone process. jQuery sped up these manual changes but did not remove the need to plan and execute each update individually. At Facebook's scale, with millions of simultaneous users triggering thousands of DOM updates per second, this approach became unmanageable. React introduced the Virtual DOM, a private design studio where changes are drafted and compared against the current state through a diffing process, so that only the minimal necessary updates are applied to the real DOM.
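To make the diffing step concrete, here is a toy sketch in Python: virtual nodes are plain dataclasses, and a hypothetical `diff` function walks the old and new trees by child position and emits patch operations. This is an illustrative assumption, not React's reconciliation algorithm, which among other things uses `key` props to match children instead of relying on position alone; applying the patches to a real DOM is also left out.

```python
from dataclasses import dataclass, field


@dataclass
class VNode:
    tag: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # VNode or str (text node)


def diff(old, new, path=()):
    """Return patch operations that would turn tree `old` into tree `new`.

    `path` is the sequence of child indices from the root to the node.
    """
    if old is None:
        return [("create", path, new)]
    if new is None:
        return [("remove", path)]
    if isinstance(old, str) or isinstance(new, str):
        # Text nodes (or a text/element mismatch): replace only if different.
        return [] if old == new else [("replace", path, new)]
    if old.tag != new.tag:
        return [("replace", path, new)]  # different element type: rebuild subtree
    patches = []
    if old.props != new.props:
        patches.append(("set_props", path, new.props))
    for i in range(max(len(old.children), len(new.children))):
        o = old.children[i] if i < len(old.children) else None
        n = new.children[i] if i < len(new.children) else None
        patches.extend(diff(o, n, path + (i,)))
    return patches


if __name__ == "__main__":
    old = VNode("ul", children=[VNode("li", children=["Shoes"]),
                                VNode("li", children=["Hats"])])
    new = VNode("ul", children=[VNode("li", children=["Shoes"]),
                                VNode("li", children=["Caps"]),
                                VNode("li", children=["Bags"])])
    for patch in diff(old, new):
        print(patch)
```

Only the changed text node and the new list item produce patches; the unchanged first item produces none, which is the "minimal necessary updates" idea in miniature.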

Read more at DEV Community

Developer proposes 'Token Clustering' theory to explain AI reasoning failures in complex tasks

A developer who has built more than 20 AI applications, including a multi-agent gold trading system and a 9-agent YouTube automation pipeline, reports persistent logical breakdowns in GPT-4o and Claude Opus during multi-step reasoning tasks. The failures are not factual errors; they appear as inconsistent outputs, broken logic chains, and arithmetic mistakes embedded within larger reasoning flows. According to the developer, the problems became more noticeable after the May 2024 GPT-4o update and with specific Claude Opus model versions. The developer hypothesizes that pressure to increase token throughput and reduce latency may lead models to internally 'cluster' semantic groups rather than process tokens with deep sequential attention. This shortcut, which the developer calls 'reasoning-token clustering,' may prevent models from fully integrating logical dependencies across complex prompts, leaving gaps in the final outputs.

Read more at DEV Community

Developer Documents 8 Months and ~200 Failed Experiments Seeking a Non-Neural AI Memory System

A developer has published the second installment of a research series detailing an eight-month effort to build a non-neural, CPU-only system that can accumulate and apply experience without retraining a large language model. The project, documented through roughly 200 failed experiments, aimed to find a knowledge substrate that could change future behavior after experience, survive restarts, and generalize to unseen but related cases. The core challenge identified was not data storage, which proved straightforward, but what the developer calls 'transferable causal transition': preserving the logic of when a learned condition-action-consequence rule should and should not be applied. Numerous candidate knowledge carriers were tested, including memory graphs, typed edges, graded vectors, and topology fields, but each preserved only one aspect of consequence while failing to generalize correctly. Surviving code from the project has been published in an open repository called AuraSDK, and the series continues to document which mechanisms held up and where their precise limits lie.
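The distinction between easy storage and hard applicability can be illustrated with a deliberately shallow baseline. In this sketch, every name (`Rule`, `RuleMemory`, `blockers`) is hypothetical and is not AuraSDK's API: a condition-action-consequence rule fires when all its condition features are present and none of its blocking features are, and the rules persist as JSON so they survive a restart. The "generalization" here is plain subset matching, which does not capture the transferable causal transition the series is after.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Rule:
    condition: list    # features that must all be present for the rule to fire
    action: str        # what to do when it fires
    consequence: str   # what was observed to follow the action
    blockers: list     # features that mean the rule must NOT be applied

    def applies(self, situation: set) -> bool:
        return (set(self.condition) <= situation
                and not set(self.blockers) & situation)


class RuleMemory:
    def __init__(self, rules=None):
        self.rules = list(rules or [])

    def learn(self, rule: Rule) -> None:
        self.rules.append(rule)

    def suggest(self, situation) -> list:
        """Actions of every stored rule that applies to the situation."""
        s = set(situation)
        return [r.action for r in self.rules if r.applies(s)]

    def dumps(self) -> str:
        """Serialize to JSON so the memory can survive a restart."""
        return json.dumps([asdict(r) for r in self.rules])

    @classmethod
    def loads(cls, text: str) -> "RuleMemory":
        return cls(Rule(**d) for d in json.loads(text))


if __name__ == "__main__":
    mem = RuleMemory()
    mem.learn(Rule(["request_failed", "retryable"], "retry",
                   "request_succeeded", ["rate_limited"]))
    restored = RuleMemory.loads(mem.dumps())  # simulated restart
    print(restored.suggest({"request_failed", "retryable", "new_endpoint"}))
    print(restored.suggest({"request_failed", "retryable", "rate_limited"}))
```

Storage and reload take a few lines, while everything interesting is pushed into `applies`, which only matches surface features; that gap is the kind of problem the series describes.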

Read more at DEV Community
