News

Voice activity detection research wins best paper award from ISCA

Published online: 30.08.2023

Professor Zheng-Hua Tan has received yet another accolade for his work: the International Speech Communication Association has given its Best Research Paper award to an innovative study that introduces a novel approach to voice activity detection.

By Mads Sejer Nielsen

The paper, titled "rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method," by Zheng-Hua Tan, Achintya kr. Sarkar, and Najim Dehak, presents an innovative technique for discerning human speech segments within audio recordings, offering significant advancements for voice activity detection (VAD).

“I am happy and honored that our work has been recognized, and I look forward to working on similar projects,” said Zheng-Hua Tan, professor at the Department of Electronic Systems at Aalborg University.

One notable feature of this technique is its applicability to a wide array of applications. From speech recognition and speaker identification to age and gender identification, self-supervised learning, human-robot interaction, and audio archive segmentation, the method serves as a versatile preprocessor enhancing various speech-related processes.

“In this work we developed a voice activity detection technique. While many methods require large amounts of data to train their models, ours does not. Our method was devised based on our in-depth understanding of human speech. The advantages of this are that it can be used in a plug-and-play way, and it works very well in challenging acoustic conditions,” said the professor.

The research, conducted while Zheng-Hua Tan visited the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, introduces an unsupervised segment-based method for robust voice activity detection (rVAD). The work is published in Computer Speech & Language, vol. 59, 2020.

The method applies a two-pass denoising strategy followed by a VAD stage. In the initial pass, the approach identifies high-energy segments within a speech signal using an innovative measure, the a posteriori signal-to-noise ratio (SNR) weighted energy difference. High-energy segments without discernible pitch are classified as noise segments and subsequently muted.
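The idea of an SNR-weighted energy difference can be sketched as follows. This is an illustrative simplification, not the paper's exact formulation: the function name `snr_weighted_energy_diff` and the particular weighting are assumptions, and details such as windowing, smoothing, and thresholds differ in the actual rVAD code.

```python
import numpy as np

def snr_weighted_energy_diff(frames, noise_energy):
    """Illustrative sketch of an a posteriori SNR weighted energy difference.

    frames: array of shape (n_frames, frame_len), a framed speech signal.
    noise_energy: scalar estimate of the per-frame noise energy.
    """
    # Per-frame energy (small floor avoids division by zero).
    energy = np.sum(frames ** 2, axis=1) + 1e-12
    # A posteriori SNR: observed frame energy relative to the noise estimate.
    snr_post = energy / (noise_energy + 1e-12)
    # Energy difference between neighboring frames, weighted by the SNR so
    # that transitions in high-SNR (likely speech or loud noise) regions stand out.
    return np.abs(np.diff(energy)) * snr_post[1:]
```

Frames where this measure is large mark energy transitions; in rVAD, such high-energy segments are then checked for pitch before being kept or muted.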

In the second denoising pass, the speech signal is enhanced using speech enhancement techniques; several methods were explored. To build the voice activity detector, neighboring frames with detectable pitch are then merged into pitch segments.

These segments are then extended based on speech statistics, encompassing both voiced and unvoiced sounds, along with plausible non-speech elements. The final step involves applying a posteriori SNR weighted energy difference to the extended pitch segments within the denoised speech signal to detect voice activity.
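The segment-extension step can be sketched as padding each pitch segment and merging any overlaps. This is a simplification: rVAD derives its extensions from speech statistics, whereas the fixed `margin` parameter and the function name `extend_segments` here are assumptions for illustration.

```python
def extend_segments(segments, n_frames, margin):
    """Extend each (start, end) pitch segment by `margin` frames on both
    sides, clip to the signal, and merge segments that come to overlap."""
    extended = []
    for start, end in segments:
        start, end = max(0, start - margin), min(n_frames, end + margin)
        if extended and start <= extended[-1][1]:
            # Overlaps (or touches) the previous segment: merge them.
            extended[-1] = (extended[-1][0], max(extended[-1][1], end))
        else:
            extended.append((start, end))
    return extended
```

The final VAD decision in rVAD is then made by applying the a posteriori SNR weighted energy difference within these extended segments of the denoised signal.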

To promote further collaboration and advancement in the field, the source code for the rVAD method has been made publicly available on GitHub, allowing researchers and developers to leverage this innovative technology for diverse applications.