October DISTINGUISHED INDUSTRY SPEAKER Talk: Convolutional Beamformer for Joint Denoising, Dereverberation, and Source Separation (HYBRID)
October 17 @ 6:30 pm - 8:00 pm CDT
When speech is captured by distant microphones in everyday environments, the signals are often contaminated by background noise, reverberation, and overlapping voices. The convolutional beamformer (CBF) is a signal processing technique that recovers clean, close-microphone-quality speech from such complex mixtures. By jointly performing denoising, dereverberation, and source separation, CBF improves both the human listening experience and automatic speech recognition (ASR) accuracy. Potential applications include assistive hearing devices, meeting transcription systems, and other real-world speech technologies. This talk begins by introducing the concept of CBF, including its formal definition, its mechanism for joint enhancement, and its optimization via maximum likelihood estimation. A CBF is defined as a sequence of beamformers, estimated at each frequency in the short-time Fourier transform (STFT) domain, that is convolved with the observed signal over time frames to achieve the desired enhancement. The presentation then shows that CBF can be factorized into Multichannel Linear Prediction (MCLP) for dereverberation and Beamforming (BF) for denoising and separation, and highlights the practical advantages of this decomposition. Related work is reviewed, including Weighted Prediction Error (WPE) dereverberation, mask-based beamforming, and guided source separation, with emphasis on strong results in challenging tasks such as the CHiME-8 distant ASR challenge. Further extensions are presented, including blind CBF for unknown recording conditions, switching CBF for enhanced performance with a limited number of microphones, and integration with neural networks, notably the DiffCBF framework, which combines CBF with diffusion-based speech enhancement models. Experimental results demonstrate state-of-the-art speech quality, even with relatively few microphones and limited training data.
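To make the factorization mentioned in the abstract concrete, here is a minimal sketch for a single frequency bin. It is an illustration and not the speaker's implementation. It assumes the usual textbook form of the filter, y[t] = w0^H x[t] + sum_k w_k^H x[t - delay - k], and checks numerically that this equals an MCLP dereverberation step followed by a beamformer q when w0 = q and w_k = -G_k q. All names (`apply_cbf`, `apply_mclp_then_bf`, `DELAY`, `TAPS`) are hypothetical. The filter coefficients are random placeholders; the maximum likelihood estimation covered in the talk is not shown.

```python
import random


def inner(w, x):
    """Return w^H x for two equal-length complex vectors."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


def apply_cbf(w0, w_taps, x, delay):
    """Direct form: y[t] = w0^H x[t] + sum_k w_taps[k]^H x[t - delay - k]."""
    y = []
    for t in range(len(x)):
        acc = inner(w0, x[t])
        for k, w in enumerate(w_taps):
            past = t - delay - k
            if past >= 0:  # frames before the start are treated as zero
                acc += inner(w, x[past])
        y.append(acc)
    return y


def apply_mclp_then_bf(q, g_taps, x, delay):
    """Factorized form: subtract predicted late reverberation, then beamform."""
    m = len(q)
    y = []
    for t in range(len(x)):
        d = list(x[t])  # start from the observed multichannel frame
        for k, g in enumerate(g_taps):
            past = t - delay - k
            if past >= 0:
                # d -= G_k^H x[past], where G_k is an m-by-m matrix
                for i in range(m):
                    d[i] -= sum(g[j][i].conjugate() * x[past][j]
                                for j in range(m))
        y.append(inner(q, d))  # beamformer applied to the dereverberated frame
    return y


rng = random.Random(0)


def cplx():
    """Draw a random complex number (placeholder data, not real speech)."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


# Assumed toy sizes: 2 microphones, 50 STFT frames, prediction delay 2, 3 taps.
M, T, DELAY, TAPS = 2, 50, 2, 3

x = [[cplx() for _ in range(M)] for _ in range(T)]  # observed frames
q = [cplx() for _ in range(M)]                      # beamformer (random here)
g_taps = [[[0.1 * cplx() for _ in range(M)] for _ in range(M)]
          for _ in range(TAPS)]                     # MCLP matrices (random here)

# Equivalent convolutional taps: w_k = -G_k q
w_taps = [[-sum(g[j][i] * q[i] for i in range(M)) for j in range(M)]
          for g in g_taps]

y_direct = apply_cbf(q, w_taps, x, DELAY)
y_factored = apply_mclp_then_bf(q, g_taps, x, DELAY)
max_diff = max(abs(a - b) for a, b in zip(y_direct, y_factored))
print(max_diff < 1e-9)
```

In practice the same computation would be repeated independently for every frequency bin, and an array library would replace the explicit loops. Plain Python is used here only to keep the sketch dependency-free.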
Speaker(s): Tomohiro Nakatani, Ph.D.
Agenda:
6:30 – 7:00 Social half hour to grab food and drink
7:00 – 8:00 Technical talk
Room: Mann Hall, Bldg: Medical Sciences Building, 300 3rd Ave SW, Rochester, Minnesota, United States, 55902, Virtual: https://events.vtools.ieee.org/m/499193