Joint acoustic localization and dereverberation through plane wave decomposition and sparse regularization
Abstract
Acoustic source localization and dereverberation are formulated jointly
as an inverse problem.
The inverse problem consists of approximating the sound field measured by a set of microphones.
The recorded sound pressure is matched
with that of a particular acoustic model
based on a collection of plane waves
arriving from different directions
at the microphone positions.
In order to achieve meaningful results,
spatial and spatio-spectral sparsity
can be promoted in
the weight signals controlling the plane waves.
The large-scale optimization problem
resulting from the inverse problem formulation
is solved using a first-order optimization algorithm combined with a weighted overlap-add procedure.
It is shown that once the weight signals capable of effectively approximating the sound field are obtained, they can be readily used to localize a moving sound source in terms of direction of arrival (DOA) and to perform dereverberation in a highly reverberant environment.
Results from simulation experiments and from real measurements show that the proposed algorithm is robust against both localized and diffuse noise, and that it reduces the noise in the dereverberated signals.
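The idea of matching microphone pressures with a sparse set of plane waves can be illustrated with a heavily simplified sketch. It is not the algorithm of the paper: it assumes a single frequency instead of time-domain weight signals, a 2-D circular array, an \(l_1\) penalty, and plain proximal gradient (ISTA) as one possible first-order method, with no weighted overlap-add. All function names, the array geometry, and the parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def plane_wave_dictionary(mic_xy, angles, freq, c=343.0):
    """Column l holds the pressure of a unit plane wave from angles[l] at each microphone."""
    k = 2.0 * np.pi * freq / c  # wavenumber in rad/m
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (L, 2) unit vectors
    return np.exp(1j * k * (mic_xy @ directions.T))  # (M, L)

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1 for complex-valued w."""
    mag = np.abs(w)
    scale = np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)
    return w * scale

def objective(D, p, w, lam):
    """Least-squares data fit plus l1 penalty on the plane wave weights."""
    return 0.5 * np.linalg.norm(D @ w - p) ** 2 + lam * np.sum(np.abs(w))

def ista(D, p, lam, n_iter=2000):
    """Proximal gradient for min_w 0.5 * ||D w - p||_2^2 + lam * ||w||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(D.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = D.conj().T @ (D @ w - p)
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Assumed toy setup: 8 microphones on a circle of radius 0.1 m, 1 kHz,
# candidate directions on a 5 degree grid.
n_mics = 8
mic_angles = 2.0 * np.pi * np.arange(n_mics) / n_mics
mic_xy = 0.1 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)
angles = np.deg2rad(np.arange(0, 360, 5))
D = plane_wave_dictionary(mic_xy, angles, freq=1000.0)

# Noise-free "measurement": a single plane wave arriving from 45 degrees.
true_idx = 9
p = D[:, true_idx]

lam = 0.5 * np.max(np.abs(D.conj().T @ p))  # assumed regularization weight
w = ista(D, p, lam)

# The direction carrying the largest weight magnitude serves as the DOA estimate.
doa_deg = np.rad2deg(angles[np.argmax(np.abs(w))])
```

In this toy setting the sparse weight vector plays both roles mentioned in the abstract: its dominant entry indicates the direction of arrival, and the weight attached to that direction is a crude analogue of the dereverberated signal.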
Simulations
For every scenario and algorithm, entries are given for \(N_m = 4, 8, 12, 16, 20, 24\). The signals listed per algorithm are:
Scenario (i), sensor noise (\(40\) dB SNR)
SBL: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{p}}_t\)
ADA: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{p}}_t\)
Scenario (ii), diffuse babble noise (\(10\) dB SNR)
ADELFI \(l_1\): \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{w}}_t\), \(\bar{\mathbf{p}}_t\)
ADELFI \(\Sigma l_2\): \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{w}}_t\), \(\bar{\mathbf{p}}_t\)
SBL: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{p}}_t\)
ADA: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{p}}_t\)
Scenario (iii), localized white noise (\(15\) dB SNR)
ADELFI \(l_1\): \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{w}}_t\), \(\bar{\mathbf{p}}_t\)
ADELFI \(\Sigma l_2\): \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{w}}_t\), \(\bar{\mathbf{p}}_t\)
SBL: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{p}}_t\)
ADA: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\), \(\bar{\mathbf{p}}_t\)
Measurement results
Scenarios A, B, and C
Signals: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\)
Algorithms: ADELFI \(l_1\), ADELFI \(\Sigma l_2\), SBL, ADA
Signals per algorithm: \(\bar{\mathbf{w}}_t\), \(\bar{\mathbf{p}}_t\)
Scenarios D and E
Signals: \(\mathbf{s}_t\), \(\tilde{\mathbf{p}}_t\)
Algorithms: ADELFI \(l_1\), ADELFI \(\Sigma l_2\), ADA
Signals per algorithm: \(\bar{\mathbf{w}}_t\), \(\bar{\mathbf{p}}_t\)
Acknowledgments
Simulation results use the first 5 seconds of track 5 of the Bang & Olufsen CD “Music for Archimedes”. All rights to this anechoic sample belong to B&O. The authors would like to thank Søren Bech and B&O for allowing its reproduction here.
This research work was carried out
at the ESAT Laboratory of KU Leuven,
in the frame of the FP7-PEOPLE
Marie Curie Initial Training Network
“Dereverberation and Reverberation
of Audio, Music, and Speech (DREAMS)”,
funded by the European Commission
under Grant Agreement no. 316969,
KU Leuven Impulsfonds IMP/14/037,
KU Leuven Internal Funds VES/16/032,
and KU Leuven C2-16-00449
“Distributed Digital Signal Processing
for Ad-hoc Wireless Local Area Audio Networking”.
The research leading to these results has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program / ERC Consolidator Grant: SONORA (no. 773268). This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information.