
Joint acoustic localization and dereverberation through plane wave decomposition and sparse regularization

Abstract

Acoustic source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists of approximating the sound field measured by a set of microphones. The recorded sound pressure is matched with that of an acoustic model based on a collection of plane waves arriving from different directions at the microphone positions. To achieve meaningful results, spatial and spatio-spectral sparsity can be promoted in the weight signals controlling the plane waves. The large-scale optimization problem resulting from the inverse problem formulation is solved using a first-order optimization algorithm combined with a weighted overlap-add procedure. It is shown that once weight signals capable of effectively approximating the sound field are obtained, they can be readily used to localize a moving sound source in terms of direction of arrival (DOA) and to perform dereverberation in a highly reverberant environment. Results from simulation experiments and from real measurements show that the proposed algorithm is robust against both localized and diffuse noise, and that the dereverberated signals exhibit reduced noise.
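To make the idea of matching microphone pressures with sparsely weighted plane waves concrete, the following is a minimal narrowband sketch. It is not the ADELFI algorithm from the paper: it works at a single frequency, uses a noise-free toy measurement, and solves an \(l_1\)-regularized least-squares problem with plain ISTA as a stand-in for a first-order method. The array geometry, the grid of candidate directions, the regularization weight `lam`, and all function names are assumptions made for illustration only.

```python
import numpy as np

def plane_wave_dictionary(mic_xy, angles, freq, c=343.0):
    """Each column holds one unit-amplitude plane wave sampled at the microphones.

    The phase sign convention exp(+j k u.x) is an assumption of this sketch.
    """
    k = 2.0 * np.pi * freq / c                       # wavenumber in rad/m
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (Nd, 2)
    return np.exp(1j * k * (mic_xy @ directions.T))  # (Nm, Nd)

def soft_threshold(w, tau):
    """Complex soft-thresholding: shrink magnitudes by tau, keep phases."""
    mag = np.abs(w)
    return w * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

def ista(D, p, lam, n_iter=2000):
    """Minimize 0.5 * ||D w - p||^2 + lam * ||w||_1 with proximal gradient steps."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2           # 1 / Lipschitz constant
    w = np.zeros(D.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = D.conj().T @ (D @ w - p)              # gradient of the data term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Hypothetical setup: 16 microphones on a circle of radius 0.2 m, 2 kHz tone.
n_mics = 16
mic_angles = 2.0 * np.pi * np.arange(n_mics) / n_mics
mic_xy = 0.2 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)

# Candidate directions of arrival on a 10-degree grid.
angles = np.deg2rad(np.arange(0, 360, 10))
D = plane_wave_dictionary(mic_xy, angles, freq=2000.0)

# Toy measurement: a single plane wave arriving from 90 degrees, no noise.
true_idx = 9
p = D[:, true_idx].copy()

w_hat = ista(D, p, lam=1.0)
est_idx = int(np.argmax(np.abs(w_hat)))
print("estimated DOA (degrees):", np.rad2deg(angles[est_idx]))
```

In this toy case the largest weight should fall on the grid direction of the simulated source, which is how a DOA estimate can be read off the weights. The method described in the abstract operates on broadband time-domain weight signals and also considers a spatio-spectral sparsity penalty, neither of which this sketch attempts.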

Index

Simulation results:

	sensor noise	diffuse babble noise	localized white noise
ADELFI \(l_1\)	🔗	🔗	🔗
ADELFI \(\Sigma l_2\)	🔗	🔗	🔗
SBL (SS)	🔗	🔗	🔗
ADA	🔗	🔗	🔗

Measurement results:

A	B	C	D	E
🔗	🔗	🔗	🔗	🔗

Scenarios D and E involve a moving sound source.

Legend:

\(\mathbf{s}_t\) : (semi)-anechoic signal
\(\tilde{\mathbf{p}}_t\) : microphone signal
\(\bar{\mathbf{w}}_t\) : dereverberated signal (single weight signal)
\(\bar{\mathbf{p}}_t\) : dereverberated signal (using multiple weight signals)

Simulation results

Simulations: scenario (i) sensor noise (\(40\) dB SNR) ADELFI \(l_1\)

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (i) sensor noise (\(40\) dB SNR) ADELFI \(\Sigma l_2\)

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (i) sensor noise (\(40\) dB SNR) SBL

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (i) sensor noise (\(40\) dB SNR) ADA

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (ii) diffuse babble noise (\(10\) dB SNR) ADELFI \(l_1\)

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (ii) diffuse babble noise (\(10\) dB SNR) ADELFI \(\Sigma l_2\)

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (ii) diffuse babble noise (\(10\) dB SNR) SBL

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (ii) diffuse babble noise (\(10\) dB SNR) ADA

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (iii) localized white noise (\(15\) dB SNR) ADELFI \(l_1\)

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (iii) localized white noise (\(15\) dB SNR) ADELFI \(\Sigma l_2\)

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (iii) localized white noise (\(15\) dB SNR) SBL

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{p}}_t\)

Simulations: scenario (iii) localized white noise (\(15\) dB SNR) ADA

	\(N_m = 4\)	\(N_m = 8\)	\(N_m = 12\)	\(N_m = 16\)	\(N_m = 20\)	\(N_m = 24\)
\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)
\(\bar{\mathbf{p}}_t\)

Measurement results

Scenario A

\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)

	ADELFI \(l_1\)	ADELFI \(\Sigma l_2\)	SBL	ADA
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Scenario B

\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)

	ADELFI \(l_1\)	ADELFI \(\Sigma l_2\)	SBL	ADA
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Scenario C

\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)

	ADELFI \(l_1\)	ADELFI \(\Sigma l_2\)	SBL	ADA
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Scenario D

\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)

	ADELFI \(l_1\)	ADELFI \(\Sigma l_2\)	ADA
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Scenario E

\(\mathbf{s}_t\)
\(\tilde{\mathbf{p}}_t\)

	ADELFI \(l_1\)	ADELFI \(\Sigma l_2\)	ADA
\(\bar{\mathbf{w}}_t\)
\(\bar{\mathbf{p}}_t\)

Acknowledgments

Simulation results use the first 5 seconds of track 5 of the Bang & Olufsen CD “Music for Archimedes”. All rights to this anechoic sample belong to B&O. The authors would like to thank Søren Bech and B&O for allowing reproduction here.

Measurement results were obtained using the LOCATA challenge database and the CSTR VCTK database.

This research work was carried out at the ESAT Laboratory of KU Leuven, in the frame of the FP7-PEOPLE Marie Curie Initial Training Network “Dereverberation and Reverberation of Audio, Music, and Speech (DREAMS)”, funded by the European Commission under Grant Agreement no. 316969, KU Leuven Impulsfonds IMP/14/037, KU Leuven Internal Funds VES/16/032, and KU Leuven C2-16-00449 “Distributed Digital Signal Processing for Ad-hoc Wireless Local Area Audio Networking”. The research leading to these results has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program / ERC Consolidator Grant: SONORA (no. 773268). This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information.