Efficient Detection and Localization of Acoustic Sources with a low complexity CNN network and the Diagonal Unloading Beamforming
Toma A.;Salvati D.;Drioli C.;Foresti G. L.
2022-01-01
Abstract
Detection of acoustic events and direction-of-arrival estimation of acoustic sources are central topics in acoustic array signal processing, of both theoretical and practical relevance. Reconnaissance and surveillance against intrusions, search and rescue in hostile environments, and speaker detection and localization are examples of real applications that require an accurate and efficient analysis of the acoustic scene. In this context, we have been investigating an efficient method based on a convolutional neural network (CNN) that learns, from the multi-channel signal, a mapping between a specific choice of audio-related features and both the nature of the acoustic event and its spatial location. In this work, we investigate an extended CNN and compare its accuracy with that of a reduced-complexity CNN. The main novelty with respect to the state of the art is a Diagonal Unloading (DU) beamforming-based method that produces acoustic maps over azimuth and elevation angles, which serve as the feature representation of the acoustic signal. We also conduct a comparative study with the Log-Mel Spectrogram feature representation and experiment with a method that fuses the two feature representations. The dataset used for both training and validation of the CNN comes from the DCASE challenge. The experiments demonstrate the benefits of the DU-Acoustic Map feature representation, which provides additional information about the position of acoustic sources in terms of azimuth and elevation angles. The results confirm the accuracy and efficiency of the proposed deep learning-based method.