Audiovisual active speaker localization and enhancement for multirotor micro aerial vehicles
Daniele Salvati; Carlo Drioli; Andrea Gulli; Gian Luca Foresti; Federico Fontana; Giovanni Ferrin
2019-01-01
Abstract
We address the problem of localizing a speaker and enhancing their voice using audiovisual sensors installed on a multirotor micro aerial vehicle (MAV). Acoustic-only localization and signal enhancement through beamforming techniques are especially challenging in these conditions, due to the nature and intensity of the disturbances originating from the electric motors and the propellers. We propose a solution in which an efficient beamforming-based algorithm for both localization and enhancement of the source is paired with video-based human face detection. The video processing front-end detects human silhouettes and provides an estimate of the directions of arrival (DOAs) at the array. When the acoustic localization front-end detects speech activity originating from one of the candidate directions estimated by the visual component, the acoustic source localization is refined and the recorded signal is enhanced through acoustic beamforming. The proposed algorithm was tested on a MAV equipped with a compact uniform linear array (ULA) of four microphones. A set of scenes featuring two human subjects in the field of view, speaking one at a time, is analyzed with this method. Experimental results obtained in stable hovering conditions are illustrated, and the localization and signal enhancement performance is analyzed.
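The abstract does not detail the beamformer itself, so the following is only a minimal illustrative sketch of the general technique it names: a frequency-domain delay-and-sum beamformer steered toward a given DOA for a four-microphone ULA. All function names, parameters, and array geometry here are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def delay_and_sum(frames, mic_positions, doa_deg, fs, c=343.0):
    """Frequency-domain delay-and-sum beamformer for a linear array.

    frames        : (n_mics, n_samples) time-domain multichannel snapshot
    mic_positions : (n_mics,) microphone positions along the array axis [m]
    doa_deg       : direction of arrival relative to broadside [degrees]
    fs            : sampling rate [Hz]
    c             : speed of sound [m/s]
    """
    n_mics, n_samples = frames.shape
    # Far-field propagation delay of each microphone for the given DOA.
    delays = mic_positions * np.sin(np.deg2rad(doa_deg)) / c
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # Phase terms that compensate the per-channel delays.
    steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    # Align the channels in phase, then average them coherently.
    return np.fft.irfft(np.mean(spec * steering, axis=0), n=n_samples)
```

Steering the same routine over a grid of candidate angles and picking the angle of maximum output power yields a basic steered-response-power DOA scan, which is one common way to refine a coarse visual DOA estimate acoustically.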
File | Description | Type | License | Size | Format
---|---|---|---|---|---
official.pdf (open access) | Main article | Post-print | Creative Commons | 1.34 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.