Enhancing drone audition with rotor-conditioned deep models

Fontana, Federico; Drioli, Carlo; Salvati, Daniele; Ferrin, Giovanni
2025-01-01

Abstract

This paper addresses the recovery of speech from audio recorded by a microphone aboard an unmanned aerial vehicle (UAV) during flight. Enhancing such recordings is difficult because of non-stationary noise from the motors and propellers, together with environmental disturbances and motion-induced airflow. Combined, these sources drastically reduce the signal-to-noise ratio (SNR). We investigate the integration of rotor speed time series as a structured conditioning signal into neural speech enhancement models. We implement and evaluate rotor-informed variants of three state-of-the-art architectures: Wave-U-Net (time domain), DCCRN, and DCUNet (both time-frequency domain). Experiments on a custom UAV acoustics dataset spanning SNR levels from −30 to 0 dB show that rotor conditioning yields consistent and statistically significant improvements across SNR, SI-SDR, STOI, and PESQ metrics. These benefits generalize across model families, and a lightweight rotor-informed variant achieves best or near-best results despite using only 25% of the parameters. The findings establish rotor-informed conditioning as a robust and generalizable strategy for speech enhancement in low-SNR UAV environments.
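For readers unfamiliar with two of the quantities named in the abstract, the following is a minimal sketch of the commonly used definitions of SNR and scale-invariant SDR (SI-SDR). The abstract does not say how the metrics were computed, so the function names `snr_db` and `si_sdr` and the use of plain Python lists are illustrative assumptions, not the authors' evaluation code.

```python
import math


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from mean power of speech and noise samples.

    Hypothetical helper for illustration only.
    """
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(speech_power / noise_power)


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB (standard textbook definition).

    The estimate is projected onto the reference; the projection is the
    'target' component and whatever remains is treated as distortion.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference signal has zero energy")
    # Optimal scaling of the reference that best matches the estimate.
    scale = sum(r * e for r, e in zip(reference, estimate)) / ref_energy
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0:
        return math.inf  # estimate is an exact rescaling of the reference
    if target_energy == 0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)
```

Under these definitions, an SNR of −30 dB means the noise carries 1000 times the power of the speech, which is the hardest end of the range quoted above. SI-SDR is unaffected by a gain applied to the estimate, so a model is not rewarded or penalized for output level.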
Files in this item:

File: Gulli_et_al-2025-EURASIP_Journal_on_Audio_Speech_and_Music_Processing.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 6.43 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1316824