Enhancing drone audition with rotor-conditioned deep models
Fontana, Federico;Drioli, Carlo;Salvati, Daniele;Ferrin, Giovanni
2025-01-01
Abstract
This paper addresses the problem of recovering speech from audio recorded by a microphone aboard an unmanned aerial vehicle (UAV) during flight. Enhancing recordings under these conditions is difficult because of non-stationary noise from the motors and propellers, along with environmental disturbances and motion-induced airflow. Combined, these sources dramatically decrease the signal-to-noise ratio (SNR). We investigate the integration of rotor speed time series as a structured conditioning signal into neural speech enhancement models. We implement and evaluate rotor-informed variants of three state-of-the-art architectures: Wave-U-Net (time domain), DCCRN, and DCUNet (both time-frequency domain). Experiments on a custom UAV acoustics dataset spanning SNR levels from −30 to 0 dB show that rotor conditioning yields consistent and statistically significant improvements across SNR, SI-SDR, STOI, and PESQ metrics. These benefits generalize across model families, and a lightweight rotor-informed variant achieves best or near-best results despite using only 25% of the parameters. The findings establish rotor-informed conditioning as a robust and generalizable strategy for speech enhancement in low-SNR UAV environments.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Gulli_et_al-2025-EURASIP_Journal_on_Audio_Speech_and_Music_Processing.pdf | Open access | Publisher's version (PDF) | Creative Commons | 6.43 MB | Adobe PDF |
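
The abstract reports results at input SNRs from −30 to 0 dB and names SI-SDR among its metrics. As a rough illustration only, the sketch below mixes a synthetic tone with Gaussian noise at a target SNR and computes SI-SDR using the common zero-mean, projection-based definition. The function names (`power`, `mix_at_snr`, `si_sdr`), the 16 kHz sample rate, and the test signals are assumptions made for this example; they are not taken from the paper, its dataset, or its evaluation code.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB (assumes a non-zero residual)."""
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [v - mean_e for v in estimate]
    t = [v - mean_t for v in target]
    # Project the estimate onto the target to find the optimal scaling.
    alpha = sum(a * b for a, b in zip(e, t)) / sum(b * b for b in t)
    scaled = [alpha * b for b in t]
    residual = [a - s for a, s in zip(e, scaled)]
    return 10 * math.log10(
        sum(s * s for s in scaled) / sum(r * r for r in residual)
    )


# Hypothetical test signals: a 220 Hz tone standing in for speech,
# white Gaussian noise standing in for rotor noise.
rng = random.Random(0)
sample_rate = 16000  # assumed for the example
speech = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixture = mix_at_snr(speech, noise, snr_db=-30.0)
added_noise = [m - s for m, s in zip(mixture, speech)]
measured_snr_db = 10 * math.log10(power(speech) / power(added_noise))

print(f"input SNR:    {measured_snr_db:.2f} dB")
print(f"input SI-SDR: {si_sdr(mixture, speech):.2f} dB")
```

Because SI-SDR rescales the target before comparing, multiplying the estimate by any non-zero constant leaves the score unchanged. That is why it is often preferred over plain SNR when a model's output gain is arbitrary.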
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


