Importance of binaural cues of depth in low-resolution audio-visual 3D scene reproductions

Salvati, Daniele; Drioli, Carlo; Fontana, Federico; Foresti, Gian Luca

doi:10.1109/SIVE.2018.8577121

In spite of their clear audibility, auditory depth cues have been shown to add generally imprecise information to a 3D scene description. We hypothesize that this information nevertheless becomes salient when a scene is reproduced with low visual resolution. To test this hypothesis, a system has been built from inexpensive audio-visual reproduction technologies. The system forms a 3D visual scene from two screen images that are polarized orthogonally before reaching the observer, who wears polarized glasses. In parallel, two small loudspeakers are arranged in a stereo dipole configuration to create a binaural hot-spot using a cross-talk cancellation solution. Sounds and images are recorded from a real scene using a stereo camera and a pair of microphones, mounted together at spacings that match average anthropometric inter-eye and inter-aural distances. Using this system, we measured that binaural rather than monophonic feedback significantly improves the precision of participants who were asked to estimate the time-to-passage of a ball rolling down toward them along a rectilinear trajectory. These preliminary results suggest that participants made effective use of the binaural rolling sounds of the approaching ball to improve their estimates.