Automatic Mixing for Immersive Teleconferencing Systems


The aim of immersive teleconferencing is to convey a realistic sound field impression to a remote participant. To this end, the spatial distribution of the talkers as well as room information must be captured by the near-end system and accurately reproduced at the far end. We consider a setup in which high speech quality is obtained by means of several close microphones, while spatial information is captured with a small microphone array in the centre of the acoustic scene. The proposed automatic mixing system robustly estimates the directions of multiple active talkers and mixes the close-microphone signals with the room information gathered by the central microphone array. Furthermore, we propose a novel automatic gain control method that preserves natural speech dynamics while equalizing speech level fluctuations caused by unintentional changes in the talker-microphone distance. To allow maximal flexibility with respect to the reproduction system at the far end (e.g., different loudspeaker setups or binaural reproduction over headphones), the sound field is encoded in higher-order Ambisonics. In a concluding evaluation, listening experiments indicate the optimal settings for recorded multi-talker scenarios using both headphone- and loudspeaker-based reproduction.
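The abstract does not specify how the close-microphone signals and the array signal are combined in the Ambisonics domain. The following is a minimal sketch under stated assumptions and is not the authors' method. It pans each close-microphone signal to its estimated talker azimuth and adds a scaled room signal that is assumed to be already encoded from the central array. For brevity, it uses horizontal-only (2D) circular harmonics with unit weights and the channel order [1, cos φ, sin φ, cos 2φ, sin 2φ, ...]. A real higher-order Ambisonics system would typically use 3D spherical harmonics with a defined ordering and normalization (e.g., ACN/SN3D). The names `circular_harmonics`, `mix_to_hoa`, and `room_gain` are hypothetical.

```python
import numpy as np


def circular_harmonics(azimuth: float, order: int) -> np.ndarray:
    """Horizontal-only encoding gains up to `order` (2*order + 1 channels).

    Assumed channel order: [1, cos(phi), sin(phi), cos(2 phi), sin(2 phi), ...],
    with all weights equal to 1 (a simplification for this sketch).
    """
    gains = [1.0]
    for m in range(1, order + 1):
        gains.append(np.cos(m * azimuth))
        gains.append(np.sin(m * azimuth))
    return np.asarray(gains)


def mix_to_hoa(close_mics, azimuths, array_hoa, order, room_gain=0.5):
    """Hypothetical mixer: encode each close-mic signal at its estimated azimuth
    (radians) and add the room signal from the central array, scaled by room_gain.

    close_mics: sequence of 1D signals, all of length T
    array_hoa:  room signal already in the same 2D format, shape (2*order + 1, T)
    """
    close_mics = np.atleast_2d(np.asarray(close_mics, dtype=float))
    out = room_gain * np.asarray(array_hoa, dtype=float)
    for sig, az in zip(close_mics, azimuths):
        # The outer product spreads the talker signal over all HOA channels.
        out = out + np.outer(circular_harmonics(az, order), sig)
    return out


# Usage: one talker at 90 degrees, silent room signal, 2nd order (5 channels).
T = 4
talker = np.ones(T)
room = np.zeros((5, T))
hoa = mix_to_hoa([talker], [np.pi / 2], room, order=2)
```

In this sketch, the direction estimates only enter through the encoding gains, so the talker's spatial position and the room impression are controlled separately (`room_gain`). The paper's actual mixing and gain-control rules may differ.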

Annual German Conference on Acoustics DAGA
