We’re immersed in video and audio calling. It’s long been a key part of how we connect at work, but with Covid-19 it has also played a critical part in keeping us close to family and friends. It’s not all positive, though. ‘Call fatigue’, the exhaustion that comes with hours spent communicating via audio and video services, is a real concern. Audio quality can lessen or exacerbate that fatigue, so it is an integral consideration in the design of conference-calling products. In this blog post, we explore some of the audio characteristics that contribute to call quality, and how we’ve captured them in our VocalFusion® XVF3000 voice processor range.
Echo has one of the most disruptive effects on audio call quality; it can range from merely distracting to rendering conversation unintelligible. Echo is suppressed using an Acoustic Echo Cancellation (AEC) algorithm, which models the delay and transfer of the transmitted audio and applies a corresponding calculation to remove the echo from the captured signal.
Our XVF3000 voice processor implements a full-duplex Acoustic Echo Canceller with double-talk management, designed for conference calling and telephony. With the echo canceller enabled, far-end speech (the incoming speech from other callers) is prevented from being transmitted back along with the near-end speech signal (your outgoing speech). Left unchecked, that far-end speech would return to the other callers as significant echo.
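To make the idea of "modelling the delay and transfer" more concrete, here is a minimal sketch of one textbook approach, a normalised least-mean-squares (NLMS) adaptive filter. It is an illustration only and not the XVF3000's algorithm: the function names, the tap count, the step size and the simulated echo path are all assumptions, and it has no double-talk handling, so it would misbehave if near-end speech were present.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal."""
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                 # what remains after cancellation
        power = sum(h * h for h in history) + eps
        step = mu * e / power                 # normalised update size
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights

def simulate(n=4000, seed=0):
    """Hypothetical test signal: noise played through a short, made-up echo path."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo_path = [0.0, 0.0, 0.6, 0.3, -0.1]    # 2-sample delay, then decay (assumed)
    mic = [
        sum(c * far_end[i - k] for k, c in enumerate(echo_path) if i - k >= 0)
        for i in range(n)
    ]
    return far_end, mic, echo_path

if __name__ == "__main__":
    far_end, mic, echo_path = simulate()
    residual, weights = nlms_echo_cancel(far_end, mic)
    print("learned echo path:", [round(w, 3) for w in weights])
    print("echo energy left in last 500 samples:",
          sum(e * e for e in residual[-500:]))
```

In this simplified setting the filter learns both the delay (the leading zeros) and the shape of the echo path, and the residual echo falls towards zero. A production canceller also has to cope with much longer echo paths, background noise and both parties talking at once.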
Some of the less obvious features of the audio signal that affect quality are its frequency composition and how cleanly the voice is isolated. As with all digital systems, the rate at which data is sampled dictates the bandwidth of the signal, that is, the range of frequencies the signal can represent. The XVF3000 processes data at a 16 kHz sample frequency, giving a bandwidth of 8 kHz (wideband audio). This captures more speech frequency content than standard telephony devices, and so gives a better representation of the voice signal.
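The relationship between sample frequency and bandwidth is simple arithmetic: a sampled signal can represent frequencies up to half the sample rate, and anything above that folds back to a lower frequency. The short sketch below works this through for 16 kHz and, for comparison, the 8 kHz rate of traditional narrowband telephony. The function names and the 6 kHz test tone are our own choices for illustration.

```python
def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate_hz / 2

def apparent_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling (with folding)."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

if __name__ == "__main__":
    tone_hz = 6000  # example component in the upper range of speech (assumed)
    for fs in (8000, 16000):
        print(fs, nyquist_bandwidth_hz(fs), apparent_frequency_hz(tone_hz, fs))
```

At 16 kHz the 6 kHz component sits inside the 8 kHz bandwidth and is captured as it is. At 8 kHz the bandwidth is only 4 kHz, so the same component has to be filtered out before sampling or it will alias to 2 kHz.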
Another significant driver of audio clarity is the number of microphones used to capture the audio. The XVF3000 uses 4 microphones to isolate the speaker from other noise sources in the environment via beamforming. (It can do this with a linear or circular microphone array.) Increasing the sample frequency and/or the number of microphones increases the potential quality of the audio, but this quickly becomes computationally and physically expensive, requiring higher-performance processing for ever smaller gains. The 16 kHz sample frequency and 4-microphone beamforming technology of the XVF3000 hit the sweet spot between performance and cost for premium conferencing.
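For readers new to beamforming, the simplest form is delay-and-sum: align the microphone signals on the talker and average them, so the talker adds up coherently while uncorrelated noise partly cancels. The sketch below is that textbook technique under heavy simplifications, not the XVF3000's beamformer. The per-microphone delays, the noise level and the random stand-in for speech are all assumed values; a real array would derive the delays from its geometry and the talker's direction.

```python
import random

def delay_and_sum(channels, delays):
    """Advance each channel by its known delay (in samples), then average."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]

def simulate_array(n=8000, delays=(0, 1, 2, 3), noise_level=0.5, seed=1):
    """Hypothetical 4-mic capture: same talker, different delay and noise per mic."""
    rng = random.Random(seed)
    speech = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # stand-in for a talker
    channels = []
    for d in delays:
        channels.append([
            (speech[i - d] if i >= d else 0.0) + rng.gauss(0.0, noise_level)
            for i in range(n)
        ])
    return speech, channels

def mean_square(values):
    return sum(v * v for v in values) / len(values)

if __name__ == "__main__":
    delays = (0, 1, 2, 3)
    speech, channels = simulate_array(delays=delays)
    beam = delay_and_sum(channels, delays)
    single_mic_noise = mean_square([m - s for m, s in zip(channels[0], speech)])
    beam_noise = mean_square([b - s for b, s in zip(beam, speech)])
    print("noise power, one microphone:", round(single_mic_noise, 4))
    print("noise power, 4-mic beam:    ", round(beam_noise, 4))
```

With four microphones and independent noise on each, averaging cuts the noise power to roughly a quarter (about 6 dB). Doubling the microphone count again would only gain about another 3 dB, which is one way to see the diminishing returns described above.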
Another element to consider is the delay in passing audio from near-end to far-end. This is usually driven by network latency rather than by the device or its processing, but minimising latency through the voice processing is still an important aspect of system design, because it leaves more headroom for any larger network delays that may be encountered. The XVF3000, built on our widely used USB audio stack, is designed for professional audio, where low latency is paramount. The USB stack, coupled with highly efficient processing, means the XVF3000 adds minimal latency in both send and receive directions for conferencing systems.
In addressing these key aspects of audio quality, the XVF3000 stands as a high-quality choice for conference-calling and other calling applications. The processor is also designed for easy integration, with flexible interfaces (USB / I2S), the ability to add custom filtering (output EQ), and system-level features such as Direction-of-Arrival indication, allowing it to be used across system architectures.
Please contact us to discuss your conferencing project and how our solutions can help.