Automatic speech recognition

The XVF3800 can output audio enhanced for Automatic Speech Recognition (ASR). This ASR output is not optimized for voice communication; it is intended for use with wake word detection engines. The ASR audio is extracted after the beamformer and is not fed into the post processor, so no noise suppression is applied to it. This is desirable because ASR performance is usually degraded by non-linear processing. An optional fixed gain can be applied to match the requirements of different wake word engines.

The ASR output is not enabled by default. The XVF3800 can send the ASR audio to both the left and right channels. To use the right channel, issue the commands:

(sudo) xvf_host(.exe) AUDIO_MGR_OP_R 7 3
(sudo) xvf_host(.exe) AEC_ASROUTONOFF 1

The commands above map the output of the autoselect beam to the right channel; see Output Selection for more details. The command AEC_ASROUTONOFF changes the output audio from the AEC residual of the fourth microphone to the output of the autoselect beam. The left channel is used for voice communication by default, but it can be configured as the ASR output by replacing AUDIO_MGR_OP_R in the commands above with AUDIO_MGR_OP_L.
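As an illustration of the channel substitution described above, the following is a minimal sketch of a shell helper that prints the command pair for either channel. The function name asr_commands is hypothetical and not part of xvf_host. The argument values 7 3 are copied from the right-channel commands above and are assumed to apply unchanged to the left channel. The helper only prints the commands; it does not run them. Add sudo or the .exe suffix as your platform requires.

```shell
# Hypothetical helper (not part of xvf_host): prints the two commands that
# route the ASR output to the chosen channel, L or R. Nothing is executed.
asr_commands() {
    case "$1" in
        L|R) ;;
        *) echo "usage: asr_commands L|R" >&2; return 1 ;;
    esac
    # "7 3" is the same argument pair used for the right channel above.
    echo "xvf_host AUDIO_MGR_OP_$1 7 3"
    echo "xvf_host AEC_ASROUTONOFF 1"
}

# Print the commands that would configure the left channel as ASR output.
asr_commands L
```

Printing rather than executing keeps the sketch safe to try without a device attached; the printed lines can be reviewed and then run by hand.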

The fixed gain is configured with the command AEC_ASROUTGAIN. The desired output level varies between ASR engines. One recommendation is an output level of -52 dBov for a sound pressure level of 61 dBSPL at the device. This typically results in an AEC_ASROUTGAIN around 36 dB lower than PP_AGCGAIN, depending on PP_AGCDESIREDLEVEL.
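To put the 36 dB offset in perspective, the sketch below converts a gain difference in dB to a linear amplitude ratio using 10^(dB/20). This section does not state whether AEC_ASROUTGAIN and PP_AGCGAIN take linear or dB values, so treat the conversion as illustrative arithmetic only and check the units of both parameters before applying it. The function name db_to_linear is hypothetical.

```shell
# Hypothetical helper (not part of xvf_host): converts a gain offset in dB
# to a linear amplitude ratio, 10^(dB/20), printed with four decimals.
db_to_linear() {
    LC_ALL=C awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(db / 20 * log(10)) }'
}

# Ratio corresponding to a gain 36 dB lower than the reference.
db_to_linear -36   # prints 0.0158
```

In other words, if both gains were expressed as linear multipliers, a setting 36 dB lower would be roughly 1.6% of the reference value.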