In collaboration with Neohelden, now part of msg solutions, AMAI developed a robust solution to improve automatic speech recognition (ASR) for use in industrial environments.
Challenge
Although voice assistants (Alexa, Siri, Cortana, etc.) are already common in the home, they are not yet widely used in industry, where reliability requirements are particularly high. Among other things, the system must understand domain-specific terms and descriptions and translate them into the desired actions. Background noise, other voices, and varying volume levels make speech harder to recognize and thus increase the word error rate (WER), the share of words that the system transcribes incorrectly.
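To make the metric concrete: WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in the project; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words was dropped.
wer = word_error_rate("close the main valve", "close main valve")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.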
Approach
The solution includes the development of a domain-specific German text corpus as the basis for training a precise speech recognition model. For this, we rely on the Kaldi ASR framework (kaldi-asr.org), an established open-source speech recognition toolkit known for its flexibility and efficiency. To make the model more robust under real operating conditions, we apply data augmentation, for example by overlaying voice recordings with various background noises. This simulates the environmental conditions that can occur during the inspection and maintenance of machines and systems. The resulting speech recognition system is designed to support employees in their daily tasks by reliably recognizing and transcribing voice commands and notes.
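One common way to mix background noise into a clean recording is to scale the noise so that the result has a chosen signal-to-noise ratio (SNR). The sketch below illustrates that idea under stated assumptions: signals are plain lists of float samples at the same sample rate, and the helper names `rms` and `mix_at_snr`, the synthetic test signals, and the 10 dB target are all hypothetical. It is not the project's actual augmentation pipeline, which the text does not describe in detail.

```python
import math
import random
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Overlay noise on speech so the mix has the requested SNR in dB."""
    if len(speech) == 0 or len(noise) == 0:
        raise ValueError("speech and noise must be non-empty")

    # Repeat the noise until it covers the utterance, then trim it.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    tiled = (list(noise) * repeats)[:len(speech)]

    noise_rms = rms(tiled)
    if noise_rms == 0.0:
        return list(speech)  # silent noise: nothing to add

    # From SNR_dB = 20 * log10(rms(speech) / (gain * rms(noise))).
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, tiled)]


# Synthetic stand-ins for real audio (assumed 16 kHz sample rate):
# a 220 Hz tone as "speech" and Gaussian noise as "machine noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

With real data, the samples would typically be held in NumPy arrays and the SNR drawn from a range per utterance, so that the model sees both mildly and heavily disturbed recordings.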