SAIL LABS Technology Unveils New German Language Feature

SAIL LABS Technology, a leading provider of speech technology solutions, today announced the immediate availability of the new German language feature for its award-winning Media Mining System. This highly accurate automatic speech transcription system takes audio input from various sources and creates indexed, summarised and searchable output in real time.

This latest German language feature brings improvements on several fronts. The vocabulary has been substantially extended to over 150,000 words, an increase of more than 50% over the previous version. New and up-to-date names of people and organisations were added to the vocabulary to provide excellent coverage of today’s news. Together with an improved algorithm for handling compound words, a phenomenon particularly important in German, this very large vocabulary greatly improves language coverage.

Particular effort went into harmonising spellings across the different stages of German orthography.

In technical terms, the language model draws on information gathered over more than 20 years and is based on close to 1 billion words of text. These unprecedented numbers underline the data collection and R&D investments SAIL LABS makes in its quest for excellence.

“In addition to all the above innovations, the existing German Language Model Tool (LMT) data was extended in parallel to the German base feature, and improvements to pronunciations (e.g. for names of persons) were made. Thus, not only the new German language feature itself, but also all models built on top of it using the LMT will benefit from these enhancements”, says D.I. Gerhard Backfried, Head of Research.

Other available speaker-independent language features from SAIL LABS include:

  • Arabic
  • English UK
  • English US
  • French
  • Greek
  • Norwegian
  • Polish
  • Russian
  • Spanish

Each language is available with a Language Model Toolkit (LMT) enabling software developers and users to fine-tune the system’s language capabilities to address issues such as dialect pronunciations, and to add new vocabulary. LMT workshops are held at SAIL LABS and around the world on request.
