ARM at NTAV 2018

Speech recognition results and their application to multimedia content search and retrieval were presented at the XVII Symposium on New Trends in Audio and Video (NTAV) in Poznań. NTAV is a biennial event organized by the Polish Section of the Audio Engineering Society.

An example radio content transcript with reference text and evaluation results

How can one find a specific piece of AV content among hundreds of thousands of recordings whose total duration exceeds several months or even years?

To do that, one needs a text description of each item in the set and a full-text search mechanism. The description can be generated through sound and image analysis, whose objective is to detect and recognize speech in the audio stream and text in the video stream.

Despite possible inaccuracies in both elements, such a description makes it possible to find a specific item, or information on a given subject, with high probability. Speech recognition by the ARM engine produces not only the best hypothesis but also a set of alternatives, which further increases the probability of finding the desired content.
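The effect of indexing alternative hypotheses can be illustrated with a minimal sketch. It is not the ARM engine's implementation: the inverted index, the function names (`tokenize`, `build_index`, `search`) and the sample transcripts are all assumptions made up for illustration. It only shows why a query can still match a recording whose best hypothesis contains a recognition error.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(recordings):
    """Build an inverted index: token -> set of recording ids.

    `recordings` maps a recording id to a list of recognition
    hypotheses, best hypothesis first (hypothetical data layout).
    """
    index = defaultdict(set)
    for rec_id, hypotheses in recordings.items():
        for hypothesis in hypotheses:
            for token in tokenize(hypothesis):
                index[token].add(rec_id)
    return index


def search(index, query):
    """Return ids of recordings that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Invented sample data: the best hypothesis for rec1 has an error
# ("whether"), while the alternative has the intended word.
recordings = {
    "rec1": ["the whether forecast for poznan",
             "the weather forecast for poznan"],
    "rec2": ["parliament votes on the budget"],
}

best_only = build_index({k: v[:1] for k, v in recordings.items()})
with_alternatives = build_index(recordings)

print(search(best_only, "weather forecast"))          # no match
print(search(with_alternatives, "weather forecast"))  # matches rec1
```

In this sketch a query may match tokens drawn from different hypotheses of the same recording, which favours recall over precision. A production system would more likely weight alternatives by their recognition confidence.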

An example of text retrieved from an image
