AI-as-a-service provider AssemblyAI has released a new speech recognition model called Universal-1. The company says the model, trained on more than 12.5 million hours of multilingual audio data, delivers strong speech-to-text accuracy in English, Spanish, French and German. It claims that, compared with OpenAI's Whisper Large-v3 model, Universal-1 reduces hallucinations by 30% on speech data and by 90% on ambient noise.
In a blog post, the company describes Universal-1 as "another milestone in our mission to provide accurate, faithful and robust speech-to-text capabilities for multiple languages, helping our customers and developers worldwide build various Speech AI applications." In addition to improved accuracy in these four major languages, the model supports code-switching: it can transcribe multiple languages spoken within a single audio file.
Universal-1 also offers improved timestamp estimation, which is essential for audio and video editing and for conversation analytics. AssemblyAI claims the new model is 13% better than its predecessor, Conformer-2. As a result, speaker diarization improves as well: the company reports a 14% improvement in concatenated minimum-permutation word error rate (cpWER) and a 71% improvement in speaker count estimation accuracy.
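cpWER is a common metric for multi-speaker transcription. Each speaker's words are concatenated into a single stream on both the reference and the hypothesis side, hypothesis speakers are matched to reference speakers using whichever permutation yields the fewest word errors, and the total error count is divided by the number of reference words. The following is a minimal, brute-force Python sketch of that definition, written for illustration only. The function names `word_edit_distance` and `cp_wer` are hypothetical, are not part of any AssemblyAI API, and are not how AssemblyAI computed its reported figure.

```python
from itertools import permutations


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = curr
    return prev[-1]


def cp_wer(ref_by_speaker: dict[str, list[str]],
           hyp_by_speaker: dict[str, list[str]]) -> float:
    """Hypothetical cpWER sketch: concatenate per speaker, pick the best speaker mapping."""
    # Concatenate each speaker's utterances into one word stream.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]
    # Pad with empty streams so missed or extra speakers are counted as errors.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference transcript contains no words")
    # Brute-force search over speaker permutations (fine for a handful of speakers).
    best_errors = min(
        sum(word_edit_distance(refs[i], hyps[p]) for i, p in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_errors / total_ref_words


if __name__ == "__main__":
    ref = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
    # Speaker labels differ and are swapped, but the words match exactly.
    hyp = {"spk1": ["fine thanks"], "spk0": ["hello there", "how are you"]}
    print(cp_wer(ref, hyp))  # 0.0
```

Because the permutation search grows factorially with the number of speakers, practical tools usually treat the speaker mapping as an assignment problem (for example, with the Hungarian algorithm) rather than enumerating every permutation.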
Finally, parallel inference has been made more efficient, reducing turnaround time for long audio files. Universal-1 is said to complete this task five times faster than Whisper Large-v3. AssemblyAI compared the processing speed of the two models on Nvidia Tesla T4 machines with 16GB of VRAM. With a batch size of 64, Universal-1 took 21 seconds to transcribe 1 hour of audio. Whisper Large-v3, run with a much smaller batch size of 24, took 107 seconds to complete the same task.
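The "five times faster" claim follows directly from the reported timings. The short Python sketch below reproduces the arithmetic; the helper names `speedup` and `times_real_time` are hypothetical. Because the two models were run with different batch sizes (64 vs. 24), the ratio describes each model under its reported benchmark configuration rather than a same-batch-size comparison.

```python
# Timings reported by AssemblyAI for transcribing 1 hour of audio
# on an Nvidia Tesla T4 with 16GB of VRAM.
AUDIO_SECONDS = 60 * 60
UNIVERSAL_1_SECONDS = 21         # batch size 64
WHISPER_LARGE_V3_SECONDS = 107   # batch size 24


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def times_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of processing time."""
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    print(f"Speedup: {speedup(WHISPER_LARGE_V3_SECONDS, UNIVERSAL_1_SECONDS):.1f}x")
    # Speedup: 5.1x
    print(f"Universal-1: {times_real_time(AUDIO_SECONDS, UNIVERSAL_1_SECONDS):.0f}x real time")
    # Universal-1: 171x real time
    print(f"Whisper Large-v3: {times_real_time(AUDIO_SECONDS, WHISPER_LARGE_V3_SECONDS):.0f}x real time")
    # Whisper Large-v3: 34x real time
```

A ratio of 107 / 21 ≈ 5.1 is consistent with the company's "five times faster" figure.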
Improved speech-to-text models bring several benefits. Notetakers can generate more accurate, hallucination-free notes, identify action items and capture metadata such as proper nouns, speaker identity and timing information. The models can also support creator tools that incorporate AI-powered video editing workflows, telehealth platforms that automate clinical note entry, and claims submission processes where accuracy is essential, among other applications.
The Universal-1 model is available through AssemblyAI's API.