I'm building something that involves speech-to-text transcription. So far I've implemented it with Google Cloud, but I'm pretty underwhelmed by the accuracy (50% or so on audio from a Zoom call).
Anyone have experience with other offerings and some suggestions about what to try next?
I did a project that transcribed voicemails using Amazon Transcribe (AWS). The results were far from stellar, so I made the transcripts editable to let users fix transcription errors.
How was the speaker diarization?
What was your source audio like? And what would you estimate the accuracy to be, as a percentage?
Thank you!
There was only a single speaker in each voicemail, so diarization never came up. The input was voicemail, so the audio was phone-call quality. Accuracy was probably ~60%.
I don't think the tech is there yet, so you're either going to need to pay for human-assisted transcription or add some kind of self-editing interface. Or just settle for 60% accuracy.
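If it helps to put a firmer number on "accuracy" than an eyeballed percentage, one common measure is word error rate (WER): the word-level edit distance between a hand-corrected reference transcript and the service's output, divided by the number of reference words. Below is a minimal sketch, not tied to any particular provider. The function name and the sample strings are made up for illustration, and it assumes simple lowercase, whitespace-based tokenization (punctuation is not stripped).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word and one dropped word out of five.
wer = word_error_rate("please call me back tomorrow", "please call be back")
print(f"WER: {wer:.0%}")
```

Running the same handful of hand-corrected clips through each service you try gives a like-for-like comparison. Note that WER can exceed 100% when the output contains many extra words, so "100% minus WER" is only a rough stand-in for accuracy.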
Right on, thanks again!