During the pandemic travel lockdown I’ve ended up accumulating a lot of vacation time, so I decided to take much of December off. I did spend some time relaxing, especially walking our adorable new dogs, but there were some coding itches I wanted to scratch. One of the biggest was building a simple system for prototyping voice interfaces on an embedded device like a Raspberry Pi, all running locally. I’ve been following the Coqui.ai team’s work since they launched, and was very impressed by the quality of the open source speech models and code they have produced. I didn’t have an easy way to run them myself though, especially on live microphone input. With that in mind, I decided my holiday project would be writing a command line tool using Coqui’s speech to text library. To keep it as straightforward as possible I modeled it on the classic Unix cat command, where the default would be to read audio from a microphone and output text (though it ended up expanding to system audio and files too), so I called it spchcat. You can now download it yourself for Pis and x86 Linux from speechcat.org!
As usual, the scope kept expanding beyond my original idea. Coqui have collaborated with groups like ITML to collect models for over 40 languages, including some that are endangered, so I couldn’t resist supporting those, even though it makes the installer over a gigabyte in size. I also found it straightforward to support x86 Linux, since Coqui supply prebuilt libraries for that platform too.
I’ve now scratched my own itch, but I’m hoping that this code will help introduce more people to the amazing advances in open source voice technology that have been happening over the last few years. I also hope it will increase the number of people donating their voices to Common Voice, since none of this could have happened without Mozilla’s groundbreaking efforts. There’s still a lot of room for improvement in accuracy and language coverage, but I’m confident that this is a project the open source community can make rapid progress on.
Thanks to the Coqui team for their great contributions, and to everyone who helped me test this initial release, especially Keyi for his detailed bug reports. I’m hoping to see some fun projects emerge out of this, so please drop me a line at firstname.lastname@example.org or leave a comment if you do have something you’d like to share!
Thank you, this is the first non-Google software that is able to recognize Czech spoken words, and it does it quickly and accurately. I’m very amazed! Going to build something cool using that.