I miss having a dog, and I’d love to have a robot substitute! My friend Lukas built a $100 Raspberry Pi robot using TensorFlow to wander the house and recognize objects, and with the person detection model it can even follow me around. I want to be able to talk to my robot though, and at least have it understand simple words. To do that, I need to write a simple speech recognition example for TensorFlow.
As I looked into it, one of the biggest barriers was the lack of suitable open data sets. I need something with thousands of labelled utterances of a small set of words, from a lot of different speakers. TIDIGITS is a pretty good start, but it’s a bit small, a bit too clean, and more importantly you have to pay to download it, so it’s not great for an open source tutorial. I like the Free Spoken Digit Dataset (https://github.com/Jakobovski/free-spoken-digit-dataset), but it’s still small and only includes digits. LibriSpeech is large enough, but it’s only broken down into sentences, not individual words.
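To make that requirement a bit more concrete, here is a minimal sketch of how you might check whether a collection of clips has enough utterances and enough distinct speakers for each word. The directory layout (`<word>/<speaker_id>_<take>.wav`), the function name, and the example file names are all assumptions made up for illustration, not the format of any of the data sets mentioned above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_utterances(paths):
    """Return {word: (clip_count, distinct_speaker_count)}.

    Assumes a hypothetical layout of <word>/<speaker_id>_<take>.wav.
    """
    clips = defaultdict(int)
    speakers = defaultdict(set)
    for path in paths:
        p = PurePosixPath(path)
        word = p.parent.name                   # the folder name is the label
        speaker_id = p.stem.rsplit("_", 1)[0]  # drop the trailing take number
        clips[word] += 1
        speakers[word].add(speaker_id)
    return {word: (clips[word], len(speakers[word])) for word in clips}


# Made-up file names, purely to show the shape of the result.
example = [
    "yes/alice_0.wav",
    "yes/alice_1.wav",
    "yes/bob_0.wav",
    "no/alice_0.wav",
]
summary = summarize_utterances(example)
print(summary)
```

A word with plenty of clips but only a handful of speakers would show up here as a large first number next to a small second one, which is the kind of gap a data set for this purpose needs to avoid.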
To solve this, I need your help! I’ve put together a website at https://open-speech-commands.appspot.com/ (now at https://aiyprojects.withgoogle.com/open_speech_recording) that asks you to speak about 100 words into the microphone, records the results, and then lets you submit the clips. I’m hoping to release an open source data set built from these contributions, along with a TensorFlow example of a simple spoken word recognizer. The website itself is a little Flask app running on GCE, and the source code is up on GitHub. Unfortunately it doesn’t work on iOS, but it should work on Android devices and on any desktop machine with a microphone.
I’m hoping to get as large a variety of accents and devices as possible, since that will help the recognizer work for as many people as possible. Please do take five minutes to record your contributions if you get a chance, and share the link with anyone else who might be able to help!