When I launched the Speech Commands dataset last year I wasn’t quite sure what to expect, but I’ve been very happy to see all the creative ways people have used it, like guiding embedded optimizations or testing new model architectures. The best part has been all the conversations I’ve ended up having because of it, and how much I’ve learned about the area of microcontroller machine learning from other people in the field.
Having a lot of eyes on the data (especially through the Kaggle competition) gave me a lot more insight into how to improve its quality, and there’s been a steady stream of volunteers donating their voices to expand the number of utterances. I also had a lot of requests for a paper giving more details on the dataset, especially covering how it was collected and what the best approaches to benchmarking accuracy are. With all of that in mind, I spent the past few weeks gathering the voice data that had been donated recently, improving the labeling process, and documenting it all in much more depth. I’m pleased to say that the resulting paper is now up on arXiv, and you can download the expanded and improved archive of over one hundred thousand utterances. The folder layout is still compatible with the first version, so to run the example training script from the tutorial, you can just execute:
python tensorflow/examples/speech_commands/train.py \
  --data_url=http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
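Since the archive keeps the same folder layout as the first version (one directory per word, each holding WAV utterances), a quick way to sanity-check a download is to count the clips per word. Here's a minimal sketch; `count_utterances` and the local `data_dir` path are illustrative names, not part of the official tooling:

```python
import os

def count_utterances(data_dir):
    """Count WAV files in each word folder of an extracted
    Speech Commands archive (one directory per word)."""
    counts = {}
    for word in sorted(os.listdir(data_dir)):
        word_path = os.path.join(data_dir, word)
        if os.path.isdir(word_path):
            counts[word] = sum(
                1 for f in os.listdir(word_path) if f.endswith(".wav")
            )
    return counts

# Example: count_utterances("speech_commands_v0.02") would map each
# word folder (like "yes" or "no") to its number of recordings.
```

Folders such as `_background_noise_` will show up in the counts too, so filter them out if you only want the spoken words.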
I’m looking forward to hearing more about how you’re using the dataset, and continuing the conversations it has already sparked, so I hope you have as much fun with it as I have!