Speech Commands is now larger and cleaner!

waveform.png

Picture by Aaron Parecki

When I launched the Speech Commands dataset last year I wasn’t quite sure what to expect, but I’ve been very happy to see all the creative ways people have used it, like guiding embedded optimizations or testing new model architectures. The best part has been all the conversations I’ve ended up having because of it, and how much I’ve learned about the area of microcontroller machine learning from other people in the field.

Having a lot of eyes on the data (especially through the Kaggle competition) gave me a lot more insight into how to improve its quality, and there’s been a steady stream of volunteers donating their voices to expand the number of utterances. I also had a lot of requests for a paper giving more details on the dataset, especially covering how it was collected and what the best approaches to benchmarking accuracy were. With all of that in mind, I spent the past few weeks gathering the voice data that had been donated recently, improving the labeling process, and documenting it all in much more depth. I’m pleased to say that the resulting paper is now up on arXiv, and you can download the expanded and improved archive of over one hundred thousand utterances. The folder layout is still compatible with the first version, so to run the example training script from the tutorial, you can just execute:

python tensorflow/examples/speech_commands/train.py \
--data_url=http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
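
If you'd like to sanity-check the archive after extracting it, here's a minimal sketch that counts the clips for each word. It assumes the layout is one folder per spoken word with .wav files inside, plus an underscore-prefixed folder for background noise. The function name and the example path are my own for illustration and aren't part of the training script.

```python
from collections import Counter
from pathlib import Path


def count_utterances(data_dir):
    """Count the .wav files in each label folder directly under data_dir."""
    counts = Counter()
    for wav_path in Path(data_dir).glob("*/*.wav"):
        label = wav_path.parent.name
        # Assumption: folders starting with an underscore (for example
        # _background_noise_) hold non-word audio, so they are skipped.
        if label.startswith("_"):
            continue
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path: point this at wherever you extracted the archive.
    for label, n in sorted(count_utterances("speech_commands_v0.02").items()):
        print(f"{label}: {n}")
```

Summing the counts should give you the total number of utterances in your local copy.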

I’m looking forward to hearing more about how you’re using the dataset, and continuing the conversations it has already sparked, so I hope you have as much fun with it as I have!

One response

  1. Hi Pete, I would like to know if this method can be used for optical spectral analysis. As I work in the field of photonics, I am looking to learn ML through spectral data analysis. Any leads on that? Thanks.

    Yours,
    Chinna.
