Quantization Screencast

TinyML Book Screencast #4 – Quantization

For the past few months I’ve been working with Zain Asgar and Keyi Zhang on EE292D, Machine Learning on Embedded Systems, at Stanford. We’re hoping to open source all the materials after the course is done, but I’ve been including some of the lectures I’m leading as part of my TinyML YouTube series. Since I’ve talked a lot about quantization on this blog over the years, I thought it would be worth including the latest episode here too.

It’s over an hour long, mostly because quantization is still evolving so fast. To back up the lesson I’ve included a couple of Colabs you can play with: the first lets you load a pretrained Inception v3 model and inspect its weights, and the second shows how you can load a TensorFlow Lite model file, modify the weights, save it out again, and check the accuracy, so you can see for yourself how quantization affects the overall results.
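If you want a quick feel for what the first Colab covers before you dive in, here’s a minimal sketch (not the actual Colab code, just an illustration of the idea) that loads a pretrained Inception v3 through Keras, pulls out one convolution’s kernel, and round-trips it through simulated 8-bit linear quantization to see how much error that introduces:

```python
import numpy as np
import tensorflow as tf

# Load a pretrained Inception v3 and grab the first convolution's kernel.
model = tf.keras.applications.InceptionV3(weights='imagenet')
conv = next(l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D))
weights = conv.get_weights()[0]

# Simulate 8-bit linear quantization: map the float range onto 256 levels.
w_min, w_max = weights.min(), weights.max()
scale = (w_max - w_min) / 255.0
quantized = np.round((weights - w_min) / scale).astype(np.uint8)
dequantized = quantized.astype(np.float32) * scale + w_min

print('weight range:', w_min, 'to', w_max)
print('max round-trip error:', np.abs(weights - dequantized).max())
```

The per-weight error only tells you so much, though, which is why the second Colab writes the modified weights back into a TensorFlow Lite file and measures accuracy on real inputs; that end-to-end number is the one that matters.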

The slides themselves are available too, and this is one area where I go into more depth on the screencast than I do in the TinyML book, since that has more of a focus on concrete exercises. I’m working with some other academics to see if we can come up with a shared syllabus around embedded ML, so if you’re trying to put together something similar for undergraduates or graduates at your college, please do get in touch. The TensorFlow team even has grants available to help with the expenses of machine learning courses, especially for traditionally overlooked students.

3 responses

  1. Related to architectures supporting other numbers of bits: much of digital audio is done in terms of 24-bit ints, and companies making audio processor chips build for this spec. It’d be great someday to have efficient 24-bit support (for computation, not just file conversion) from within Python, and to see it supported by a wider range of processors. I won’t hold my breath on that, but one can dream! (The sketch below shows what today’s workaround looks like.)
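To make that point concrete: NumPy has no native 24-bit integer dtype, so handling 24-bit PCM from Python today means manually widening each sample to 32 bits. Here’s a minimal sketch, assuming little-endian two’s-complement samples:

```python
import numpy as np

def pcm24_to_int32(raw: bytes) -> np.ndarray:
    """Sign-extend little-endian 24-bit PCM samples into int32."""
    samples = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    # Place each 3-byte sample in the top 3 bytes of a 4-byte word...
    padded = np.zeros((samples.shape[0], 4), dtype=np.uint8)
    padded[:, 1:] = samples
    # ...then an arithmetic right shift by 8 sign-extends the value.
    return (padded.view('<i4') >> 8).ravel()

# Two samples: +1 and -1 in 24-bit little-endian two's complement.
print(pcm24_to_int32(b'\x01\x00\x00\xff\xff\xff'))  # [ 1 -1]
```

The extra copy and shift is exactly the kind of overhead you’d avoid with real 24-bit support in the dtype system.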
