Ann and Pete Talk AI

I’ve never been much of a podcast or video creator, but recently I’ve started posting a series of short chats with my friend Ann Spencer on YouTube, and it’s been a lot of fun. I realized I hadn’t mentioned it here, so, as they say, please like and subscribe. I’ve also embedded one of my favorite episodes above, which is mostly me ranting about why privacy policies are worse than useless! I might turn that into a post here too, if I can make some time.

Join me at the Tesla Protests on Saturday

I’ve been writing this blog for nineteen years, and in over 1,100 posts I’ve never once brought up politics, but I can’t ignore what’s happening in our country. We’re facing such a profound crisis right now in the US that not speaking up at this point would be breaking the oath I took in 2014, when I became a proud citizen, to “defend the constitution” … “against all enemies, foreign and domestic”. I won’t repeat all the ways that the executive branch is destroying fundamental rights like habeas corpus and the rule of law. If you’re happy with what’s going on, I don’t know how to even reach you, so feel free to stop reading.

If you think what’s happening is wrong, but feel helpless to do anything about it, you should join one of the nationwide protests at Tesla showrooms around the country. I have never been to a protest in the US before, and I was actually pretty scared to attend my first. I’m a naturalized citizen, and I’ve never been made to feel more of a foreigner than I have over the last few months. Even though I have incredible privileges and resources compared to the most vulnerable groups, like trans people and immigrants who haven’t finished the arduous process of becoming citizens yet, I was still nervous about standing up in public to say what I believed. I’ve now been to two Saturday protests at the Tesla dealership on Van Ness in San Francisco, and I’ve been amazed at how heartening it has been to be surrounded by other people who are appalled at what is happening, and to hear the horns of the many others who drive by and show their support.

If, like me, you haven’t been to a protest before, you might have questions. Is it safe? The crowd and organizers are extremely chill, and at least half of the protestors are senior citizens. Despite what Fox may tell its viewers, the protestors are ordinary people like you and me who care about their country, not “Radical Leftists”. There’s incredible positive energy, and there’s never been a hint of violence. The organizers are very clear that this is a peaceful protest, and that they’ll have zero tolerance for any trespassing or property damage. Tesla drivers who pass get some good-natured thumbs down, but even when a couple of agitators in MAGA hats showed up filming this weekend, everyone just laughed and rolled their eyes. I’m particularly proud of my wife Joanne: when one of them stuck a camera in her face and asked “What are you protesting?” (in a thick Russian accent, so presumably a fellow immigrant?), she smiled and replied “You”, which he had no response to. There’s also no sign-up necessary: you can arrive any time between 12pm and 2pm, stay for as long as you feel like, and leave whenever you want.

If you would like to do something, this Saturday (March 29th 2025) is going to be the biggest day of protests yet. Find your local Tesla dealer, and even if you’re in a deep red state, there’s almost certainly going to be a group gathering between 12pm and 2pm.

I know not everybody has the resources or ability to attend these protests, but there are still things you can do. I write as many Blue Wave Postcards as I can find time for. They encourage people to vote, and there are important elections coming up all the time, like the judicial race in Wisconsin that may decide whether they get fair redistricting for a long time to come. If you don’t have the money to pay for the postcards and stamps required, you can use the 5 Calls app to tell your representatives how concerned you are.

It’s no longer okay to ignore what’s happening, or keep your head down to avoid offending other people. This is a deep, deep crisis, and our only chance of a way out is if we work together to make sure our voices are heard. Please, join me in doing what you can. Even if we aren’t successful, I want to be able to say I went down fighting for what I believe in. Don’t you?

Debugging Disposable ML Frameworks

Guest post by Nat Jeffries, Founding Engineer at Useful Sensors.

At Useful Sensors we love using disposable frameworks to deploy on-device transformers. Having built several such frameworks, I realized that, while there are great resources for understanding and training transformer models, there are few guides for deploying them on-device. The following are some lessons I wish I’d known when I started building disposable frameworks, and some tricks I’ve learned along the way.

First, I’ve learned to test parts of the model rather than the whole thing. When you run a transcription model on a sample audio clip and get back wingdings, curse words, or nothing at all, it’s hard to know what went wrong. I like to compare intermediate tensor values from a known-good model against the same tensors in my custom framework, working from the input through each major block until the tensors differ. One trick I’ve found is to log the sum and shape of each tensor rather than all or some of the tensor values.

Here’s an example in C++:

#include <cstdio>
#include <string>

// Log a quick fingerprint of a tensor: the sum of its elements and its shape.
void print_tensor(const Tensor* tensor, const std::string& msg) {
  float sum = 0;
  for (auto elem : tensor->data) {
    sum += elem;
  }
  printf("%s: sum: %.4f shape (", msg.c_str(), sum);
  for (auto elem : tensor->shape()) {
    printf("%d ", elem);
  }
  printf(")\n");
}

Tensor* generate(Tensor* input, Tensor* mask, Tensor* seq) {
  print_tensor(input, "input");
  print_tensor(mask, "mask");
  auto* preprocessed = preprocess(input);
  print_tensor(preprocessed, "preprocessed");
  auto* embedding = encoder(preprocessed, mask);
  print_tensor(embedding, "embedding");
  auto* output = decoder(seq, embedding, mask);
  print_tensor(output, "output");
  return output;
}

And here’s the Python version:

import torch

def print_tensor(tensor, name):
    print(f'{name} sum {torch.sum(tensor)} shape {tensor.shape}')

def generate(src, mask, seq):
    print_tensor(src, "input")
    print_tensor(mask, "input mask")

    preprocessed = preprocessor(src)
    print_tensor(preprocessed, "preprocessed")

    enc = encoder(src=preprocessed, input_mask=mask)
    print_tensor(enc, "embedding")

    output = decoder(prompt=seq, embedding=enc, input_mask=mask)
    print_tensor(output, "output")
    return output

It’s rare that two tensors with the same sum and shape contain different values, and even if they do, the error will almost always show up one block later. Remember that this includes checking the inputs to the two models. I’ve lost count of the number of times I’ve used an incorrectly quantized input, the wrong input mask, or fed inputs into the model in the wrong order.
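
If you can dump tensors from both sides, this comparison is easy to automate. Here’s a minimal sketch in Python, assuming (hypothetically) that both the known-good model and the custom framework can write named intermediate tensors out as .npy files; the tensor names and directory paths are placeholders:

import numpy as np

def compare_dump(name, ref_dir="reference_dumps", got_dir="custom_dumps"):
    # Load the same named tensor from both frameworks' dumps.
    ref = np.load(f"{ref_dir}/{name}.npy")
    got = np.load(f"{got_dir}/{name}.npy")
    if ref.shape != got.shape:
        print(f"{name}: SHAPE MISMATCH {got.shape} vs {ref.shape}")
        return
    print(f"{name}: ref sum {ref.sum():.4f}, got sum {got.sum():.4f}, "
          f"max abs diff {np.abs(ref - got).max():.6f}")

# Walk forward from the inputs; the first mismatch points at the broken block.
for name in ["input", "mask", "preprocessed", "embedding", "output"]:
    compare_dump(name)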

When dealing with quantized tensors, always refer back to the floating point values they represent. Regardless of the quantization scheme, each quantized value is an approximation of an equivalent floating point value in the known-good (usually floating point) model. Recording the sums and shapes of quantized tensors converted back to float is a good way to ensure the models match, and to quickly identify integer overflow, incorrect logic, or excessive quantization error.
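
As a rough sketch of what that can look like, assuming a simple linear quantization scheme with a per-tensor scale and zero point (the names here are illustrative, not from our framework):

import numpy as np

def dequantize(q, scale, zero_point):
    # Map quantized integer values back to the floats they approximate.
    return (q.astype(np.float32) - zero_point) * scale

def check_quantized(name, q, scale, zero_point, reference):
    deq = dequantize(q, scale, zero_point)
    # Compare in float space, where the known-good model lives.
    print(f"{name}: dequantized sum {deq.sum():.4f}, "
          f"reference sum {reference.sum():.4f}")
    # A small, uniform error is ordinary quantization noise; a large or
    # wildly varying one suggests overflow or a wrong scale/zero point.
    print(f"{name}: max abs error {np.abs(deq - reference).max():.6f}")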

Finally, periodically take a step back and honestly evaluate how clear a mental picture you have of what you’re trying to implement. I recently experienced this while adding batch decoding to our Moonshine model: I spent many days debugging subtle differences between the batch and non-batch versions before realizing that I had forgotten to mask cross attention in the decoder. A simple gap in my knowledge, quickly solved by reading a guide on masking in encoder-decoder models, resulted in days of wasted effort.
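
For anyone with the same gap: the padding mask for the encoder output has to be applied inside the decoder’s cross attention too, not just in self attention. A rough PyTorch-style sketch (the names are illustrative, not our actual Moonshine code):

import torch

def cross_attention(q, k, v, input_mask):
    # q: decoder states [batch, tgt_len, dim]; k, v: encoder output
    # [batch, src_len, dim]; input_mask: 1 for real encoder positions,
    # 0 for padding, shape [batch, src_len].
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # Without this step, padded encoder positions leak into the output,
    # which only shows up once a batch mixes sequences of different lengths.
    scores = scores.masked_fill(input_mask[:, None, :] == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v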

Hopefully these tricks can save somebody from the pitfalls I’ve fallen into. If you’re interested in deploying speech models on-device or have tips I missed here, please reach out!