Why We’re Building an Open-Source Universal Translator

We all grew up with TV shows, books, and movies that assume everybody can understand each other when they speak, even if they’re aliens. There are various in-universe explanations for this convenient feature, and most of them involve a technological solution. Today, the Google Translate app is the closest thing we have to this kind of universal translator, but the experience isn’t good enough to be used everywhere it could be useful. I’ve often found myself bending over a phone with someone, both of us staring at the screen to read the text, and switching back and forth between email or another app to share information.

Science fiction translators are effortless. You can walk up to someone and talk normally, and they understand what you’re saying as soon as you speak. There’s no setup and no latency; it’s just like any other conversation. So how can we get there from here?

One of the most common answers is a wearable earpiece. This is in line with the Babel Fish from The Hitchhiker’s Guide to the Galaxy, but there are still massive technical obstacles to fitting the required processing into such a small device, and even offloading the compute to a phone would require a lot of radio and battery usage. These barriers mean we’ll have to wait for hardware innovations like Syntiant’s to go through a few more generations before we can create this sort of dream device.

Instead, we’re building a small, unconnected box with a built-in display that can automatically translate between dozens of different languages. You can see it in the video above, and we’ve got working demos to share if you’re interested in trying it out. The form factor means it can be left in place on a hotel front desk, brought to a meeting, placed in front of a TV, or set up anywhere else you need continuous translation. The people we’ve shown this to have already asked to take units home for visiting relatives, for colleagues, or for their own travels.

You can get audio out through a speaker or earpiece, but you’ll also see real-time closed captions of the conversation in the language of your choice. The display means it’s easy to talk naturally, with eye contact, and stay aware of the whole context. Because it’s a single-purpose device, it’s less complicated to use than a phone app, and because it doesn’t need a network connection there are no accounts or setup involved; it starts working as soon as you plug it in.

We’re not the only ones heading in this direction, but what makes us different is that we’ve removed the need for any network access. Partly because of that, we’re able to run with much lower latency, while still using the latest AI techniques to achieve high accuracy.

We’re also big believers in open source, so we’re building on top of work like Meta’s NLLB and OpenAI’s Whisper and will be releasing the results under an open license. I strongly believe that language translation should be commoditized, turned into a common resource that many stakeholders can contribute to, and I hope this will be a step in that direction. That matters especially for low-resource languages, where giving communities the opportunity to be involved in digital preservation is vital for their future survival. Tech companies don’t have a big profit incentive to support translation, so advancing the technology will have to rely on other groups for support.
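To make that concrete, here’s a minimal sketch of how a Whisper transcription step can feed an NLLB translation step using the Hugging Face transformers library. This is just an offline prototype of the model chain, not the code running on our device, and the model sizes and language codes are illustrative choices.

```python
# Minimal sketch: chain Whisper (speech to text) into NLLB (text translation)
# with Hugging Face transformers. Model sizes and language codes are
# illustrative; this is a prototype of the model chain, not our device code.
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Speech recognition: transcribe an audio file to English text.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
english_text = asr("speech_sample.wav")["text"]

# Translation: English to French with NLLB, using its FLORES-200 language codes.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer(english_text, return_tensors="pt")
output_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=256,
)
print(tokenizer.batch_decode(output_tokens, skip_special_tokens=True)[0])
```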

I’m also hoping that making translation widely available will lead manufacturers to include it in devices like TVs, kiosks, ticket machines, help desks, phone support, and any other products that could benefit from wider language support. It’s not likely to replace the work of human translators (as anyone who’s read auto-translated documentation can vouch for) but I do think that AI can have a big impact here, bringing people closer together.

If this sounds interesting, please consider supporting our crowdfunding campaign. One of the challenges we’re facing is showing that there’s real demand for something like this, so even subscribing to follow the updates helps demonstrate that it’s something people want. If you have a commercial use case I’d love to hear from you too; we have a limited number of demo units that we’re loaning to the most compelling applications.

The Unstoppable Rise of Disposable ML Frameworks


On Friday my long-time colleague Nat asked if we should try to expand our Useful Transformers library into something that could be suitable for a lot more use cases. We worked together on TensorFlow, as did the main author of UT, Manjunath, so he was surprised when I didn’t want to head too far in a generic direction. As I was discussing it with him I realized how much my perspective on ML library design has changed since we started TensorFlow, and since I think by writing, I wanted to get those thoughts down in this post.

The GGML framework is just over a year old, but it has already changed the whole landscape of machine learning. Before GGML, an engineer wanting to run an existing ML model would start with a general purpose framework like PyTorch, find a data file containing the model architecture and weights, and then figure out the right sequence of calls to load and execute it. Today it’s much more likely that they will pick a model-specific code library like whisper.cpp or llama.cpp, based on GGML.
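To show what that older workflow looked like in practice, here’s a rough sketch in PyTorch. The tiny ToyWhisper class below is only a placeholder for whatever architecture code you managed to track down, and the saved checkpoint stands in for a downloaded weights file; the point is how many separate pieces (model definition, weights, input format) you have to assemble yourself.

```python
# Rough sketch of the pre-GGML workflow: find the architecture code, find a
# separate weights file, then work out the calls to load and run it.
# ToyWhisper is a placeholder, not a real model architecture.
import torch
import torch.nn as nn

class ToyWhisper(nn.Module):
    """Placeholder for the model definition you tracked down somewhere."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(80, 512)
        self.decoder = nn.Linear(512, 1000)

    def forward(self, mel):
        return self.decoder(self.encoder(mel))

model = ToyWhisper()
torch.save(model.state_dict(), "weights.pt")      # stands in for the file you downloaded

state_dict = torch.load("weights.pt", map_location="cpu")
model.load_state_dict(state_dict)                 # hope the checkpoint keys line up
model.eval()

with torch.no_grad():
    mel = torch.randn(1, 3000, 80)                # figure out the expected input format
    logits = model(mel)                           # and how to interpret the output yourself
print(logits.shape)
```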

This isn’t the whole story though, because there are also popular model-specific libraries like llama2.c that don’t use GGML, so this movement clearly isn’t based on the qualities of just one framework. The best term I’ve been able to come up with to describe these libraries is “disposable”. I know that might sound derogatory, but I don’t mean it like that; I actually think it’s the key to all their virtues! They’ve limited their scope to just a few models, they focus on inference or fine-tuning rather than training from scratch, and overall they try to do a few things very well. They’re not designed to last forever; as models change they’re likely to be replaced by newer versions, but they’re very good at what they do.

By contrast, traditional frameworks like PyTorch or TensorFlow try to do many different things for a lot of different audiences. They are designed to be toolkits that can be reused for almost any possible model, for full training as well as deployment in production, scaling from laptops (or even in TF’s case microcontrollers) to distributed clusters of hundreds of GPUs or TPUs. The idea is that you learn the fundamentals of the API, and then you can reuse that knowledge for years in many different circumstances.

What I’ve seen firsthand with TensorFlow is how coping with such a wide range of requirements forces its code to become very complex and hard to understand. The hope is always that the implementation details can be hidden behind an interface, so that people can use the system without becoming aware of the underlying complexity. In practice this is impossible to achieve, because latency and throughput are so important. The only reason to use ML frameworks instead of a NumPy Python script is to take advantage of hardware acceleration, since training and inference time need to be minimized for many projects to be achievable. If a model takes years to train, it’s effectively untrainable. If a chatbot response takes days, why bother?
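As a rough illustration of that point, here’s a small sketch comparing the same matrix multiply in plain NumPy on the CPU and in PyTorch on a GPU. The sizes are arbitrary and the timings depend entirely on your hardware, so treat it as a demonstration of why acceleration matters rather than a benchmark.

```python
# Small sketch: the same matrix multiply in plain NumPy (CPU) and PyTorch (GPU).
# Sizes and timings are illustrative only; real speedups depend on your hardware.
import time
import numpy as np
import torch

a = np.random.rand(4096, 4096).astype(np.float32)
b = np.random.rand(4096, 4096).astype(np.float32)

start = time.time()
c = a @ b
print(f"NumPy on CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    ta, tb = torch.from_numpy(a).cuda(), torch.from_numpy(b).cuda()
    torch.cuda.synchronize()        # make sure the host-to-device copies have finished
    start = time.time()
    tc = ta @ tb
    torch.cuda.synchronize()        # wait for the asynchronous GPU work to complete
    print(f"PyTorch on GPU: {time.time() - start:.3f}s")
```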

But details leak out from the abstraction layer as soon as an engineer needs to care about speed. Do all of my layers fit on a TPU? Am I using more memory than I have available on my GPU? Is there a layer in the middle of my network that’s only implemented as a CPU operation, and so is causing massive latencies as data is copied to and from the accelerator? This is where the underlying complexity of the system comes back to bite us. There are so many levels of indirection involved that building a mental model of what code is executing and where is not practical. You can’t even easily step through code in a debugger or analyze it using a profiler, because much of it executes asynchronously on an accelerator, goes through multiple compilation steps before running on a regular processor, or is dispatched to platform-specific libraries that may not even have source code available. This opaqueness makes it extremely hard for anyone outside of the core framework team to even identify performance problems, let alone propose fixes. Because every code path is used by so many different models and use cases, just verifying that any change doesn’t cause a regression is a massive job.

By contrast, debugging and profiling issues with disposable frameworks is delightfully simple. There’s a single big program that you can inspect to understand the overall flow, and then debug and profile using very standard tools. If you spot an issue, you can find and change the code easily yourself, and either keep it on your local copy or create a pull request after checking the limited number of use cases the framework supports.

Another big pain point for “big” frameworks is installation and dependency management. I was responsible for creating and maintaining the Raspberry Pi port of TensorFlow for a couple of years, and it was one of the hardest engineering jobs I’ve had in my career. It was so painful I eventually gave up, and nobody else was willing to take it on! Because TF supported so many different operations, platforms, and libraries, porting it and keeping it building on a non-x86 platform was a nightmare. There were constantly new layers and operations being added, many of which in turn relied on third-party code that also had to be ported. I groaned when I saw a new dependency appear in the build files, usually for something like an Amazon AWS input authentication pip package that didn’t add much value for the Pi users, but still required me to figure out how to install it on a platform that was often unsupported by the authors.

The beauty of single-purpose frameworks is that they can include all of the dependencies they need, right in the source code. This makes them a dream to install, often only requiring a checkout and build, and makes porting them to different platforms much simpler.

This is not a new problem, and during my career at Google I saw a lot of domain or model-specific libraries emerge internally as alternatives to using TensorFlow. These were often enthusiastically adopted by application engineers, because they were so much easier to work with. There was often a lot of tension about this with the infrastructure team, because while this approach helped ship products, there were fears about the future maintenance cost of supporting many different libraries. For example, adding support for new accelerators like TPUs would be much harder if it had to be done for a multitude of internal libraries rather than just one, and it increased the cost of switching to new models.

Despite these valid concerns, I think disposable frameworks will only grow in importance. More people are starting to care about inference rather than training, and a handful of foundation models are beginning to dominate applications, so the value of using a framework that can handle anything but is great at nothing is shrinking.

One reason I’m so sure is that we’ve seen this movie before. I spent the first few years of my career working in games, writing rendering engines in the PlayStation 1 era. The industry standard was for every team to write their own renderer for each game, maybe copying and pasting some code from other titles but otherwise with little reuse. This made sense because the performance constraints were so tight. With only two megabytes of memory on a PS1 and a slow processor, every byte and cycle counted, so spending a lot of time jettisoning anything unnecessary and hand-optimizing the functions that mattered was a good use of programming time. Every large studio had the same worries about maintaining such a large number of engines across all their games, and every few years they’d task an internal group to build a more generic renderer that could be reused by multiple titles. Inevitably these efforts failed. It was faster and more effective for engineers to write something specialized from scratch than it was to whittle down and modify a generic framework to do what they needed.

Eventually a couple of large frameworks like Unity and Unreal came to dominate the industry, but it’s still not unheard of for developers to write their own, and even getting this far took decades. ML frameworks face the same challenges as game engines in the ’90s, with application developers given tight performance and memory constraints that are hard to hit using generic tools. If the past is any guide we’ll see repeated attempts to promote unified frameworks while real-world developers rely on less-generic but simpler libraries.

Of course it’s not a totally binary choice. For example, we’re still planning on expanding Useful Transformers to support the LLM and translation models we’re using for our AI in a Box, so we’ll have some genericity, but the mid-2010s vision of “one framework to rule them all” is dead. It might be that PyTorch (which has clearly won the research market) becomes more like MATLAB, a place to prototype and create algorithms, which are then hand-converted to customized inference frameworks by experienced engineers rather than automated tools or compilers.

What makes me happiest is that the movement to disposable frameworks is clearly opening up the world of ML development to many more people. By removing the layers of indirection and dependencies, the underlying simplicity of machine learning becomes a lot clearer, and hopefully less intimidating. I can’t wait to see all of the amazing products this democratization of the technology produces!

Request for Sensors

At Useful Sensors we’re focused on building intelligent sensors, ones that use machine learning to take raw data and turn it into actionable insights. Sometimes I run across problems in my own life that don’t need advanced algorithms or AI to solve, but are blocked by hardware limitations. A classic one is “Did I leave my garage door open?”. A few months ago I even had to post to our street’s mailing list to ask someone to check it while I was away, since I was anxious I’d left it open. Thankfully several of my great neighbors jumped in and confirmed it was closed, but relying on their patience isn’t a long-term or scalable solution.

Sensors to help with this do exist, and have for a long time, so why are they still a niche product? For me, the holdbacks are difficult setup procedures and short battery lives. The tradeoff generally seems to be that to get a battery life measured in years, you need to use a specialized protocol like Zigbee, Thread, or Matter, which requires a hub, which adds to the setup time and the likelihood of having to troubleshoot issues. Wi-Fi-enabled sensors like the Swann linked to above don’t specify a battery life (the support team refuses to give an estimate further down on the page), but I’ve found similar devices last months, not years. What I would love is a cell-data-connected sensor with zero accounts, apps, or setup, beyond maybe scanning a QR code to claim it. One of the reasons I’m a big fan of Blues is that their fixed-cost cell package could make a device like this possible, but I’m guessing it would still need to be comparatively large for the hardware and battery required, and comparatively costly too.

What all of the current solutions have in common is that they demand more of my time than I’m willing to give. I have plenty of frustrating issues to debug in my technical work; the last thing I want to do when I get home is deal with poorly documented setup workflows or change batteries more than once in a blue moon. I’m guessing that I’m not alone: every product I’ve seen that truly “just works” has had an order of magnitude more sales than a competitor with even a bit of friction.

I would happily pay a lot for a device that I could stick on a garage door, claim by scanning a code on my phone that takes me to a web URL (no more terrible phone apps, please), and that would then simply send me a text if the door was open for more than ten minutes. My sense is that the problems that need to be solved are around power consumption, radio, and cost. These aren’t areas I have expertise in, so I won’t be attempting this challenge, but I hope someone out there will, and soon.

A similar application is medication detection. I’m old enough to have my own pill organizer (don’t laugh too loud, it’s coming for you eventually), but an accelerometer attached to a pill bottle could tell whether I’ve picked it up, and so presumably taken a dose, on time; then I’d never again have to measure out my tablets into little plastic slots. Devices like these do exist, but the setup, cost, and power consumption challenges are even higher, so they’re restricted to specialized use cases like clinical trials.

It feels like we’ve been on the verge of being able to build products like this for decades, but so many systems need to work smoothly to make the experience seamless that nothing has taken off. I really hope that the stars will align soon and I’ll be able to remove one or two little anxieties from my life!

A Personal History of ML Quantization

Tomorrow I’ll be giving a remote talk at the LBQNN workshop at ICCV. The topic is the history of quantization in machine learning, and while I don’t feel qualified to give an authoritative account, I did think it might be interesting to cover the developments I was aware of.

I don’t know if the talk will be recorded, but here are the slides in case they’re useful for reference. Apologies for any mistakes; please do let me know so I can improve the presentation.