Five short links


Photo by Gianni

Databending using Audacity effects – What happens when you apply audio effects to images? Occasionally-wonderful glitch art! It always lifts my heart to see creative tinkering like this, especially when it’s well-documented.

Scan Processor Studies – On the topic of glitch art, here are some mind-bending analog video effects from 1972. They look amazingly fresh, and I discovered them while following the influences behind the phenomenal Kentucky Route Zero, which features a lot of tributes to the Bronze Age of computing.

The crisis of reproducibility in omen reading – In which I cheer as the incomparable Cosma Shalizi takes on fuzzy thinking, again.

What’s next for OpenStreetMap? – An essential overview of the Open Database License virality problem that’s stopped me from working with OSM data for geocoding. David Blackman from Foursquare gave a heart-breaking talk describing all the work he’s put into cleaning up and organizing OSM boundaries for his open-source TwoFishes project, only to find himself unable to use it in production, or even recommend that other users adopt it!

Where is my C++ replacement? – As a former graphics engine programmer, I recognize the requirements and constraints in this lament. I want to avoid writing any more C or C++, after being scared straight by all the goto-fail-like problems recently, and Go’s my personal leading contender, but there’s still no clear winner.

Why engineers shouldn’t disdain frontend development


Picture by Sofi

When I saw Cate Huston’s latest post, this part leapt out at me:

‘There’s often a general disdain for front-end work amongst “engineers”. In a cynical mood, I might say this is because they don’t have the patience to do it, so they denigrate it as unimportant.’

It’s something that’s been on my mind since I became responsible for managing younger engineers, and helping them think about their careers. It’s been depressing to see how they perceive frontend development as low status, as opposed to ‘real programming’ on crazy algorithms deep in the back end. In reality, becoming a good frontend developer is a vital step to becoming a senior engineer, even if you end up in a backend-focused role. Here’s why.

You learn to deal with real requirements

The number one reason I got when I dug into juniors’ resistance to frontend work was that the requirements were messy and constantly changing. All the heroic sagas of the programming world are about elegant systems built from components with minimal individual requirements, like classic Unix tools. The reality is that any system that actually gets used by humans, even Unix, has its elegance corrupted to deal with our crazy and contradictory needs. The trick is to fight that losing battle as gracefully as you can. Frontend work is the boss level in coping with those pressures and will teach you how to engineer around them. Then, when you’re inevitably faced with similar problems in other areas, you’ll be able to handle them with ease.

You learn to work with people

In most other programming roles you get to sit behind a curtain like the Great and Powerful Wizard of Oz whilst supplicants come to you begging for help. They don’t understand what you’re doing back there, so they have to accept whatever you tell them about the constraints and results you can produce. Quite frankly it’s an open invitation to be a jerk, and a lot of engineers RSVP!

Frontend work is all about the visible results, and you’re accountable at a detailed level to a whole bunch of different people, from designers to marketing; even the business folks are going to be making requests and suggestions. You have nothing to hide behind; it’s hard to wiggle out of work by throwing up a smokescreen of jargon when the task is just changing the appearance or basic functionality of a page. You’re suddenly just another member of a team working on a problem, not a gatekeeper, and the power relationship is very different. This can be a nasty shock at first, but it’s good for the soul, and it will give you vital skills that will stand you in good stead.

A lot of programmers who’ve only worked on backend problems find their careers limited because nobody wants to work with them. Sure, you’ll be well paid if you have technical skills that are valuable, but you’ll be treated like a troll that’s kept firmly under a bridge, for fear you’ll scare other employees. Being successful in frontend work means that you’ve learned to play well with others, to listen to them, and communicate your own needs effectively, which opens the door to a lot of interesting work you’d never get otherwise. As a bonus, you’re also going to become a better human being and have more fun!

You’ll be able to build a complete product

There are a lot of reasons why being full-stack is useful, but one of my favorites is that you can prototype a fully-working side-project on your own. Maybe that algorithm you’ve been working on really is groundbreaking, but unless you can build it into a demo that other people can easily see and understand, the odds are high it will just languish in obscurity. Being able to quickly pull together an app that doesn’t make the viewer’s eyes bleed is a superpower that will make everything else you do easier. Plus, it’s so satisfying to take an idea all the way from a notepad to a screen, all by yourself.

You’ll understand how to integrate with different systems

One of the classic illusions of engineers early in their career is that they’ll spend most of their time coding. In reality, writing new code is only a fraction of the job; most of our time goes into debugging, or getting different code libraries to work together. The frontend is the point at which you have to pull together all of the other modules that make up your application. That requires a wide range of skills, not the least of which is investigating problems and assigning blame! It’s the best bootcamp I can imagine for working with other people’s code, which is another superpower for any developer. Even if you only end up working as a solo developer on embedded systems, there’s always going to be an OS kernel and drivers you rely on.

Frontend is harder than backend

The Donald Knuth world of algorithms looks a lot like physics, or maths, and those are the fields most engineers think of as the hardest and hence the most prestigious. Just like we’ve discovered in AI though, the hard problems are easy, and the easy problems are hard. If you haven’t already, find a way to get some frontend experience; it will pay off handsomely. You’ll also have a lot more sympathy for all the folks on your team who are working on the user experience!

What does the future hold for deep learning?


Photo by Pierre J.

When I chat to people about deep learning, they often ask me what I think its future is. Is it a fad, or something more enduring? What new opportunities are going to appear? I don’t have a crystal ball, but I have now spent a lot of time implementing deep neural networks for vision, and I’m also old enough to have worked through a couple of technology cycles. I’m going to make some ‘sophisticated wild-ass guesses’ about where I think things will head.

Deep learning eats the world

I strongly believe that neural networks have finally grown up enough to fulfil their decades-old promise. All applications that process natural data (speech recognition, natural language processing, computer vision) will rely on them. This is already happening on the research side, but it will take a while to percolate fully through to the commercial sector.

Training and running a model will require different tools

Right now, experimenting with new network architectures and training models is done with the same tools we use to run those models to generate predictions. To me, trained neural networks look a lot like compiled programs in a very limited assembler language. They’re essentially just massive lists of weights with a description of the order to execute them in. I don’t see any reason why the tools we use to develop networks (changing, iterating on, debugging, and training them) should also be the tools we use to execute them in production, which has very different requirements around interoperability with existing systems and performance. I also think we’ll end up with a small number of research-oriented folks who develop models, and a wider group of developers who apply them with less understanding of what’s going on inside the black box.
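To illustrate what I mean by a ‘compiled program’, here’s a minimal sketch of what executing a trained network in production can reduce to: an ordered list of weight matrices applied one after another, with none of the training machinery present. The shapes and weights here are random stand-ins for what a training tool would actually save out.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# In reality these would be loaded from whatever file the training tool
# produced; random values stand in here so the sketch runs on its own.
layer_shapes = [(4096, 1024), (1024, 256), (256, 10)]
layers = [(np.random.randn(n_in, n_out).astype(np.float32) * 0.01,
           np.zeros(n_out, dtype=np.float32))
          for n_in, n_out in layer_shapes]

def predict(x):
    """Execute the 'compiled' network: a fixed list of weights applied in order."""
    activation = x
    for W, b in layers:
        activation = relu(activation.dot(W) + b)
    return activation

scores = predict(np.random.rand(1, 4096).astype(np.float32))
print(scores.shape)  # (1, 10)
```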

Traditional approaches will fight back

Deep learning isn’t the end of computer science, it’s just the current top dog. Millions of man-years have gone into researching other approaches to computer vision, for example, and my bet is that once researchers have absorbed some of the lessons behind deep learning’s success (e.g. using massive numbers of training images and letting the algorithm pick the features), we’ll see better versions of the old algorithms emerge. We might even see hybrids; for example, I’m using an SVM as the final layer of my network to enable fast retraining on embedded devices.
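As a sketch of the kind of hybrid I mean (not my production code; the feature extractor here is just a placeholder for a trained network’s penultimate layer), you can feed network activations into a linear SVM, which retrains in seconds on a new set of labels:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in for the penultimate-layer activations of a trained network;
# in practice these would come from running the images through the net.
def network_features(images):
    return images.reshape(len(images), -1)  # hypothetical placeholder

train_images = np.random.rand(200, 64, 64).astype(np.float32)
train_labels = np.random.randint(0, 2, size=200)

features = network_features(train_images)
classifier = LinearSVC(C=1.0)
classifier.fit(features, train_labels)   # this retraining step is fast

new_images = np.random.rand(5, 64, 64).astype(np.float32)
print(classifier.predict(network_features(new_images)))
```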

There will be a gold-rush around production-ready tools

Deep learning eating the world means rapidly growing demand for new solutions, as it spreads from research into production. The tools will need to fit into legacy ecosystems, so things like integration with OpenCV and Hadoop will become very important. As they get used at large scale, the power and performance costs of running the networks will become a lot more important, as opposed to the raw speed of training that current researchers are focused on. Developers will want to be able to port their networks between frameworks, so they can use the one that has the right tradeoffs for their requirements, rather than being bound to whatever system they trained the model on as they are right now.

What does it all mean?

With all these assumptions, here’s where I think we’re headed. Researchers will focus on expanding and improving the current crop of training-focused libraries and IDEs (Caffe, Theano). Other developers will start producing solutions that can be used more widely. They’ll be able to compete on ease-of-use, performance (not just raw speed, but also power consumption and hardware costs), and which environments they run in (language integration, distributed systems support via Hadoop or Spark, embedded devices).

One of the ways to improve performance is with specialized hardware, but there are some serious obstacles to overcome first. One of them is that the algorithms themselves are in flux; I think there will be a lot of changes over the next few years, which makes solidifying them into chips hard. Another is that almost all the time in production systems is spent doing massive matrix multiplies, which existing GPUs happen to be great at parallelizing. Even SIMD instructions on ordinary CPUs are highly effective at giving good performance. If deep networks need to be run as part of larger systems, the latency involved in transferring between the CPU and specialized hardware will kill speed, just as it does with a lot of attempts to use GPUs in real-world programs. Finally, a lot of the interest seems to be around encoding the training process into a chip, but in my future only a small part of the technology world trains new models; everyone else is just executing off-the-shelf networks.

With all that said, I’m still fascinated by the idea of a new hardware approach to the problem. Since I see neural networks as programs, building chips to run them is very tempting. I’m just wary that it may be five or ten years before they make commercial sense, not two or three.
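To make the matrix-multiply point concrete, here’s a rough sketch of the standard trick of unrolling a convolution into one big matrix multiply (the ‘im2col’ approach frameworks like Caffe use), which is exactly the kind of operation GPUs and SIMD units are built for; the image and filter sizes here are arbitrary.

```python
import numpy as np

def im2col(image, k):
    """Unroll every k x k patch of a single-channel image into a row."""
    h, w = image.shape
    rows = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            rows.append(image[y:y + k, x:x + k].ravel())
    return np.array(rows)                       # (num_patches, k*k)

image = np.random.rand(32, 32).astype(np.float32)
filters = np.random.rand(8, 3 * 3).astype(np.float32)   # eight 3x3 filters

patches = im2col(image, 3)                      # (900, 9)
responses = patches @ filters.T                 # one big matrix multiply: (900, 8)
feature_maps = responses.T.reshape(8, 30, 30)   # back to eight 30x30 maps
```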

Anyway, I hope this gives you some idea of how things look from my vantage point. I’m excited to be involved with a technology that’s got so much room to grow, and I can’t wait to see where it goes from here!

Five short links


Photo by <rs> snaps

Crossing the great UX/Agile divide – Methodologies are driven by human needs, and I know from personal experience that agile development takes power from designers and gives it to engineers. That doesn’t mean it’s wrong, but admitting there’s a power dynamic there makes it at least possible to talk about it candidly. “Although many software developers today enjoy the high salaries and excellent working conditions associated with white-collar work, it may not stay that way and UX design could be a contributing factor.”

Eigenmorality – A philosophical take on how PageRank-like algorithms could be used to tackle ethical dilemmas, featuring Eigenmoses and Eigenjesus.

The elephant was a Trojan horse – I almost always use Hadoop as a simple distributed job system, and rarely need MapReduce. I think this eulogy for the original approach captures a lot of why MapReduce was so important as an agent of change, even if it ended up not being used as much as you’d expect.

Neural networks, manifolds, and topology – There are a lot of important insights here. One is the Manifold Hypothesis, which essentially says there are simpler underlying structures buried beneath the noise and chaos of natural data. Without this, machine learning would be impossible, since you’d never be able to generalize from a set of examples to cope with novel inputs; there would be no pattern to find. Another is that visual representations of the problems we’re tackling can help make sense of what’s actually happening under the hood.

The polygon construction kit – Turns 3D models into instructions for building them in the real world. It’s early days still, but I want this!

Pete Warden, US Citizen!

I’m very proud and excited to be taking my oath of allegiance this morning, the final step to becoming a US citizen after thirteen years of calling this country my home. To mark the occasion, my girlfriend Joanne wanted to interview me to answer some pressing questions about exactly why I still can’t pronounce “Water” correctly!

Why is everyone so excited about deep learning?


Photo by Narisa

Yesterday a friend emailed, asking “What’s going on with deep learning? I keep hearing about more and more companies offering it, is it something real or just a fad?” A couple of years ago I was very skeptical of the hype that had emerged around the whole approach, but then I tried it, and was impressed by the results I got. I still try to emphasize that they’re not magic, but here’s why I think they’re worth getting excited about.

They work really, really well

Neural networks have been the technology-of-the-future since the 1950s, with massive theoretical potential but lacklustre results in practice. The big turning point in public perception came when a deep learning approach won the equivalent of the World Cup for computer vision in 2012. Just look at the results table: the SuperVision team of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton absolutely trounced their closest competitors. It wasn’t a fluke; here’s a good overview of a whole bunch of other tasks where the approach is either beating more traditional approaches or providing comparable results. I can back this up with my own experience, and deep networks have consistently won highly-competitive Kaggle competitions too.

They’re general-purpose

I’m focused on computer vision, but deep neural networks have already become the dominant approach in speech recognition, and they’re showing a lot of promise for making sense of text too. There’s no other technique that applies to so many different areas, and that means that any improvements in one field have a good chance of applying to other problems too. People who learn how to work with deep neural nets can keep re-using that skill across a lot of different domains, so it’s starting to look like a valuable foundational skill for practical coders rather than a niche one for specialized academics. From a research perspective the approach is worth investing in too, because it shows a lot of promise for tackling a wide range of topics.

They’re higher-level

With neural networks you’re not telling a computer what to do, you’re telling it what problem to solve. I try to describe what this means in practice in my post about becoming a computer trainer, but the key point is that the development process is a lot more efficient once you hand over implementation decisions to the machine. Instead of a human with a notebook trying to decide whether to look for corners or edges to help spot objects in images, the algorithm looks at a massive number of examples and decides for itself which features are going to be useful. This is the kind of radical change that artificial intelligence has been promising for decades, but has seldom managed to deliver until now.

There’s lots of room for improvement

Even though the Krizhevsky approach won the 2012 Imagenet competition, nobody can claim to fully understand why it works so well, or which design decisions and parameters are most important. It’s a fantastic trial-and-error solution that works in practice, but we’re a long way from understanding how it works in theory. That means that we can expect to see speed and result improvements as researchers gain a better understanding of why it’s effective, and how it can be optimized. As one of my friends put it, a whole generation of graduate students are being sacrificed to this effort, but they’re doing it because the potential payoff is so big.

I don’t want you to just jump on the bandwagon, but deep learning is a genuine advance, and people are right to be excited about it. I don’t doubt that we’re going to see plenty of other approaches trying to improve on its results; it’s not going to be the last word in machine learning, but it has been a big leap forward for the field, and promises a lot more in years to come.

Deep learning on the Raspberry Pi!


Photo by Clive Darra

I’m very pleased to announce that I’ve managed to port the Deep Belief image recognition SDK to the Raspberry Pi! I’m excited about this because it shows that even tiny, cheap devices are capable of performing sophisticated computer vision tasks. I’ve talked a lot about how object detection is going to be commoditized and ubiquitous, but this is a tangible example of what I mean, and I’ve already had folks digging into some interesting applications: detecting endangered wildlife, traffic analysis, satellites, even intelligent toys.

I can process a frame in around three seconds, largely thanks to the embedded GPU doing the heavy lifting on the math side. I had to spend quite a lot of time writing custom assembler programs for the Pi’s 12 parallel ‘QPU’ processors, but I’m grateful I could get access at that low a level. Broadcom only released the technical specs for their graphics chip in the last few months, and it’s taken a community effort to turn that into a usable set of examples and compilers. I ended up heavily patching one of the available assemblers to support more instructions, and created a set of helper macros for programming the DMA controller, so I’ve released those all as open source. I wish more manufacturers would follow Broadcom’s lead and give us access to their GPUs at the assembler level; there’s a lot of power in those chips, but it’s so hard to tune algorithms to make use of them without being able to see how they work.

Download the library, give it a try, and let me know about projects you use it on. I’m looking forward to hearing about what you come up with!

How I teach computers to think


Photo by Kit

Yesterday I was suddenly struck by a thought – I used to be a coder; now I teach computers to write their own programs. With the deep belief systems I’m using for computer vision, I spend most of my time creating an environment that allows the machines to decide how they want to solve problems, rather than dictating the solution myself. I’m starting to feel a lot more like a teacher than a programmer, so here’s what it’s like to teach a classroom of graphics cards.

Curriculum

I have to spend a lot of time figuring out how to collect a large training set of images, which have to represent the kind of pictures the algorithm is likely to encounter. That means you can’t just re-use photos from cell phones if you’re targeting a robotics application. The lighting, viewing angles, and even the ‘fisheye’ geometry of the lens all have to be consistent with what the algorithm will encounter in the real world or you’ll end up with poor results. I also have to make sure the backgrounds of the images are as random as possible, because if the objects I’m looking for always occur in a similar setting in the training, I’ll end up detecting that rather than the thing I actually care about.

Another crucial step is deciding what the actual categories I’m going to recognize are. They have to be visually distinct from each other, so separating cats from dogs is more likely to work than distinguishing American from British short-hair cat breeds. There are often edge cases too, so to get consistent categorization I’ll spend some time figuring out rules. If I’m looking for hipsters with mustaches, how much stubble does somebody need on their upper lip before they count? What if they have a mustache as part of a beard?

Once I’ve done all that, I have to label at least a thousand images for each category, often with up to a million images in total. This means designing a system to capture likely images from the web or other sources, with a UI that lets me view them rapidly and apply labels to any that fall into a category I care about. I always start by categorizing the first ten thousand or so images myself so I can get a feel for how well the categorization rules work, and what the source images are like overall. Once I’m happy the labeling process works, I’ll get help from the rest of our team, and then eventually bring in Mechanical Turk workers to speed up the process.
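The labeling tool itself doesn’t need to be fancy; a minimal sketch (the file paths and category names here are made up) can just loop through candidate images, display each one, and record a keypress:

```python
import csv
import glob
import matplotlib.pyplot as plt
from matplotlib.image import imread

# Hypothetical categories and keyboard shortcuts for a mustache detector.
CATEGORIES = {"h": "hipster_mustache", "n": "no_mustache"}

with open("labels.csv", "a", newline="") as output:
    writer = csv.writer(output)
    for path in sorted(glob.glob("candidates/*.jpg")):
        plt.imshow(imread(path))
        plt.axis("off")
        plt.show(block=False)
        plt.pause(0.1)                      # give the window time to draw
        key = input(f"{path} [h/n, anything else skips]: ").strip().lower()
        plt.close()
        if key in CATEGORIES:
            writer.writerow([path, CATEGORIES[key]])
```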

Instruction

One advantage I have over my conventional teacher friends is that I get to design my own students! This is one of the least-understood parts of the deep learning process though, with most vision solutions sticking pretty close to the setup described in the original Krizhevsky paper. There are several basic components that I have to arrange in a pipeline, repeating some of them several times with various somewhat-arbitrary transformations in between. There are a lot of obscure choices to make about ordering and other parameters, and you won’t know if something’s an improvement until after you’ve done a full training run, which can easily take weeks. This means that, as one of my friends put it, we have an entire generation of graduate students trying to find improvements by trying random combinations in parallel. It’s a particularly painful emulation of a genetic algorithm since it’s powered by consuming a chunk of people’s careers, but until we have more theory behind deep learning, the only way to make progress is by using architectures that have been found to work in the past.
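For a concrete picture of what that pipeline looks like, here’s a rough outline of the Krizhevsky-style arrangement as a plain list of stages; the exact filter counts and the placement of the pooling and normalization steps are where those somewhat-arbitrary choices live, so treat the numbers as illustrative rather than a spec.

```python
# Rough outline of a Krizhevsky-style (AlexNet-like) pipeline as a plain list.
# The repeated conv/relu/pool/normalize pattern, and the parameters of each
# stage, are where most of the trial-and-error architectural choices happen.
PIPELINE = [
    ("conv",            {"filters": 96,  "size": 11, "stride": 4}),
    ("relu",            {}),
    ("pool",            {"size": 3, "stride": 2}),
    ("normalize",       {"kind": "local_response"}),
    ("conv",            {"filters": 256, "size": 5}),
    ("relu",            {}),
    ("pool",            {"size": 3, "stride": 2}),
    ("normalize",       {"kind": "local_response"}),
    ("conv",            {"filters": 384, "size": 3}),
    ("relu",            {}),
    ("conv",            {"filters": 384, "size": 3}),
    ("relu",            {}),
    ("conv",            {"filters": 256, "size": 3}),
    ("relu",            {}),
    ("pool",            {"size": 3, "stride": 2}),
    ("fully_connected", {"outputs": 4096}),
    ("dropout",         {"rate": 0.5}),
    ("fully_connected", {"outputs": 4096}),
    ("dropout",         {"rate": 0.5}),
    ("fully_connected", {"outputs": 1000}),
    ("softmax",         {}),
]
```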

The training process itself involves repeatedly looping through all of the labeled images, and rewarding or punishing the neural connections in your network depending on how correctly they respond to each photo. This process is similar to natural learning: as more examples are seen, the system starts to understand more about the patterns they have in common, and the success rate increases. In practice deep neural networks are extremely fussy learners though, and I spend most of my time trying to understand why they’re bone-headedly not improving when they should be. There can be all sorts of problems: poorly chosen categories, bad source images, incorrectly classified objects, a network layout that doesn’t work, or bugs in the underlying code. I can’t ask the network why it’s not learning, since we just don’t have good debugging tools, so I’ll usually end up simplifying the system to eliminate possible causes and try solutions more quickly than I could with a full run.
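Stripped down to its core, the reward-and-punish loop looks something like this minimal sketch, which trains a single softmax layer on synthetic data. A real network has many more layers and far more data, but the shape of the loop, including lowering the learning rate partway through a run, is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
num_examples, num_features, num_classes = 1000, 64, 10

# Synthetic stand-ins for labeled images and their categories.
inputs = rng.standard_normal((num_examples, num_features)).astype(np.float32)
labels = rng.integers(0, num_classes, size=num_examples)

weights = np.zeros((num_features, num_classes), dtype=np.float32)
learning_rate = 0.1

for epoch in range(30):
    for x, label in zip(inputs, labels):
        scores = x @ weights
        exp = np.exp(scores - scores.max())
        probabilities = exp / exp.sum()
        # 'Reward or punish' the connections: nudge the weights toward the
        # correct label and away from the others.
        gradient = probabilities.copy()
        gradient[label] -= 1.0
        weights -= learning_rate * np.outer(x, gradient)
    # Drop the learning rate as training progresses (the kind of mid-run
    # parameter change described above, here on a fixed schedule).
    if epoch in (10, 20):
        learning_rate *= 0.1

accuracy = np.mean((inputs @ weights).argmax(axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```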

Training can take a long time for problems like recognizing the 1,000 Imagenet categories, on the order of a couple of weeks. At any point the process can spiral out of control or hit a bug, so I have to check the output logs several times a day to see how things are doing. My girlfriend has become resigned to me tending ‘the brain’ in the corner of our living room in breaks between evening TV. Even if nothing’s gone dramatically wrong, several of the parameters need to be changed as the training process progresses to keep the learning rate up, and knowing when to make those changes is much more of an art than a science.

Finals

Once I’ve got a model fully trained, I have to figure out how well it works in practice. You might think it would be easy to evaluate a computer, since it doesn’t have the human problems of performance anxiety or distraction, but this part can actually be quite tough. As part of the training process I’m continually running numerical tests on how many right and wrong answers the system is giving, but, like standardized tests for kids, these only tell part of the story. One of the big advantages I’ve found with deep learning systems is that they make more understandable mistakes than other approaches. For example, users are a lot more forgiving if a picture of a cat is mis-labeled as a raccoon than if it’s categorized as a coffee cup! That kind of information is lost when we boil down performance into a single number, so I have to dive deeper.
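One simple way to keep that information is to look at a confusion matrix rather than a single accuracy figure; here’s a tiny sketch with made-up labels to show the difference.

```python
from collections import Counter

# Hypothetical ground-truth and predicted labels for a small validation set.
truth     = ["cat", "cat", "cat", "dog", "dog", "coffee cup"]
predicted = ["cat", "raccoon", "coffee cup", "dog", "cat", "coffee cup"]

confusion = Counter(zip(truth, predicted))
accuracy = sum(t == p for t, p in zip(truth, predicted)) / len(truth)

print(f"accuracy: {accuracy:.2f}")   # hides *which* mistakes were made
for (actual, guess), count in sorted(confusion.items()):
    marker = "" if actual == guess else "  <-- how forgivable is this one?"
    print(f"{actual:>12} labeled as {guess:<12} x{count}{marker}")
```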

The real test is building the model into an application and getting it in front of users. Often I’ll end up tweaking the results of the algorithm based on how I observe people reacting to it, for example suppressing the nematode label because it’s the default when the image is completely black. I’ll often spot more involved problems that require changes at the training set level, which will require another cycle through the whole process once they’re important enough to tackle.

As you can see, being a computer trainer is a strange job, but as we get better at building systems that can learn, I bet it’s going to be increasingly common. The future may well belong to humble humans who work well with intelligent machines.


Five short links


Picture by H. Michael Karshis

The spread of American slavery – A compelling use of animated maps to get across the fact that slavery was spreading and dominating the places it existed, right up until the Civil War. A map that matters, because it punctures the idea that slavery would have withered away naturally without intervention from the North.

Snapchat and privacy and security consent orders – On the surface FTC consent orders look pretty toothless, so why do companies worry about them so much? This article does a good job of explaining what they mean in practice, and it looks like they operate as jury-rigged regulations tailored for individual corporations, giving the FTC wide powers of oversight and investigation. The goals are often noble, but the lack of consistency and transparency leaves me worried the system is ineffective. If these regulations only apply to companies who’ve been caught doing something shady, then it just encourages others to avoid publicity around similar practices to stay exempt from the rules.

Maze Tree – I have no idea what the math behind this is, but boy is it pretty!

A suicide bomber’s guide to online privacy – The ever-provocative Peter Watts pushes back on David Brin’s idea of a transparent society by reaching into his biology training. He makes a convincing case that the very idea that someone is watching you is enough to provoke fear, in a way that’s buried deep in our animal nature. “Many critics claim that blanket surveillance amounts to treating everyone like a criminal, but I wonder if it goes deeper than that. I think maybe it makes us feel like prey.”

Data-driven dreams – An impassioned rant against the gate-keeping that surrounds corporate data in general, and the lack of access to Twitter data for most research scientists in particular. Like Craigslist, Twitter messages feel like they should be a common resource since they’re public and we created them, but that’s not how it works.

Everything is a sensor for everything else


Photo by Paretz Partensky

“Everything is a sensor for everything else”

I love this quote from David Weinberger because it captures an important change that’s happening right now. Information about the real world used to be scarce and hard to gather, and you were lucky if you had one way to measure a fact. Increasingly we have a lot of different instruments we can use to look at the same aspect of reality, and that’s going to radically change our future.

As an example, consider the humble pothole. Before computerization, someone in a truck would drive around town every year or so, see which roads were in bad repair, and write down a list on a clipboard. If a citizen phoned up city hall to complain, that would be added to another list, and someone who knew the road network would then sort through all the lists and decide which places to send crews out to.

The first wave of computerization moved those clipboard lists into spreadsheets and GIS systems on an office desktop, but didn’t change the process very much. We’re in the middle of the second wave right now, where instead of phone calls, our cell phones can automatically report potholes just from accelerometer data.
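As a rough sketch of what that accelerometer-based reporting involves (the threshold, the sample stream, and the report function are all hypothetical), the core logic is just spotting vertical jolts that stand out from normal road vibration and attaching a GPS fix:

```python
# Minimal sketch of accelerometer-based pothole reporting. The readings,
# threshold, and report() call are hypothetical stand-ins for whatever the
# phone platform actually provides.
JOLT_THRESHOLD_G = 2.5   # vertical acceleration spike, in g

def report(latitude, longitude, magnitude):
    print(f"possible pothole at ({latitude:.5f}, {longitude:.5f}), {magnitude:.1f} g")

def watch(samples):
    """samples: iterable of (vertical_accel_g, latitude, longitude) tuples."""
    for accel, lat, lon in samples:
        if abs(accel) > JOLT_THRESHOLD_G:
            report(lat, lon, abs(accel))

# Fake stream standing in for live sensor data.
watch([(1.0, 37.77493, -122.41942),
       (3.2, 37.77501, -122.41955),   # jolt: likely a pothole
       (0.9, 37.77510, -122.41968)])
```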

Using sensors to passively spot holes takes humans out of the loop, and means we can gather tens or hundreds of times the number of reports that we would if a person had to take time out of their day to submit one manually. This is only the beginning of the tidal wave of data though.

Think about all the different ways we’ll be able to detect potholes over the next few years. Police and other public workers are increasingly wearing cameras, patrol cars have had dashboard cameras for a while, and computer vision’s at the point where analyzing the video to estimate road repair needs isn’t outlandishly hard. We’re going to see a lot more satellites taking photos too, and as those get more frequent and detailed, those will be great sources to track road conditions over time.

Beyond imagery, connected cars are going to be transmitting a lot of data: every suspension jolt can be used as a signal that the driver might have hit a hole, and even small swerves to avoid a hazard could be a sign of a potential problem. Cars are also increasingly gaining sensors like LIDAR, radar and sonar. Their job is to spot obstacles in the road, but as a by-product you could also use the data they’re gathering to spot potholes and even early cracks in the road surface.

There will be even more potential sources of data as networked sensors get cheap enough to throw into all sorts of objects. If bridges get load sensors to spot structural damage, the same data stream can be analyzed to see when vehicles are bouncing over holes. Drones will be packed with all sorts of instruments, some of which will end up scanning the road. As the costs of computing, sensing, and communicating fall, the world will be packed with networked sensors, some of which will be able to spot potholes even if their designers never planned for that.

With all of this information, you might have thousands or even millions of readings from a lot of different sources about a single hole in the road. That’s serious overkill for the original use case of just sending out maintenance crews to fix them! This abundance of data makes a lot of other applications possible though. Insurance companies will probably end up getting hold of connected-car data, even if it’s just in aggregate, and can use it to help improve their estimates of car damage likelihood by neighborhood. Data on potholes from public satellite imagery can be used by civic watchdogs to keep an eye on how well the authorities are doing on road repairs. Map software can pick cycling routes that will offer the smoothest ride, based on estimates of the state of the road surface.

These are all still applications focused on potholes though. Having this overwhelming amount of sensor information means that the same data set can be mined to understand apparently unrelated insights. How many potholes there are will be influenced by a lot of things: how much rain there was recently, how many vehicles drove on the road, how heavy they were, how fast they were going, and I’d bet there are other significant factors like earth movements and nearby construction. Once you have a reliable survey of potholes with broad coverage and frequent updates, you can begin to pull those correlations out. The sheer quantity of measurements from many independent sources means that the noise level shrinks and smaller effects can be spotted. Maybe you can spot an upswing in the chemical industry by seeing that there are a lot more potholes near their factories, because the haulage trucks are more heavily laden? How about getting an early warning of a landslide by seeing an increase in road cracks, thanks to initial shifts in the soil below?
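As a toy illustration of pulling one of those correlations out, here’s a sketch that checks how strongly a synthetic weekly pothole count tracks a synthetic rainfall series; with real surveys and many independent sources, the same calculation starts to reveal much weaker effects.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: weekly rainfall and pothole counts for one road segment.
rainfall = rng.gamma(shape=2.0, scale=10.0, size=104)         # mm per week
potholes = 3 + 0.2 * rainfall + rng.normal(0, 2, size=104)    # weekly counts

correlation = np.corrcoef(rainfall, potholes)[0, 1]
print(f"rainfall vs. potholes correlation: {correlation:.2f}")
# As more independent sensor sources are added, the noise term shrinks and
# much smaller effects (truck weights, soil movement) become detectable
# the same way.
```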

These are just examples I picked off the top of my head, but the key point is that as the sample sizes grow large enough, sensors can be used to measure apparently unrelated facts. There are only so many quantities we care about in the real world, but the number of sensor readings keeps growing incredibly rapidly, and it’s becoming possible to infer measurements that would once have needed their own dedicated instruments. The curse of ‘big data’ is spurious correlations, so it’s going to be a process of experimentation and innovation to discover which ones are practical and useful, but I’m certain we’re going to uncover some killer applications by substituting alternative sensor information in bulk for the readings you wish you had.

It also means that facts we want to hide, even private ones about ourselves, are going to be increasingly hard to keep secret as the chances to observe them through stray data exhaust grow, but that’s a discussion for a whole new post!