Five short links


Photo by Man’s Pic

The man who tried to build the Second Coming – I just visited Joachim Koester’s art exhibit on its last weekend at the YBCA, and while I was left cold by the actual experience, I loved the idea of designing a machine through a seance. As design methodologies go, it’s far from the strangest I’ve run across and I bet it would have a better success rate than your average design-by-committee. Joanne was fascinated by the concept too, and found me this off-beat blog post describing how 19th century spiritualists ended up trying to trance a whole new world of technology into being.

“It’s just a population map” – Andy Woodruff thinks we’ve taken the XKCD warning about population maps too much to heart.

Attempto Controlled English – With hindsight it’s obvious there would be a strictly-parseable and semantically-defined subset of our natural language out there, but it still feels like a peek into the future. Native English-speakers like me may feel lucky that our language is becoming the default, but I bet the need to communicate with machines is going to radically warp how we talk and write. We’ve already been trained to talk to Google in Searchese after all.

The mid-career crisis of the Perl programmer – An honest and insightful look back on a couple of decades of writing code.

The open-office trap – A huge pile of research indicating open offices should be considered harmful, undermining my own (conflicted) affection for them.

Writing code is not most programmers’ primary task


Photo by Brecht Soenen

I just read Nathan Marz’s argument against open plan offices, and this sentence leapt out at me as almost always wrong: “The primary task of a programmer is writing code, which involves sitting at a desk and thinking and typing”. I’ve been a big fan of Nathan’s since his BackType days, but while his prescriptions make a lot of sense if you really are spending most of your time typing out new code based on clear requirements, I’ve found that a very rare position to be in. In my experience writing the code takes way less time than integrating and debugging it, let alone the open-ended process of figuring out what the requirements are. Most of my time goes on all those squishy bits. I look forward to the uninterrupted intervals when I get to just write code, but they’re not all that frequent.

When I started in games people who could write 3D software renderers were rare and highly-paid, but libraries and books came along that made their skills obsolete. Then it was folks who could program the fiendish DMA controllers on the Playstation 2, but soon enough they were sidelined too, followed by the game physics gurus, the shader hackers, and everyone else who brought only coding skills to the table. It turns out that we’re doing a pretty good job of making the mechanical process of writing code easier, with higher-level languages, better frameworks (something Nathan knows a lot about), and training that’s creating far more people who can produce programs. What I saw in games was that coding was the ticket that got me in the door, but improving all the other skills I needed as an engineer was what really helped me do a better job.

I learned to write a software renderer, but chatting with the artists who were building the models made me realize that I could make the game look far better and run much faster by building a tool in 3DS Max they could use to preview their work in-game. It reduced their iteration time from days to minutes, which meant they could try a lot more ways to reduce the polygon count without compromising the look. I would never have made this leap if I hadn’t been sitting in an open plan office where I could hear them cursing!

Since then I’ve seen the cycle repeat itself in every new industry I’ve joined. When the technology is new and hard to use, just knowing how to operate it gives you a high status position, but the tide of Moore’s Law and the spread of knowledge makes that a very temporary throne. The technical impediments always disappear, and graduates come out of college knowing what used to be elite skills. What keeps you in a job is the ability to be the interface between the precise requirements of software and the rest of the world filled with messy, contradictory and incompletely understood problems.

In passing Nathan mentions measuring productivity, but that’s one of the stickiest problems in software, with an inglorious history from counting lines of code to stack ranking. Many of my most useful contributions at the companies I’ve worked at have been when I’ve avoided producing what someone has asked me for, and instead given them what they need, which meant a lot of conversations to really understand the background. I also spend a lot of time passing on what I know to other people who are hitting problems, which hits my individual productivity but helps the team’s. The only meaningful level at which you can measure productivity is for the whole group tackling the problem, including the designers, QA, marketers, translators, and everyone else outside of coding who’s needed for almost any product you’re building. In almost every place I’ve worked, they would be able to make far more progress if they could interrupt the programmers a lot more, even though I’d hate it as an engineer!

I am actually ambivalent about open plan offices myself. Having my own room often seems like a delicious dream, and headphones are a poor alternative when I’m really diving deep into the code. What stops me from pushing for that is the knowledge that my job is all about collaboration, and an open arrangement truly does enable that. The interruptions aren’t getting in the way of my primary work, the interruptions are my primary work.

Five short links


Photo by Iain Watson

Accountability in a computerized society – An old but well-argued essay making the case that software engineers should be legally liable for bugs. As a coder the idea fills me with horror, but I know it will be pushed with increasing force as the world relies more and more on computers, so it’s worth reading just to prepare our counter-arguments.

Distributed neural networks with GPUs in the AWS cloud – After my recent post on switching to an in-house server, an Amazon engineer sent me this Netflix article. It contains some useful tips on particular problems they hit with virtualization and graphics cards, though I’m still happy with my Titan beast.

The gap between data mining and predictive models – Using the Facebook dating/posting story as a concrete example, John Mount demonstrates how a very visible correlation can still be completely unusable for any practical application. This is a problem I see again and again with my own work. When you’re looking at data in bulk about whole populations you’ll often see strong relationships that are overwhelmed by noise when you try to use them to say something about individuals.
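John’s point can be reproduced in a few lines of simulation. Here is a minimal sketch (a toy model of my own, not his data or code) of a small real effect that shows up unmistakably in population averages but is nearly useless for classifying individuals:

```python
import random

random.seed(42)

# Toy model: each person carries a tiny real "signal" plus a lot of
# individual noise. Averaging over a big population reveals the signal
# clearly, but predicting any single person from it barely works.
n = 100_000
effect = 0.1  # small true effect

group_a = [effect + random.gauss(0, 1) for _ in range(n)]
group_b = [random.gauss(0, 1) for _ in range(n)]

mean_a = sum(group_a) / n
mean_b = sum(group_b) / n

# In aggregate the difference is unmistakable...
print(f"population-level gap: {mean_a - mean_b:.3f}")

# ...but classifying individuals by thresholding at the midpoint is
# barely better than a coin flip, because per-person noise swamps it.
threshold = (mean_a + mean_b) / 2
correct = sum(x > threshold for x in group_a) + sum(x <= threshold for x in group_b)
print(f"individual accuracy: {correct / (2 * n):.3f}")
```

The gap between the groups comes out close to the true effect, while the per-individual accuracy hovers only a percent or two above chance.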

Why you shouldn’t use Vagrant – Vagrant has been a fantastic tool for simplifying a mess of dependencies for me, especially as I’m developing on a Mac and deploying to Linux, but here’s someone who’s been left frustrated by the process. My read of Chad’s post is that he’s not hitting the problems Vagrant solves, and is paying the price for complexity he doesn’t need. My experience has been that I far prefer to set up dependencies once in Linux, where most of the frameworks I use are primarily tested and so have the fewest problems, rather than trying to install monsters like SciPy or SciKit in OS X on my laptop, and then figuring out how to do the same on my Ubuntu production servers.
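For anyone who hasn’t used it, the workflow I’m describing looks roughly like this minimal, illustrative Vagrantfile (the box name and package list are my stand-ins, not anything from Chad’s post or my production setup):

```ruby
# Minimal sketch of a Mac-host / Linux-guest workflow.
Vagrant.configure("2") do |config|
  # A stock Ubuntu box, matching the production OS
  config.vm.box = "ubuntu/trusty64"
  # The project folder is shared, so editing stays on the Mac side
  config.vm.synced_folder ".", "/vagrant"
  # Install the heavyweight scientific dependencies once, inside Linux
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y python-numpy python-scipy
  SHELL
end
```

Running `vagrant up` then rebuilds the same Linux environment on any machine, which is exactly the dependency headache it solves, and exactly the overhead that isn’t worth paying if your development and deployment environments already match.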

Yann LeCun on deep learning – A good introduction to the recent resurgence of neural networks in AI, along with some pointed criticism of the singularity.

Five short links


Photo by Mark Carter

Deep learning galaxies – Deep belief networks are incredibly powerful scientific instruments, and it’s exciting to see their use spreading. Ryan Keisler’s using them to classify galaxy shapes, but I believe a large proportion of all problems where scientists need to measure properties from images are now solvable with these new techniques.

Tricks for GPU-composited CSS – Some nifty hacks to keep your browser animations on the fast path.

How to commit fraud and get away with it – An offbeat but compelling argument that data systems in large corporations often exist to allow management to commit fraud with perfect deniability. When our decision-making machines are black boxes, who is to blame for their mistakes? If we can’t reason about why they’re doing what they’re doing, how do we stop people gaming them for their own benefit?

Using satellite imagery to track aid projects – This is a wonderful way to understand how big construction projects in the developing world are doing. I can’t wait until we have more ground-level public photography there too; from my analysis, Africa is about the only area of the world that doesn’t have large numbers of Instagram pictures. Once we have those, all sorts of projects will be verifiable.

So, you finally have a woman on your team – Cate Huston keeps coming up with compelling and useful posts, and the practical advice in this one is priceless. “Things to look for: Ideas being repeated without credit. Judging women on past performance and men on ‘potential’”.

My in-house GPU-processing server


I spent the first half of my career chained to massive desktop machines, and I was so happy when I was finally able to completely switch to developing on my laptop. Once Amazon EC2 came along, and I could tap crazily-big servers on demand over a network, I never imagined I’d need to have a big box under my desk again. I’ve been surprised to discover there’s at least one niche that the cloud/laptop combination can’t fill though – heavy GPU computation.

I’ve found convolutional neural networks to be incredibly effective for image recognition, but teaching them can take days or weeks. It has become a major bottleneck in our development, and unfortunately Amazon’s GPU offerings are pretty sparse and expensive. The CG1 instances are based on 2010-era Nvidia chips, and so aren’t as speedy as they could be, and the G2 instances have newer GPUs, but are optimized for gaming and video applications rather than numeric processing. Since the CG1s are $1,500 a month and slow, I was surprised to find it made sense to get an actual physical machine.

I spent some time researching what I needed and discovered that the official compute-focused Nvidia cards are painfully expensive. Happily some high-end consumer cards are known for giving excellent numerical performance for a lot less. The main difference is a lack of error-correction on their memory, and fortunately my calculations are robust to intermittent glitches.

I settled on the Nvidia Titan graphics card, and set out to find someone to build a machine around one. It’s been well over a decade since I built my last PC, so I knew I needed professional help. I went with Xidax; they were well-reviewed and had a friendly process for setting up all the custom components I needed. The strangest part was that I found myself in an unfamiliar world of high-end gaming, with some impressive options for strange case lights and other gizmos I wasn’t expecting. I didn’t pick any of the custom effects, but you can see in the picture I still ended up with quite a pretty beast! During the build process they even emailed me progress photos, which was a neat touch.

They did a good job letting me know how the setup was going, and seemed to do a fair amount of pre-shipment testing, which was reassuring. The machine came within a week, and then the OS setup fun began. My first job was installing Ubuntu, since all of my software is Unix-based. I was hoping to keep Windows on there as an alternative, but the combination of partitions and a new-fangled EFI BIOS thwarted me. I currently have a machine that boots into the GRUB rescue prompt, and if I type exit, then choose Ubuntu recovery mode, and pick ‘Continue as normal’ on the recovery screen, I can finally get into X windows. It’s not the most stable setup, and I’m hoping the new hard drive in the front of the picture will give me another drive to boot from, but it works! I’m able to run my training over twice as fast as I could on EC2, which makes a massive difference for development. I did have to do some wacky coolbits craziness to get the GPU fan to run above 55% speed, but with that in place processing cycles that took 35 seconds on the Amazon instances are down to only 14 seconds.
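For anyone else fighting a capped GPU fan, the coolbits craziness amounts to something like the following. The exact option bits and attribute names vary across Nvidia driver versions, so treat this as a sketch rather than a recipe:

```
# In the Device section of /etc/X11/xorg.conf:
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    # Bit 2 (value 4) unlocks manual fan control in nvidia-settings
    Option     "Coolbits" "4"
EndSection
```

After restarting X, a command along the lines of `nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=85"` pins the fan at a fixed speed (older driver versions call the second attribute GPUCurrentFanSpeed instead).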

Five questions I ask myself before I write any code


Photo by John

Do I need to write it?

Is there another way I can accomplish the goal without writing any code? For example, I was about to write a script to send templated emails from a spreadsheet, but I realized that showing the non-engineer I was collaborating with how to do it themselves through Mailchimp’s UI would make both of us a lot happier.

Has someone else written it?

If I’m doing anything that feels at all computer-sciencey, the answer to this is almost certainly ‘Yes’! The only questions are whether I can find the code, and whether I have a license to use it. Even if I can’t find the code, there’s a good chance I’ll find helpful pointers about how to tackle the problem in blogs or inside open source projects on GitHub. I want to spend my time on what makes Jetpac different, not re-inventing the wheel, and what’s unique about us is our core image and data algorithms, and the application logic and design that deliver useful results to people. I try my damnedest to reuse open code for anything else, even if it’s not ideal and I have to contribute patches back, because otherwise the miscellaneous programming to handle things outside of our core business would eat all our resources and leave us moving at a glacial pace.

Do I understand why I’m writing it?

Every person I’ve worked with has wanted to be helpful, and bring me as fully-formed a solution as possible. Things go wrong when I mistake their suggested solution for the real requirements. The only way I’ve found to know the difference is to understand the problem they’re facing, to learn the domain as deeply as I can, to have as many water-cooler chats as possible with the folks at the coal face, and generally immerse myself in the background behind their request.

How will I know when I’m done?

The most painful conflicts I’ve had have been after I’ve delivered something that I thought was completed, and the person I’m giving it to was expecting parts I didn’t include. To tell the truth, sometimes I knew we had different expectations at some level, but a desire to avoid conflict made me put off the difficult discussion as long as possible. Engineers have a lot of freedom: people ask us for things, pay us money, and then leave us to do incomprehensible rituals for weeks in the hope that we’ll give them what they want. If at the end we hand over something that’s not what they wanted, we’ve failed, even if it’s exactly what they asked for in a narrow sense.

The best solution is to describe in as much detail as you can what you’ll be delivering right at the start, with a particular focus on the tradeoffs you’re thinking you’ll make. I find ‘User stories’ a great way to do this even if you’re not using a formal methodology that requires them, because they’re specific enough to be useful as engineering guides, but are in a language anyone can understand.

How will I debug it?

Writing the code is the easy part; it’s fixing the bugs that takes time. Any planning I’ve done up-front to make debugging easier has always paid off. Always. Guaranteed. Debugging isn’t glamorous so it tends to get less effort applied to it than it should, but as the biggest time-suck of the whole development process it’s always a good place to invest resources at the start. As a practical example, I’ve been doing a lot of GPU image processing recently, but graphics cards are terrible platforms to try to track down bugs on. I don’t even have the luxury of logging statements, let alone a proper debugger! To speed up development, I’ve actually been writing CPU reference implementations of all my algorithms first. When I do encounter a bug in the final GPU implementation, I can check to see if it occurs on the CPU too. If it does, I can debug it in a much saner environment and then port the fix over to the graphics card. If it doesn’t, I know the GPU must be doing something different, and I can dump buffers on both implementations until I’ve identified the exact stage where the results differ.
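As an illustration of the pattern (with a trivial box blur standing in for a real pipeline stage, and a second CPU function standing in for the GPU kernel so the example stays self-contained), the reference-and-compare setup looks something like this:

```python
# Sketch of the reference-implementation pattern: a slow, obviously-correct
# CPU version of each stage, checked against the optimized version.

def blur_reference(image, width, height):
    """Naive 3x3 box blur -- easy to read, easy to debug."""
    out = [0.0] * (width * height)
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        total += image[ny * width + nx]
                        count += 1
            out[y * width + x] = total / count
    return out

def blur_optimized(image, width, height):
    """Stand-in for the GPU implementation of the same stage."""
    # (In the real setup this would be the shader or kernel; here it's
    # just another CPU call so the example runs anywhere.)
    return blur_reference(image, width, height)

def buffers_match(a, b, tolerance=1e-6):
    """Dump-and-compare step: find the first element where the two differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > tolerance:
            print(f"first mismatch at index {i}: {x} vs {y}")
            return False
    return True

w, h = 8, 8
image = [float((x * 7 + y * 13) % 10) for y in range(h) for x in range(w)]
assert buffers_match(blur_reference(image, w, h), blur_optimized(image, w, h))
```

When the two implementations disagree, the first-mismatch index points straight at the stage and pixel to investigate, which is exactly the information a GPU won’t give you on its own.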

Other things you might think about at the planning stage are whether unit tests might be helpful, whether specialized logging or other instrumentation frameworks make sense, and whether there are any debugging tools you can add to your environment to help, for example automatic emails whenever your code hits an error in production.
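That last idea can be sketched with Python’s standard library alone; everything about the mail server and addresses below is a placeholder:

```python
import logging
import logging.handlers

# Send ERROR-and-above log records by email. The server, addresses, and
# subject are placeholders; fill in real values (and credentials) to use it.
mail_handler = logging.handlers.SMTPHandler(
    mailhost=("smtp.example.com", 587),
    fromaddr="alerts@example.com",
    toaddrs=["oncall@example.com"],
    subject="Production error",
)
mail_handler.setLevel(logging.ERROR)

logger = logging.getLogger("myapp")
logger.addHandler(mail_handler)

# From here on, logger.error(...) or logger.exception(...) anywhere in
# the code triggers an email, while info/debug logging is unaffected.
```

The nice property is that the alerting lives entirely in configuration, so the code that hits the error doesn’t need to know emails exist.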

Five short links


Picture by Matt Handler

Gallery of processor cache effects – Another demonstration of how strange and unintuitive modern processors can be. Until the start of the 19th century it was possible for a single person to have a useful understanding of the whole of current scientific knowledge, but science outgrew the capacity of a single human brain. Understanding how a complete computer stack works, from silicon gates to jQuery, is rapidly headed the same way.

Scientific method: Statistical errors – A great in-depth backstory to the Nature editorial on why scientists are hooked on bad statistics. What’s scary is that data folks are even worse. Any single number is a one-dimensional view of a complex reality, and if you work hard enough you can make any metric say what you want.
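A concrete example of how far a single number can be bent is Simpson’s paradox, sketched here with success rates from a much-cited medical-treatment example. Treatment A wins in every subgroup, but aggregate the identical numbers and B ‘wins’:

```python
# Successes/trials for two treatments across easy and hard cases.
a_easy_s, a_easy_n = 81, 87
a_hard_s, a_hard_n = 192, 263
b_easy_s, b_easy_n = 234, 270
b_hard_s, b_hard_n = 55, 80

def rate(successes, trials):
    return successes / trials

# Per-group, treatment A wins both comparisons...
assert rate(a_easy_s, a_easy_n) > rate(b_easy_s, b_easy_n)
assert rate(a_hard_s, a_hard_n) > rate(b_hard_s, b_hard_n)

# ...but pool the same data and B looks better overall, because B
# happened to be used mostly on the easy cases. Same numbers, opposite
# headline -- the choice of which single number to report decides it.
assert rate(b_easy_s + b_hard_s, b_easy_n + b_hard_n) > \
       rate(a_easy_s + a_hard_s, a_easy_n + a_hard_n)
```

Whoever picks the level of aggregation picks the conclusion, which is exactly how a metric gets made to say what you want.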

What wreck is this? – I recently discovered Daniel M Russell’s blog, and I’m hooked. He’s a search researcher at Google, and runs some great weekly challenges, throwing out a search problem and asking his readers to document how they solved it.

San Francisco’s class war, by the numbers – A good visual exploration of the Bay Area’s tech boom by Susie Cagle, with referenced data to back it up (click the speech bubbles). Told me a few things I didn’t know.

Graph structure in the web revisited – Demonstrates how important open resources like Common Crawl are. These researchers have created a map of the web’s structure, and all of their software and data is freely available to others. The web is a large part of our world, and we need a map of it that everyone can access. There’s more info over at Slashdot.

Advice for a lonely college student

I couldn’t help responding to “I’m a really lonely college student. What can I do?” on Hacker News; the author’s despair felt horribly familiar from my own university years. I don’t have any easy answers, but I wanted to offer what I could from all the stumbling around I’ve done, searching and slowly finding happiness and connection. Here’s what I came up with. If you have ideas too, maybe you can add them to the HN thread so the original poster might see them?

———

I found college completely crushing, especially because I’d built up wild hopes for it as an escape from my unhappy childhood. I don’t have any easy answers, but here’s some of the tools that helped me:

Exercise. I know, it sounds dumb, especially when you’re depressed and have no energy, but it’s an incredibly effective way of hacking your brain chemistry. Run, swim, bike, hike, just pick one and pour all your frustration, anger, and sadness into it.

Get a hobby. I spent a lot of time worrying about being interesting to others, which guaranteed I wouldn’t be. Being interesting is a many-body problem with lots of unknowns, but being interested is way more solvable. You can figure out what you like a lot more easily than you can guess what might make other people like you. Don’t rule something out because it seems dorky, I guarantee you’ll find other people who enjoy it too, even if it’s Lego or collecting old maps. It’s surprisingly hard to understand what you actually want though, especially if you’ve been focused on what other people think.

Beware of magical transformations. I got married at 19, driven by an overwhelming desire to completely change my life to find happiness. It didn’t work. I also took a lot of drugs. That didn’t work either. I saw other people get pulled into cult-like religions or extreme political groups. Drastic exterior changes don’t alter who you are, you’ll still have the same problems, no matter what anyone tells you. Focus on boring incremental improvements, like exercise and hobbies. I hated that idea, because I was in love with my life being dramatic and the basic stuff seemed so mundane, but it’s what ended up making a lasting difference.

I doubt I’d have even listened to my present-day self when I was 19, but I hope there’s something in there that helps you. Life really does get better.

Five short links


Photo by Joe Baz

Tech’s untapped talent pool – I’m a massive fanboy of sociologists, they can reliably answer questions about human behavior in ways that are light-years ahead of most data analysis you see online. Data science’s big advantage is that we have massive new sources of information, and more data beats better algorithms, but I’m excited to see what happens when sociology’s algorithms meet the online world’s data!

ZIP codes are not areas – This one confused the hell out of me when I started getting serious about geo data, but the only true representation of ZIPs is as point clouds, where every building with an address is a point. The spatial patterns make drawing a boundary even for a single moment in time hard enough, but as houses are built and demolished, the layout changes in unexpected ways.
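The point-cloud view is easy to sketch: a ZIP is just the set of its delivery points, and any ‘area’ is something you derive, then re-derive whenever the buildings change. The coordinates below are made up for illustration:

```python
# A ZIP as a point cloud: the only ground truth is the set of delivery
# points, and any "boundary" is a derived, unstable artifact.
zip_points = {
    (37.7725, -122.4097),
    (37.7731, -122.4104),
    (37.7740, -122.4152),
}

def bounding_box(points):
    """Crudest possible derived boundary: (min_lat, min_lon, max_lat, max_lon)."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats), min(lons), max(lats), max(lons))

before = bounding_box(zip_points)
# A single new building on the edge of the cloud changes the "boundary".
zip_points.add((37.7690, -122.4200))
after = bounding_box(zip_points)
assert before != after
```

Swap the bounding box for a convex hull or a hand-drawn polygon and the same instability applies; the polygon is a snapshot of the points, not the other way around.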

It’s hard not to leak timing information – A cautionary tale of how tough it can be to be sure even a simple function like a string comparison doesn’t give away useful information to a malicious user.
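The core of the problem fits in a few lines. Here’s a sketch of the leaky pattern alongside the standard fix, Python’s hmac.compare_digest, which examines every byte before answering:

```python
import hmac

def naive_equal(a, b):
    # Typical early-exit comparison: it returns as soon as a byte differs,
    # so the *time taken* leaks how long the matching prefix is.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a, b):
    # Examine every byte regardless of where a mismatch falls. The
    # stdlib's hmac.compare_digest does this and is the right tool.
    return hmac.compare_digest(a, b)

secret = b"correct-token"
assert not naive_equal(secret, b"correct-tokex")
assert constant_time_equal(secret, b"correct-token")
assert not constant_time_equal(secret, b"wrong-token!!")
```

An attacker timing the naive version can recover a secret byte by byte; and as the linked post shows, even ‘fixed’ versions can leak through compilers, interpreters, and caches, which is why you reach for the library primitive instead of rolling your own.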

PLOS mandates data availability. Is this a good thing? – We all love open data and reproducible science, but there are hard practical problems around the mechanics of making big data sets available, ensuring they’ll be downloadable over the long term, and avoiding deanonymization attacks.

Better performance at lower occupancy – Processors are incredibly complicated beasts, and our simple mental models break down when we’re trying to squeeze the last drops of performance out of them. This is a great example of how even the manufacturers don’t understand how to best use their devices, as a Berkeley researcher demonstrates how to get far better performance from an Nvidia GPU than the documented best practices allow.

A walk around Andy Goldsworthy’s Presidio sculptures

A friend recently introduced me to Andy Goldsworthy’s work, through the Rivers and Tides documentary, so I was excited to see some of his ‘land art’ up close in San Francisco’s Presidio Park. The official site has some great background, but I couldn’t find a good guide to exploring all three of his scattered pieces, so here’s a quick rundown and map showing how I ended up navigating around. The hike itself is roughly two miles long, with well-maintained trails, and a few hundred feet of climbing but nothing too terrible.

(Map of the route around all three pieces)

Parking can be tough in the Presidio, but thankfully it was a rainy Super Bowl Sunday, so I found a spot in a small two-hour free parking section behind the Inn at the Presidio. I was actually originally aiming for the Inspiration Point parking lot, but that turned out to be closed for construction, so I was thankful to find something close to where I needed to be. There is plenty of paid parking nearer the Disney museum too, just a couple of blocks away, if you do get stuck.

There’s a trailhead and map at the parking lot, and from there I headed up the Ecology Trail, a reasonably steep fire road towards Inspiration Point. Once I reached the under-construction lot there, the view was beautiful, even on a wet day, looking out over Alcatraz and the bay. If you look away from the water, you should be able to see the top of Andy’s ‘Spire’ sculpture. As of February 2014, the construction made the normal trail to it inaccessible, so I ended up hiking a couple of hundred yards right along Arguello Boulevard, and then taking a use trail up to the main trail. It’s easy to navigate with the peak of the sculpture to guide you at least.

The piece itself is a tall narrow cone of unfinished tree trunks, all anchored deep in the ground and leaning in on each other. My first visit was at twilight, which gave it a very stark and striking silhouette, and it pays to find a spot where you can see it against the horizon; it’s hard to take it all in up close.

I then headed back to the Inspiration Point parking lot, went back down to rejoin the Ecology Trail, and continued along it almost to the edge of the park. From there I followed the trail that parallels West Pacific Avenue all the way to the Lovers Lane bridleway. Just on the other side of Lovers Lane is the second Goldsworthy work, ‘Wood Line’. It’s a series of tree trunks with their bark stripped, arranged in a continuous snaking line for a thousand feet or so, starting and ending by disappearing into the earth. The look alone is very striking, but I also couldn’t resist the urge to walk along the whole length. I’d normally be horrified at the thought of clambering on public sculpture, but it didn’t feel like a bad way to interact with the work; it’s so open to the elements, and it forced me to look closely at it just to avoid slipping off!


Photo by Joanne Ladolcetta

Afterwards, I continued down Lovers Lane to its end at Presidio Boulevard, and then headed left along Barnard Avenue. There’s a set of steps on the right that lead back to the parking lot, so I stopped by the car and dropped off my pack. The final piece is inside the old Powder Magazine, a couple of blocks away at the corner of Anza and Sheridan. I headed there by turning left along Moraga, and then right down Graham. The building itself is easy to spot, standing alone in the middle of the green near the Disney Museum, and right now it’s open 10am to 4pm on weekends, and at other times by appointment.

‘Tree Fall’ is a giant eucalyptus fork, jammed into the roof of the 20-foot-square building, with the tree and curved ceiling all covered in local clay that’s been allowed to crack naturally as it dries. The effect is like being inside a giant body, staring at arteries, especially as the only light is what comes in through the doorway. The docent was able to give us some background too: apparently the piece is expected to stay for the next three or four years, and the binding they used for the clay was hair from a salon around the corner from my house. There’s also a new documentary coming, which shows Andy’s children, who were young kids in the 2000 film, coming out to San Francisco to help assemble this piece.

Looking at all three works in the same day left me looking at the landscape of the Presidio a little differently, so I hope you get a chance to explore what he’s trying to do too.