Five short links

Photo by Nugun

Brainstorm – Psychedelic raytraced graphics packed onto a display that can only show a tiny set of characters and colors. A beautiful hack.

YASIV – An interactive visualization of the connections between books on Amazon. I never found a good way to expose these sorts of force-directed network graphs within a usable product, but I remain fascinated by them; they're a powerful way of communicating relationships between large numbers of objects.

Mining of Massive Datasets – Rich, detailed, and practical, this is an invaluable overview of the techniques you can apply to big collections of unstructured data to produce useful information, and it's freely available as a PDF. I'm looking forward to learning a lot from this book; I just wish I could pay for it without ordering a hardback copy.

LoremPixel – Simple but handy service that auto-generates placeholder images for your design prototypes, with easy control over the size and category.

Map of the Drug War – Chilling and information-rich, this visualization of Mexico's violence shows how bitter the drug war has become.

Big Data keeps getting bigger in Boulder


A couple of years ago I started what became the Big Data meetup for the Boulder/Denver area, together with Jacob Rideout. The first few months were tough. Despite the tight-knit tech community in the area, not many people were using or interested in technologies like Hadoop and NoSQL, so we averaged around eight or nine attendees. After I left Colorado the event really started to pick up steam, as you can see in the graph above.

I like to think it wasn't my absence that fueled the growth, but the groundswell of interest in everything under the Big Data umbrella. Boulder is an exciting place to be working on technology, and I'm not at all surprised to see so much work being done with emerging data tools. There seem to be a lot of new (and old!) companies following in the footsteps of local pioneers like Gnip, Next Big Sound and Return Path, and they're looking for people to hire. If you're an aspiring data geek who wants to work on interesting projects, I highly recommend popping along to the next one!

What I’ve learned from a thousand blog posts

Photo by En Tsai

This is my thousandth post here, and for me that's an important milestone. Years ago I read a study about abandoned blogs. It found that most died after one or two posts, but one had reached a thousand before it went quiet. In the rough early days, that gave me a goal: if I was going to abandon my blog, I would at least beat that record, goddamnit!

Now that I'm here, what have I learned over the last six years?

Blogging is cathartic

I started blogging out of frustration. I felt trapped in my job, with nobody to talk to about the amazing possibilities I could see in the technology world. I was surprised to find that just typing out my thoughts was a big help, even when nobody responded. Organizing my thoughts helped me make sense of my internal struggles, and left me feeling more at peace because it gave me a clearer view of the problems I was dealing with.

I still turn to my blog when I'm frustrated, and the funny thing is the posts that come out of that are often the most popular! Failure sucks, but instructs.

Having an audience is addictive

When I first started I would sit in Typepad's dashboard refreshing constantly in the hope of seeing a visitor. There were some days when nobody at all came to the blog. Now, on an average day, between five hundred and a thousand people make it here. A fair number of them come for old posts (there are evidently still damned souls who have to write BHOs for Internet Explorer), but the long slog to build an audience makes me happy to see every single one of them. Knowing that people are actually paying attention to what I write is a heady drug, even at my modest scale.

I get to indulge my innate urge to pontificate without having to inflict it on my loved ones, and even get validation from comments and responses from people I admire. Behind every writer's cool exterior there's something pathetically vulnerable that craves attention and drives them on. Having experienced the thrill of seeing tens of thousands of people reading and discussing something I wrote, I can't imagine how anyone could avoid getting hooked.

Writing fast is a bloody useful skill

When I first started blogging, I set aside thirty minutes each weekday morning to write a post. I forced myself to publish whatever I had at the end of the half hour. This initially led to some awful blog posts, but luckily at that point I had no readers (see above). Over time it became easier to create something worthwhile under a tight deadline, and writing at speed is the most important skill blogging has taught me. I could produce coherent writing before I started blogging, but it took me three or four times as long.

Being able to organize my thoughts and type out an argument or explanation within a few minutes has allowed me to do things I'd never have time for otherwise. Whether I'm creating documentation, replying to user emails, convincing colleagues, or pitching investors, it's amazing how much of my day is spent writing, and it all goes a lot more quickly thanks to my blogging practice.

Blogging is irrational

When people tell me they're thinking of writing a blog, I try to discourage them. By any sane measure the hours I've spent on this haven't had a great return on investment. The thing is, I can't help it! I need this outlet, and you should only write a blog if you've got the same screws loose as me: if you feel compelled.

I've enjoyed the last six years of blogging more than I can say, and as a final note I'd like to thank all of you for joining me in this long conversation. I've learned so much from everyone, and made some lifelong friends. I'm so grateful I had the chance to make so many wonderful connections, and I hope you'll join me for another thousand blog posts!

Five short links

Photo by Tanaka Juuyoh

strcpycat – How hard can it be to write a function to copy one string to another? This exploration shows how tough it is to create an algorithm that's truly generic. Seemingly harmless design choices like returning the length of the source string will kill you when you're copying small chunks from a massive string, because computing that return value means scanning the entire source, not just the bytes you copied.
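To make the cost of returning the source length concrete, here is a minimal C sketch (my own illustration, not code from the linked article). `copy_returning_src_len` mimics the strlcpy-style convention of returning the full length of `src`, while `copy_bounded` is a hypothetical variant that returns only the number of bytes it copied. Both names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* strlcpy-style copy: truncates to fit dst, but returns the FULL length of
 * src, so it has to keep reading src after dst is already full. Copying 16
 * bytes out of a gigabyte-long string still walks the whole gigabyte. */
size_t copy_returning_src_len(char *dst, const char *src, size_t dst_size)
{
    assert(dst_size > 0);
    size_t i = 0;
    /* Copy while there is still room for the terminating NUL. */
    for (; src[i] != '\0' && i + 1 < dst_size; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    /* Keep scanning to the end of src purely to compute the return value. */
    size_t len = i;
    while (src[len] != '\0')
        len++;
    return len;
}

/* Hypothetical alternative: stop as soon as dst is full and return the
 * number of bytes actually copied. The work is bounded by dst_size rather
 * than by the length of src. */
size_t copy_bounded(char *dst, const char *src, size_t dst_size)
{
    assert(dst_size > 0);
    size_t i = 0;
    for (; src[i] != '\0' && i + 1 < dst_size; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return i;
}
```

With `copy_bounded`, a caller can still detect truncation by checking whether `src[result]` is a non-NUL byte, which costs a single extra read instead of a scan of the whole source.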

Is there life on Venus? – We believe our own eyes, even when we shouldn't. This is a cautionary tale about a respected Russian astronomer who started to see life forms in the image-processing artifacts of old space missions. I used to generate 8×8 sprites for my '80s game programming by cycling through random blocks of pixels until something caught my eye, so I know how powerful pareidolia can be.

How to digitally sign a PDF – I can't believe I never knew you could do this; I've wasted so much time printing out and rescanning documents over the last few years of startups! In Lion it's easy: write your signature on a piece of white paper, hold it up to the camera, and after that you can position it in any PDF you've loaded in Preview.

IMDB data set – Emphatically not free and open, but at least available. I'm intrigued by the Kevin Bacon possibilities here.
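The "Kevin Bacon possibilities" boil down to shortest paths in a co-star graph: an actor's Bacon number is the number of shared-film hops between them and Kevin Bacon. Here is a minimal C sketch of that idea as a breadth-first search, assuming the films have already been turned into a small adjacency matrix. The names `bacon_distance` and `MAX_ACTORS` are hypothetical, and nothing here reflects the IMDB data set's actual file format.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTORS 64 /* hypothetical size limit for this toy example */

/* Breadth-first search over a co-star graph stored as an adjacency matrix:
 * costar[a][b] is true when actors a and b appeared in the same film.
 * Returns the number of hops from source to target, or -1 if the two
 * actors are not connected. */
int bacon_distance(bool costar[MAX_ACTORS][MAX_ACTORS], int n_actors,
                   int source, int target)
{
    assert(n_actors > 0 && n_actors <= MAX_ACTORS);
    int dist[MAX_ACTORS];
    int queue[MAX_ACTORS]; /* each actor is enqueued at most once */
    int head = 0, tail = 0;

    for (int i = 0; i < n_actors; i++)
        dist[i] = -1; /* -1 marks "not yet visited" */
    dist[source] = 0;
    queue[tail++] = source;

    while (head < tail) {
        int current = queue[head++];
        if (current == target)
            return dist[current];
        /* Visit every unseen co-star, one hop further from the source. */
        for (int next = 0; next < n_actors; next++) {
            if (costar[current][next] && dist[next] == -1) {
                dist[next] = dist[current] + 1;
                queue[tail++] = next;
            }
        }
    }
    return -1;
}
```

Breadth-first search fits because every co-star link counts as one hop, so the first time the search reaches an actor it has done so along a shortest path. For a data set the size of IMDB's, an adjacency list would replace the matrix.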

Computer Scientists and Google+: Something Interesting is Happening – As you may have noticed, I'm optimistic about Google+'s prospects. It comes down to my personal experience: I'm discovering a lot of content that I just don't see on Facebook or Twitter, and it looks like I'm not the only one.

Five short links

Photo by Miuenski

Fundamental Oracle flaw revealed – This is a fascinating piece of detective work on a bug, but also a cautionary tale of how even the most conservative assumptions can be proved wrong as data processing speeds and volumes grow.

Extracting structured data from Common Crawl – Shows exactly why I'm so excited by the potential of Common Crawl. Even just a list of all the hCard records from five billion web pages is going to be an amazing research resource, and I already have plans for doing fun things with the street addresses.

Travel itineraries with long-exposure photos – I love the way the students used analog techniques to produce such high-tech-looking visualizations.

Social Graph and Needlebase are dead – Google's API for publishing unified public profile information to developers never really caught on, but it's a shame to see it vanish. Needlebase's shutdown is less surprising, since it always seemed likely to be more useful inside Google, but I'm still sorry to see it lost to the outside world; it was a great tool.

The Apple logo in Unicode – It's great to see how convoluted and political something as seemingly simple as defining an international character set can be.

Jetpac now supports Google+

Photo by Eva Ho

Google+ has become very popular with photographers and hosts some amazing pictures, so I've been keen to help people discover the awesome travel ones through Jetpac. It took some head-scratching (the API is still in its very early stages) but you can now sign up using your Google account! Log in, and we'll give you inspiring photos from people you follow for wherever you're dreaming of traveling. It's been awesome discovering the wonderful content friends like Eva are putting out there, pictures I'd never have known about otherwise. I bet you'll find some treasures too!

Big Data war stories

Photo by Mark Kelley

If you're in the Bay Area on February 8th, I highly recommend joining me at the Silicon Valley Big Data group's war stories event. It's being put on by some good friends from places like Kosmix (now Walmart Labs) and other folks who've been fighting in the Big Data trenches. The goal is to demystify the field and show how any engineer can learn the techniques needed to create value from massive data sets; it's not just for Stanford PhDs any more! I hope that after hearing the stories and talking with the panelists, you'll feel confident enough to dive in and start hacking.