Los Angeles web ventures barbecue

Photo by SpacePotato

If you’re a tech entrepreneur in LA, come along to the first meeting of the local web ventures group. It’s on Saturday June 14th at 1:00pm in Sherman Oaks, organized by Wil Fernandez. There are all the practical benefits of networking, but the real point for me is to be around a bunch of people driven by the same passion. I’ve made it to too few EMS events (Saturday mornings are usually booked for trailwork), but I always walk out fired up by the determination of both the students and presenters to Get Stuff Done.

I was looking at restarting the apparently moribund LA open coffee club, but whilst researching this article I found that it’s alive and well, just not covered on the main site. The joys of a fragmented web. I just missed a meeting yesterday, but I’ll be making it along to the next. If there are any other SoCal entrepreneur events I’m missing, let me know in the comments and I’ll check them out.

Eat at Husky’s in West Seattle


If you’re ever in West Seattle, check out Husky delicatessen for some gorgeous sandwiches. We were staying near California Avenue this week visiting family, and wandered in by chance to pick up some lunch. After trying their Chicken Cashew on Sourdough, we ended up coming back again for the next three days! They’re a family-owned store with a long history, and a great selection of custom-made candy and ice cream along with the sandwiches. The servers are all on the ball, ready to help you pick through all their choices, and happy to give you two half-sandwiches for the price of a whole. They use fresh, local ingredients, and it shows in the taste.


I wasn’t active enough to justify ice cream, but I was enviously checking out everyone else’s, and it looked soft and creamy.


I’m getting hungry just remembering Husky’s; I think I need to pick up some brunch…

Why can’t you create a calendar from your email?

Photo by Churl

I’m very excited about the potential for email as a data source. I’m so passionate about it that it’s hard to remember that for most people it’s a new idea, and it’s not obvious how it could be useful. To explain, I usually point out existing companies like Spoke or Contact Networks that pull out basic contact information, or Microsoft’s Knowledge Network experiment that automatically locates experts. But what truly gets my heart racing are all the applications that haven’t been possible without easy access to email data.

One very promising area is extracting events from messages. Dates are one of the easier entities to spot in unstructured language. PHP has a built-in function, strtotime(), that can convert most English time strings into a Unix timestamp, even fuzzy ones like "next Thursday" or "now". Getting the rest of the information, like the name of an event, is tougher, but imagine a calendar view that just shows the subject line of each email at any time that’s mentioned in the body of the message. You could restrict the view to only genuine contacts (people you’d replied to at a minimum) and then with a single click transfer any true events to your appointments calendar.
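As a rough illustration of the idea, here is a minimal sketch of pulling candidate event times out of a message body. The function name find_event_times() and its regular expressions are my own assumptions for illustration, not anything built into PHP; only preg_match_all() and strtotime() are real library calls. It only recognizes a handful of phrasings, fixes the timezone to UTC for simplicity, and resolves the date and the clock time in two separate strtotime() calls to keep each string unambiguous.

```php
<?php
// Sketch only: find_event_times() and its patterns are hypothetical helpers.
date_default_timezone_set('UTC'); // assumption: treat every time as UTC

function find_event_times(string $body, int $now): array
{
    $weekday = '(?:mon|tues|wednes|thurs|fri|satur|sun)day';
    $month = '(?:january|february|march|april|may|june|july|august'
           . '|september|october|november|december)';
    // A date phrase such as "tomorrow", "next Thursday" or "June 14th".
    $date = "(?:today|tomorrow|(?:next|this)\\s+$weekday"
          . "|$month\\s+\\d{1,2}(?:st|nd|rd|th)?)";
    // An optional clock time such as "at 1:00pm" or "5pm".
    $time = '(?:\s+at)?\s+(\d{1,2}(?::\d{2})?\s*[ap]m)';
    $pattern = "/\\b($date)\\b(?:$time)?/i";

    $events = [];
    if (!preg_match_all($pattern, $body, $matches, PREG_SET_ORDER)) {
        return $events;
    }
    foreach ($matches as $m) {
        // Resolve the date phrase relative to $now (collapse line breaks first).
        $stamp = strtotime(preg_replace('/\s+/', ' ', $m[1]), $now);
        if ($stamp === false) {
            continue;
        }
        // If a clock time followed, apply it on top of the resolved day.
        if (!empty($m[2])) {
            $withTime = strtotime($m[2], $stamp);
            if ($withTime !== false) {
                $stamp = $withTime;
            }
        }
        $events[] = ['phrase' => $m[0], 'time' => $stamp];
    }
    return $events;
}
```

Each returned entry pairs the matched phrase with a timestamp, which a calendar view could then label with the email’s subject line. Expect false positives (a bare month and number is not always a date), which is why the single-click confirmation step still matters.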

So why isn’t this already implemented? Gmail has got something similar for its Gcal integration, but it’s very limited in the formats it will recognize. There are articles out there like Learned Automatic Recognition of Appointments from Email by Lauren Paone at CMU, but as a quote from the paper puts it, "Although email is ubiquitous, large and realistic email corpora are rarely available for research purposes." Lauren faced serious obstacles even running realistic tests because he didn’t have enough email to work with.

What’s stopping progress is the mind-numbing pain of first getting data to prototype with (though the Enron corpus makes that somewhat easier) and, even worse, of trying to integrate with any email service like Outlook/Exchange or through IMAP. Innovation is rapid in the web world because anybody can spider the public internet and offer the results through a website. A few large companies like Google and Yahoo have access to their users’ emails and so can create email-based tools, and if you can persuade users to install a desktop plugin you can do the same, but the only way to move things forward is to open up access to a lot more developers. I’ll be posting on some of my efforts in that direction soon.

You prefer lovable fools to competent jerks

Photo by Cemetery Belle

That’s the argument of this Harvard Business Review article by Tiziana Casciaro and Miguel Sousa Lobo. They’ve researched how collaboration networks actually form within organizations, trying to work out how people choose who to connect with to get a task done. They tried to measure two attributes, competence and likableness, and then look at how those measures relate to who you decide to work with. Based on the combinations of likableness and competence, they classify people into incompetent jerks, competent jerks, lovable fools and lovable stars.

Two corners of the matrix are obvious: nobody wants to work with somebody who’s bad at their job and has no personal skills, and everybody is happy to work with a superstar who’s also a nice person. The surprising part is that whilst most people claim to prefer competent jerks to lovable fools as work partners, in practice they choose people they like regardless of their competence.

They talk about some of the consequences of this, that you tend to have more homogeneous groups of people working together with less diversity of viewpoints, since people tend to like others who are similar to themselves. They also informally talk about the mechanism that drives people to prefer likable fools over more competent but grating alternatives. They mention trust and familiarity, but it would be interesting to see how much correlation there is with the network measure of closure within a group. It seems likely that you share a lot of mutual friends with likable people, since by definition a lot of people like them, and so the reputation cost of letting you down will be a lot higher for them. Competent jerks won’t have those same third-party ties.

Based on my experience I avoid anyone who’s a real jerk purely because they also tend to be unreliable in delivering results. There’s only a theoretical distinction between someone who can’t do a task, and someone who can but won’t, and I think that managers overestimate their ability to change a jerk into someone productive, and underestimate the damage jerks do to their peers. I love Bob Sutton’s work with the No Asshole Rule looking at the impacts of jerks in the workplace, and how to spot and deal with them.

I do think the matrix above is incomplete though. There’s a large group of employees who aren’t widely liked, but aren’t jerks either; they’re just socially disconnected from their colleagues. They’re often the bedrock of the team, quietly getting work done. These are the people that management can really help, by acting as an interface between them and the outside world, protecting them from perceived hassle and distilling the competing external demands into simpler requirements.

You’ll need to pay $7 to get the full document, but the summary gives you a good overview. There’s a free technical paper that’s aimed at an academic audience; the article itself is focused on practical lessons you can draw from their research. The work relies on the standard self-reporting surveys to figure out networks. As always, I’d be fascinated to see if automated data-mining techniques on email and phone usage within a company give the same picture.

Why use email as an interface?

Photo by VoxPhoto

There are some great examples out there of using email as the gateway to a service. I Want Sandy is a fantastic automated personal assistant that you drive entirely through email. You send emails containing natural language details of your events and lists, and you get back timely reminders and updates. Posterous lets you email files and documents directly to a website, with an incredibly streamlined interface.

So why do they use email as an interface, rather than the web?

Everybody can email. You don’t have to teach anyone a new web interface. You type in a mail, choose an address and hit send.

Mail programs make great content. You can easily attach files, add text styles and include photos. If I forget and hit Command-B in Firefox while I’m writing a blog post, my text doesn’t get bolded, I just get to see the bookmarks sidebar. Email programs get this right: they give you drag and drop and hot-keys, and let you create good-looking documents easily.

Email is everywhere. Sure, most devices also have the web, but they usually have a much better UI for mail.

Email contains everything. Outlook is the center of most professional lives, and personal email already has most of the information, files and pictures you want to share. Being able to do interesting things with all of that without stepping outside of your mail service is really convenient. All of your history with any service is stored in the same place you keep everything else.

So how can you tap into that power? I don’t know what Sandy and Posterous are using, but GoodServer looks like an intriguing solution. It’s a Java library that implements an IMAP server that you can then plug your custom application logic into. They’ve got good documentation, a free evaluation copy, and it’s been battle-tested by a lot of commercial outfits.

Cross-platform Exchange connectivity with Moonrug

Photo by Melissa Morano

Thanks to a Gmail ad I recently discovered Moonrug Software. They offer a Java-based library that uses the MAPI network protocol to interface with any Exchange server. This is the same way that Outlook connects to Exchange, so it has the potential to support everything Outlook has access to, including calendar and contact information. This makes it a lot more comprehensive than basic email protocols like IMAP.

I’ve exchanged a few emails with Moonrug’s founder, and they’re still rolling out their full package, but they have recently released a sample demonstrating synchronization with Exchange. It’s good to see someone figuring out a cornerstone of the Exchange connectivity puzzle. Traditionally Microsoft has tried to maintain a competitive advantage by keeping its mail ecosystem as closed as possible. In theory that’s changing with the new Windows Open Protocols initiative. In practice they’ve not yet got around to releasing the really juicy details of things like the MAPI network protocol, so you’re stuck trying to reverse engineer them instead. Moonrug have been working on that approach for the last couple of years, long before the protocol initiative was announced.

Their product should be a great alternative to trying to do the same yourself, helping to open up the Exchange world to some real innovations.

How to fix illegal character errors in PHP XML parsing

Photo by Intimaj

I’m still plagued by occasional failures in my XML parsing due to illegal characters. Explicitly setting the character encoding reduced the frequency, but they’re still popping up occasionally. I have a couple of techniques I’ve tried. One is to use iconv() to strip out any characters that are illegal in the set I’m using, e.g.

$output = iconv("ISO-8859-1", "ISO-8859-1//IGNORE", $input);

This apparently works with more complex unicode sets, but at the moment I’m sticking with an 8 bit character encoding. The problem is that all values correspond to a defined character in ISO-8859-1. It took some head-scratching to realize that ISO-8859-1 is not the same as ISO 8859-1! The extra hyphen after ISO denotes an extended version that includes values in the range 0x00 to 0x1f, 0x7f and 0x80 to 0x9f. This fills up the range of mapped values, so that any number between 0 and 255 corresponds to a valid character in ISO-8859-1, and the line above does nothing.
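One way to check that reasoning is to push every possible byte value through the same call and count how many come back changed. This is only a quick sketch; the variable names are mine, and the result depends on the iconv implementation your PHP build links against. If the explanation above holds, the count should be zero.

```php
<?php
// Count the byte values that the same-charset iconv() call alters or rejects.
$changed = 0;
for ($byte = 0; $byte <= 255; $byte++) {
    $in = chr($byte);
    $out = iconv("ISO-8859-1", "ISO-8859-1//IGNORE", $in);
    if ($out !== $in) {
        $changed++;
    }
}
echo "bytes altered: $changed\n";
```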

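So, in theory that will fix Unicode encodings, but I need something that will handle the characters that are valid in ISO-8859-1 but that aren’t allowed by the XML spec. These are the control characters in the range 0x00 to 0x1f, apart from tab (0x09), line feed (0x0a) and carriage return (0x0d), which XML does permit. 0x7f is technically legal in XML 1.0 but discouraged, so I treat it the same way. To replace these you can run a regular expression that looks something like this: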
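A minimal sketch of such a regular expression in PHP follows. The helper name strip_xml_illegal_chars() is hypothetical, and the character class is my assumption about the right set to remove: it keeps tab, line feed and carriage return, since XML 1.0 allows those, and also drops 0x7f even though that one is merely discouraged rather than forbidden.

```php
<?php
// Hypothetical helper: remove single-byte control characters that trip up
// an XML 1.0 parser, keeping tab (0x09), LF (0x0A) and CR (0x0D).
function strip_xml_illegal_chars(string $input): string
{
    // 0x7F is legal in XML 1.0 but discouraged, so it is removed as well.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
}
```

This operates on raw bytes, which suits an 8 bit encoding like ISO-8859-1; a multi-byte encoding would need more care.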


I had a large file on disk that I wanted to change, so I actually used sed and its control character class shorthand:

sed 's/[[:cntrl:]]//g' messages.xml > messages.xml.fixed

This solved the illegal character error I was hitting. Now I’m hitting "XML error: EntityRef: expecting ';' at line 451837", and inspection of the text hasn’t helped me figure out what’s wrong yet. At least I’ve got a lot further through the file.