Los Angeles web ventures barbecue

Photo by SpacePotato

If you’re a tech entrepreneur in LA, come along to the first meeting of the local web ventures group. It’s on Saturday June 14th at 1:00pm in Sherman Oaks, organized by Wil Fernandez. There are all the practical benefits of networking, but the real point for me is to be around a bunch of people driven by the same passion. I’ve made it to too few EMS events (Saturday mornings are usually booked for trailwork), but I always walk out fired up by the determination of both the students and presenters to Get Stuff Done.

I was looking at restarting the apparently moribund LA open coffee club, but whilst researching this article found that it’s alive and well, just not covered on the main site. The joys of a fragmented web. I just missed a meeting yesterday, but I’ll be making it along to the next. If there are any other SoCal entrepreneur events I’m missing, let me know in the comments and I’ll check them out.

Eat at Husky’s in West Seattle


If you’re ever in West Seattle, check out Husky delicatessen for some gorgeous sandwiches. We were staying near California Avenue this week visiting family, and wandered in by chance to pick up some lunch. After trying their Chicken Cashew on Sourdough, we ended up coming back again for the next three days! They’re a family-owned store with a long history, and a great selection of custom-made candy and ice cream along with the sandwiches. The servers are all on the ball, ready to help you pick through all their choices, and happy to give you two half-sandwiches for the price of a whole. They use fresh, local ingredients, and it shows in the taste.


I wasn’t active enough to justify ice cream, but I was enviously checking out everyone else’s, and it looked soft and creamy.


I’m getting hungry just remembering Husky’s, I think I need to pick up some brunch…

Why can’t you create a calendar from your email?

Photo by Churl

I’m very excited about the potential for email as a data source. I’m so passionate about it, it’s hard to remember that for most people it’s a new idea, and it’s not obvious how it could be useful. To explain, I usually point out existing companies like Spoke or Contact Networks that pull out basic contact information, or Microsoft’s Knowledge Network experiment that automatically locates experts. But what truly gets my heart racing are all the applications that haven’t been possible without easy access to email data.

One very promising area is extracting events from messages. Dates are one of the easier entities to spot in unstructured language. PHP has a built-in function, strtotime(), that can convert most English time strings into a Unix timestamp, even relative ones like "next Thursday" or "now". Getting the rest of the information like the name of an event is tougher, but imagine a calendar view that just shows the subject line of each email at any time that’s mentioned in the body of the message. You could restrict the view to only genuine contacts (people you’d replied to at a minimum) and then with a single click transfer any true events to your appointments calendar.
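As a rough sketch of that idea, the snippet below runs strtotime() over a few candidate phrases, using the message's sent time as the reference point for relative dates. The find_event_times() helper, the sample phrases and the sent date are all made up for illustration, and it assumes something else has already picked the candidate phrases out of the message body.

```php
<?php
// Hypothetical helper: returns a timestamp for every candidate phrase that
// strtotime() manages to parse, relative to when the message was sent.
function find_event_times(array $phrases, int $sentAt): array
{
    $events = [];
    foreach ($phrases as $phrase) {
        $timestamp = strtotime($phrase, $sentAt);
        if ($timestamp !== false) {
            $events[$phrase] = $timestamp;
        }
    }
    return $events;
}

date_default_timezone_set('UTC');

// Assumed sample data: a message sent on Tuesday June 10th 2008.
$sentAt = strtotime('2008-06-10 09:00:00');
$events = find_event_times(['next Thursday', 'June 14th 1:00pm'], $sentAt);

foreach ($events as $phrase => $timestamp) {
    echo $phrase, ' => ', date('Y-m-d H:i', $timestamp), "\n";
}
```

Passing the sent time as the second argument matters, otherwise "next Thursday" would be worked out from the moment you run the script rather than from when the mail was written.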

So why isn’t this already implemented? Gmail has got something similar for its Gcal integration, but it’s very limited in the formats it will recognize. There are articles out there like Learned Automatic Recognition of Appointments from Email by Lauren Paone at CMU, but as a quote from the paper puts it "Although email is ubiquitous, large and realistic email corpora are rarely available for research purposes." Lauren faced serious obstacles even running realistic tests because he didn’t have enough email to work with.

What’s stopping progress is the mind-numbing pain of first getting data to prototype with (though the Enron corpus makes that somewhat easier), but even worse, trying to integrate with any email service like Outlook/Exchange or through IMAP. Innovation is rapid in the web world because anybody can spider the public internet and offer the results through a website. A few large companies like Google and Yahoo have access to their users’ emails and so can create email-based tools, and if you can persuade users to install a desktop plugin you can do the same, but the only way to move things forward is to open up access to a lot more developers. I’ll be posting on some of my efforts in that direction soon.

You prefer lovable fools to competent jerks

Photo by Cemetery Belle

That’s the argument of this Harvard Business Review article by Tiziana Casciaro and Miguel Sousa Lobo. They’ve researched how collaboration networks actually form within organizations, trying to work out how people choose who to connect with to get a task done. They tried to measure two attributes, competence and likableness, and then look at how those measures relate to who you decide to work with. Based on the combinations of likableness and competence, they classify people into incompetent jerks, competent jerks, lovable fools and lovable stars.

Two corners of the matrix are obvious: nobody wants to work with somebody who’s bad at their job and has no personal skills, and everybody is happy to work with a superstar who’s also a nice person. The surprising part is that whilst most people claim to prefer competent jerks to lovable fools as work partners, in practice they choose people they like regardless of their competence.

They talk about some of the consequences of this, that you tend to have more homogeneous groups of people working together with less diversity of viewpoints, since people tend to like others who are similar to themselves. They also informally talk about the mechanism that drives people to prefer likable fools over more competent but grating alternatives. They mention trust and familiarity, but it would be interesting to see how much correlation there is with the network measure of closure within a group. It seems likely that you share a lot of mutual friends with likable people, since by definition a lot of people like them, and so the reputation cost of letting you down will be a lot higher for them. Competent jerks won’t have those same third-party ties.

Based on my experience I avoid anyone who’s a real jerk purely because they also tend to be unreliable in delivering results. There’s only a theoretical distinction between someone who can’t do a task, and someone who can but won’t, and I think that managers overestimate their ability to change a jerk into someone productive, and underestimate the damage jerks do to their peers. I love Bob Sutton’s work with the No Asshole Rule looking at the impacts of jerks in the workplace, and how to spot and deal with them.

I do think the matrix above is incomplete though: there’s a large group of employees who aren’t widely liked, but aren’t jerks either; they’re just socially disconnected from their colleagues. They’re often the bedrock of the team, quietly getting work done. These are the people that management can really help, by acting as an interface between them and the outside world, protecting them from perceived hassle and distilling the competing external demands into simpler requirements.

You’ll need to pay $7 to get the full document, but the summary gives you a good overview. There’s a free technical paper that’s aimed at an academic audience; the article itself is focused on practical lessons you can draw from their research. The work relies on the standard self-reporting surveys to figure out networks. As always, I’d be fascinated to see if automated data-mining techniques on email and phone usage within a company give the same picture.

Why use email as an interface?

Photo by VoxPhoto

There are some great examples out there of using email as the gateway to a service. I Want Sandy is a fantastic automated personal assistant that you drive entirely through email. You send emails containing natural language details of your events and lists, and you get back timely reminders and updates. Posterous lets you email files and documents directly to a website, with an incredibly streamlined interface.

So why do they use email as an interface, rather than the web?

Everybody can email. You don’t have to teach anyone a new web interface. You type in a mail, choose an address and hit send.

Mail programs make great content. You can easily attach files, add text styles and include photos. If I forget and hit Command-B in Firefox while I’m writing a blog post, my text doesn’t get bolded, I just get to see the bookmarks sidebar. Email programs get this right, they give you drag and drop, hot-keys and let you create good-looking documents easily.

Email is everywhere. Sure, most devices also have the web, but they usually have a much better UI for mail.

Email contains everything. Outlook is the center of most professional lives, and personal email already has most of the information, files and pictures you want to share. Being able to do interesting things with all of that without stepping outside of your mail service is really convenient. All of your history with any service is stored in the same place you keep everything else.

So how can you tap into that power? I don’t know what Sandy and Posterous are using, but GoodServer looks like an intriguing solution. It’s a Java library that implements an IMAP server that you can then plug your custom application logic into. They’ve got good documentation, a free evaluation copy, and it’s been battle-tested by a lot of commercial outfits.

Cross-platform Exchange connectivity with Moonrug

Photo by Melissa Morano

Thanks to a Gmail ad I recently discovered Moonrug Software. They offer a Java-based library that uses the MAPI network protocol to interface with any Exchange server. This is the same way that Outlook connects to Exchange, so it has the potential to support everything Outlook has access to, including calendar and contact information. This makes it a lot more comprehensive than basic email protocols like IMAP.

I’ve exchanged a few emails with Moonrug’s founder, and they’re still rolling out their full package, but they have recently released a sample demonstrating synchronization with Exchange. It’s good to see someone figuring out a cornerstone of the Exchange connectivity puzzle. Traditionally Microsoft has tried to maintain a competitive advantage by keeping its mail ecosystem as closed as possible. In theory that’s changing with the new Windows Open Protocols initiative. In practice they’ve not yet got around to releasing the really juicy details of things like the MAPI network protocol, so you’re stuck trying to reverse engineer them instead. Moonrug have been working on that approach for the last couple of years, long before the protocol initiative was announced.

Their product should be a great alternative to trying to do the same yourself, helping to open up the Exchange world to some real innovations.

How to fix illegal character errors in PHP XML parsing

Photo by Intimaj

I’m still plagued by occasional failures in my XML parsing due to illegal characters. Explicitly setting the character encoding reduced the frequency, but they’re still popping up occasionally. I have a couple of techniques I’ve tried. One is to use iconv() to strip out any illegal characters for the set I’m using, e.g.

$output = iconv("ISO-8859-1", "ISO-8859-1//IGNORE", $input);

This apparently works with more complex Unicode sets, but at the moment I’m sticking with an 8 bit character encoding. The problem is that all values correspond to a defined character in ISO-8859-1. It took some head-scratching to realize that ISO-8859-1 is not the same as ISO 8859-1! The extra hyphen after ISO denotes an extended version that includes values in the range 0x00 to 0x1f, 0x7f and 0x80 to 0x9f. This fills up the range of mapped values, so that any number between 0 and 255 corresponds to a valid character in ISO-8859-1, and the line above does nothing.

So, in theory that will fix Unicode encodings, but I need something that will handle the characters that are valid in ISO-8859-1 but that aren’t allowed by the XML spec. These are the control characters in the range 0x00 to 0x1f, apart from tab, line feed and carriage return, which XML does permit. 0x7f is technically legal but discouraged, so I strip it too. To replace these you can run a regular expression that looks something like this:
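The original expression didn’t survive in this copy of the post, so what follows is only a minimal sketch of the kind of replacement described. The strip_xml_illegal_chars() name is hypothetical, and it assumes you want to keep tab, line feed and carriage return, since XML 1.0 allows those three, while dropping the other low control characters and 0x7f.

```php
<?php
// Hypothetical helper: removes control characters that trip up XML parsers.
// Keeps tab (0x09), line feed (0x0A) and carriage return (0x0D), which
// XML 1.0 allows; 0x7F is removed as well to match the description above.
function strip_xml_illegal_chars(string $input): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
}

// Made-up sample input containing a NUL and a unit separator.
$dirty = "Hello\x00 wor\x1Fld\tok\n";
$clean = strip_xml_illegal_chars($dirty);
echo $clean;
```

This works byte by byte, which is fine for an 8 bit encoding like the one used here; a multi-byte encoding would need more care.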


I had a large file on disk that I wanted to change, so I actually used sed and its control character class shorthand:

sed 's/[[:cntrl:]]//g' messages.xml > messages.xml.fixed

This solved the illegal character error I was hitting. Now I’m hitting "XML error: EntityRef: expecting ‘;’ at line 451837", and inspection of the text hasn’t helped me figure out what’s wrong yet. At least I’ve got a lot further through the file.

Even more ways to speed up IMAP Gmail importing in PHP

Photo by Zerega

In my last two articles on importing mail from Google in PHP I thought I’d got performance up to a pretty high level, but once I started testing with mailboxes with over 30,000 mails, I realized I had to be more creative.

The main trick I discovered in that investigation is using imap_fetch_overview() to get information on a lot of messages at once. This is a lot faster than grabbing the full header info for a single message at a time using imap_headerinfo(). The downside is that it doesn’t return as much information about each message. For me the most painful loss was that you only get the first recipient. Another wrinkle is that you don’t get the sender information separated into the email address and display portions, you just get a single string that may contain either both, or just the address. I had to write my own regex parser to pull out the two components.
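To give a feel for the bulk approach, here is a minimal sketch that asks for overviews in ranges rather than one message at a time. The batch_sequences() and fetch_all_overviews() helpers and the batch size of 500 are my own assumptions for illustration, not taken from the sample code, and $imap is assumed to be a connection you’ve already opened with imap_open().

```php
<?php
// Hypothetical helper: builds IMAP sequence strings such as "1:500",
// "501:1000", ... covering message numbers 1 to $total.
function batch_sequences(int $total, int $batchSize): array
{
    $sequences = [];
    for ($start = 1; $start <= $total; $start += $batchSize) {
        $end = min($start + $batchSize - 1, $total);
        $sequences[] = $start . ':' . $end;
    }
    return $sequences;
}

// Hypothetical helper: fetches the overview of every message in the mailbox,
// one range per call. Assumes $imap was opened elsewhere with imap_open().
function fetch_all_overviews($imap, int $batchSize = 500): array
{
    $overviews = [];
    foreach (batch_sequences(imap_num_msg($imap), $batchSize) as $sequence) {
        $batch = imap_fetch_overview($imap, $sequence, 0);
        if ($batch !== false) {
            foreach ($batch as $overview) {
                // Each overview is an object with fields such as
                // from, subject and date.
                $overviews[] = $overview;
            }
        }
    }
    return $overviews;
}
```

Splitting the mailbox into ranges is just one way to keep each response to a manageable size; you could equally pass a single sequence covering everything.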

I’ve updated my sample code to use the overview function, and it includes the code to split up the combined sender string too. You can try it online, or download it as evenfasterphpgmail.zip. The sender parsing code is also included below:

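function extract_address_from_display($full) {
    // The original patterns were lost from this copy of the post; both
    // regexes below are reconstructions of what the code describes.
    // First try the combined form: Display Name <user@example.com>
    $matchcount = preg_match_all('/^\s*"?([^"<]*?)"?\s*<([^<>\s]+@[^<>\s]+)>/', $full, $matches);
    if ($matchcount) {
        $address = $matches[2][0];
        $display = $matches[1][0];
    } else {
        // Otherwise look for a bare address with no display portion
        $matchcount = preg_match_all('/[^\s<>"]+@[^\s<>"]+/', $full, $matches);
        if ($matchcount) {
            $address = $matches[0][0];
            $display = $address;
        } else {
            // Nothing that looks like an address, so keep the raw string
            $address = "";
            $display = $full;
        }
    }
    return array( "address" => $address, "display" => $display);
}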

Welcome to the United States of America


I’ve just been accepted as a permanent resident here in the US, with the green card (actually mostly white) arriving a few days ago. It’s taken me 7 years of patience and struggle, but now I’ve graduated from a temporary work visa tied to a single employer, to an independent person, free to follow my dreams. It’s a giddy feeling, both the new-found security that I won’t have to leave the country and the liberation of having no restrictions on my professional life.

I’m counting down the days to naturalization now, just 5 years from now I can be a full citizen. I knew very quickly after arriving that I belonged here, as much as I miss my family and friends from Britain. America is full of encouragement for people dreaming big dreams, it’s the best place in the world for doing something that’s never been done. Thanks to everyone who’s kept me going through the long process of getting this sorted out, especially Liz.

How social networks control your company

Photo by Belinketeneghe

Brokerage and Closure by Ronald Burt is a must-read for anyone interested in innovation and social networks. He’s a sociologist with the Chicago Graduate School of Business who’s spent years mapping and analyzing the patterns of relationships in large companies like Raytheon. This book describes how new ideas, trust and power flow directly from these networks.

The title refers to the two forces that shape who you talk to. Closure is the technical term for how insular a group of people are, measured by the strength of relationships between all the insiders, and the weakness of ties with outsiders. If you draw a graph of the communications within a group with high closure, you see a lot of lines between the members, and few contacts with others.

In everyday language, a cluster of people with high closure would be called a clique. They form because they have some big advantages. It’s a lot easier to trust someone you’ve no experience with if you share mutual friends, because the risk to their reputation will be severe if they let you down. The dense pattern of communications also makes sure that practices and beliefs get spread and standardized quickly throughout the group.
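To make that a little more concrete, here’s a toy calculation of how closed a group is from a list of who talks to whom. This is purely illustrative and not Burt’s own formula: the closure_stats() function, the names and the two measures (how many of the possible insider pairs are connected, and what share of the group’s ties stay inside it) are all my own assumptions, and it expects each pair of people to appear only once.

```php
<?php
// Illustrative only, not Burt's formula. $edges is a list of [person, person]
// pairs, each pair listed once; $members is the group being examined.
function closure_stats(array $members, array $edges): array
{
    $inGroup = array_flip($members);
    $internal = 0; // ties between two insiders
    $external = 0; // ties between an insider and an outsider
    foreach ($edges as [$a, $b]) {
        $aIn = isset($inGroup[$a]);
        $bIn = isset($inGroup[$b]);
        if ($aIn && $bIn) {
            $internal++;
        } elseif ($aIn || $bIn) {
            $external++;
        }
    }
    $n = count($members);
    $possible = $n * ($n - 1) / 2; // number of distinct insider pairs
    $touching = $internal + $external;
    return [
        'density' => $possible > 0 ? $internal / $possible : 0.0,
        'internal_share' => $touching > 0 ? $internal / $touching : 0.0,
    ];
}

// Made-up example: ann, bob and cat all talk to each other, and only cat
// has a contact outside the group.
$edges = [['ann', 'bob'], ['ann', 'cat'], ['bob', 'cat'], ['cat', 'dan']];
$stats = closure_stats(['ann', 'bob', 'cat'], $edges);
```

A group where both numbers are close to 1 would look like the clique described above: everyone connected to everyone else, and hardly any lines leaving the cluster.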

Large organizations are made up of many of these self-contained teams, each with their own shared experiences, ideas and ways of doing things. Brokerage is the act of bridging the gaps, or structural holes, between these groups in the network. People who have connections with multiple groups that would be otherwise unconnected are known as brokers or bridges.


They play an important role in innovation because they have the chance to introduce good ideas from one team into another, or combine partial insights from multiple groups into a new approach to a problem. They also have political advantages because they have more information about the motivations and goals of other teams, and can use that knowledge to help steer decision-making to avoid conflicts and gain support for initiatives.
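As a crude sketch of spotting those bridging people in communication data, the code below lists everyone who has at least one tie into a team other than their own, along with the teams they reach. It’s a simple proxy rather than the measures used in the book, and the find_brokers() function, names and team labels are invented for the example.

```php
<?php
// Crude proxy for brokerage, not the book's method. $groupOf maps each
// person to their team; $edges is a list of [person, person] pairs.
function find_brokers(array $groupOf, array $edges): array
{
    $reached = [];
    foreach ($edges as [$a, $b]) {
        if ($groupOf[$a] !== $groupOf[$b]) {
            // The tie crosses a team boundary, so record it for both ends.
            $reached[$a][$groupOf[$b]] = true;
            $reached[$b][$groupOf[$a]] = true;
        }
    }
    $brokers = [];
    foreach ($reached as $person => $groups) {
        $brokers[$person] = array_keys($groups);
    }
    ksort($brokers);
    return $brokers;
}

// Made-up example data.
$groupOf = ['ann' => 'sales', 'bob' => 'sales', 'cat' => 'eng',
            'dan' => 'eng', 'eve' => 'legal'];
$edges = [['ann', 'bob'], ['cat', 'dan'], ['bob', 'cat'], ['bob', 'eve']];
$brokers = find_brokers($groupOf, $edges);

foreach ($brokers as $person => $groups) {
    echo $person, ' reaches ', implode(', ', $groups), "\n";
}
```

In this example bob is the interesting one, since he’s the only link from sales to both of the other teams.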

Where Burt really shines is the application of this general model to the wealth of data from sociological studies within companies, together with his own personal experiences of working with large businesses. He sets out to prove 4 ‘stylized facts’ about how brokerage and closure work in practice:

Brokers do better. He uses network analysis together with personnel records to show that people who have strong connections outside their immediate team get paid more, and promoted faster.

Brokers have better ideas. Analyzing the ranking of improvements for a supply-chain management department together with the connectedness of the people suggesting them, he builds a case that the reason brokers do better is because of the quality of the ideas they come up with.

Brokerage is useless without closure. This is less of a slam-dunk, but he gathers evidence that brokers don’t help when the teams themselves are fragmented and poorly coordinated. Intuitively this makes sense, groups who can’t communicate internally won’t be able to execute even given the best ideas.

The echo chamber amplifies closure. Treating networks as information circuits ignores the primate biases that actually guide our social behavior. In particular, etiquette demands that we avoid contradicting a conversation partner when possible. This and similar habits mean that reputations are exaggerated in a feedback loop through gossip, since people you talk to will tend to agree with your assessment of someone, even if they don’t hold the same opinion. This gives the illusion of corroborating evidence for your views, and tends to tighten the bonds that bind a group together and more strongly exclude outsiders. This is a tough one to tease out from the data, but he shows that the more mutual contacts you share with someone, the stronger your opinion of them, even if that opinion disagrees with the assessments of your shared contacts.

This is vital reading for anyone dealing with social networks because of the applications of these theories to the design of our tools. At the start he talks about the delusion that having lots of contacts in a network adds value, when instead the really valuable connections are those outside your immediate group, and how this is where businesses like LinkedIn and Tacit should be focusing their efforts.

I’m particularly interested because most of my work has been aimed at making brokerage easier and faster. Defrag Connector was about establishing initial trust between conference attendees by revealing mutual friends. I’m analyzing email to reveal the existing communication networks, and identify good candidates for brokerage contacts because they’re experts in a helpful area, or have external contacts that would be useful. Most of his data comes from self-reported surveys of who people talk to; I’d love to run some of his work against my large company email data sets. He mentions Valdis Krebs in the foreword, but I was disappointed I didn’t see any references to his work deriving networks from implicit communication data.

Burt is writing for an academic audience, so he presents a lot of the primary data backing up his arguments, which can make it a tough read for generalists like me. He’s got a readable style though, and I love some of the anecdotes that pop up throughout, such as the quote from a manager explaining that when analyzing improvement ideas "that were either too local in nature, incomprehensible, vague or too whiny, I didn’t rate them."