What can an externally-facing social network do for a business?


The Haystack system from Cerado is a social network tool for businesses, with a twist. The audience for the network is people outside the company who want to talk about something and would like to find the right person within the business to approach. Cerado have published a short ebook explaining the uses of this, from sales to M&A, but for me the key insight is that there’s a big difference between using an anonymous email address or phone number as the initial way of contacting a business, and having a named person to talk to.

People don’t have relationships with organizations, they have relationships with individuals. Organizations don’t have any memory, ability to trade favors, or pictures of grandchildren to swap. In my professional life, I’m able to get information and assistance from large companies through the individuals I know who work at them. It’s rare that I’m able to get help through the public forums or mailing lists because the questions I’m stuck on tend to be complex and tricky. Since developer support is bombarded with questions from inexperienced developers, you have to go through the dance of proving that the machine is plugged in to the wall socket and caps lock is off before they’ll dig into your issue. With individuals, I have the credibility for them to assume I’ve done the obvious stuff, and cut to the chase of looking at the issue I’m presenting. They remember the context of the larger system I’m working in, so I don’t have to explain that every time. And since I’ve reciprocated and helped them in the past with issues related to my company, they’re willing to spend some time assisting me.

Haystack is about making it easier to build this sort of relationship with individuals inside companies, for sales contacts, business partnerships and anything else that requires communication with an organization. They publish profiles of employees, with photos, tags indicating their areas of expertise and contact details. From a customer perspective, I like this a lot more than the usual bland contact page; it adds a human face to the organization and lowers the barrier for me to contact them. I’d have more confidence I’d reach someone who’d be able and willing to answer my questions, rather than a random intern I’d have to persuade to escalate me. I could see this being useful if I wanted to buy from a company, get a job there, sell something to them, ask a technical question or talk about a business partnership.

It does require a change in the way most organizations operate though. The usual practice is that there are designated people who act as gatekeepers to the outside world, both to control the flow of information so that nothing outside of policy is leaked, and to protect employees from being distracted from their internal work. The gatekeepers also derive a lot of power and prestige from their control of the communication channels, so they tend to have a vested interest in keeping them limited. I think that organizations would be better off relaxing the current systems, but the spectre of lawsuits and information leakage makes it a tough sell in any group where avoiding blame is the top priority.

Cerado also run a great blog, another great way to add a human face to a company, and one that runs into the same worries.

Do you know about Microsoft’s social network for businesses?


Last year Microsoft released a ‘technical preview’ of their Knowledge Network add-on for SharePoint. It aims to solve two problems: finding other employees who can help me with a particular subject (expertise search) and locating colleagues who have contacts with someone I’m trying to reach (connection search). It works by analyzing email, both to identify the social network, based on who emails whom, and to figure out expertise, based on the contents of the messages.
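To make the connection search idea concrete, here is a minimal Python sketch. It is not how Knowledge Network works internally, which I haven't seen; the function names, the `(sender, recipients)` input shape and the example addresses are all assumptions for illustration.

```python
from collections import defaultdict

def build_contact_graph(messages):
    """Map each sender to a dict of recipient -> number of emails sent.

    `messages` is assumed to be an iterable of (sender, [recipients]) pairs
    that some earlier capture step has already extracted.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sender, recipients in messages:
        for recipient in recipients:
            graph[sender.lower()][recipient.lower()] += 1
    return graph

def connection_search(graph, target, internal_domain):
    """Return internal senders who have emailed `target`, most frequent first."""
    target = target.lower()
    hits = [
        (sender, contacts[target])
        for sender, contacts in graph.items()
        if sender.endswith("@" + internal_domain) and target in contacts
    ]
    # Sort by descending message count, then alphabetically for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Made-up sample data.
messages = [
    ("alice@example.com", ["bob@example.com", "carol@partner.example"]),
    ("bob@example.com", ["carol@partner.example"]),
    ("alice@example.com", ["carol@partner.example"]),
    ("dave@example.com", ["alice@example.com"]),
]
graph = build_contact_graph(messages)
print(connection_search(graph, "carol@partner.example", "example.com"))
# prints [('alice@example.com', 2), ('bob@example.com', 1)]
```

Counting raw messages is the crudest possible measure of how strong a connection is, but it is enough to show who might be able to make an introduction.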

Unfortunately their preview period is now closed, and you can’t download the beta version any more. I assume it’s been promoted to a feature in an upcoming SharePoint release. It is very interesting to read the customer comments they release in their blog. "I thirst for more information, quality information, about the people in my company", "identifying and accessing expertise within the organization and uncovering connections across the supply chain are critical elements of competitive advantage", "Knowledge in modern organizations isn’t just 80% undocumented, it’s 95% invisible". This all fits with my experience of working in large companies, and backs up my instinct that there’s real unsatisfied demand for solutions that offer ‘people search’ within an organization. It’s also extremely interesting to look through their solutions for privacy control, which boil down to some fine-grained control of what’s exposed to whom, a bit like Facebook’s privacy settings.

It’s also surprising that even Microsoft are relying heavily on a client-side component for this, which makes sense from a ‘get up and running quickly’ point of view, but is a massive barrier to adoption. I’ll be keeping a careful eye out for any news on future developments.

How’s discovery different from search?


A lot of the really interesting services out there are about discovery, rather than search. Since they’re both ways of finding content, it’s worth looking at what makes them different.

Search is goal-oriented, discovery is about the journey. It’s the difference between going to the hardware store to get a Phillips screwdriver, and browsing the travel section in a book store. Rather than having very specific criteria in mind for what you want, you’re using indirect clues to help you find something that will meet your general needs. For a travel book that might include whether you’ve seen it mentioned in a review, if you’ve enjoyed the author’s work before, if it has an attractive cover, if there’s praise from people you trust, if your hairdresser mentioned the location, if you’d seen a documentary on the area, or if it happened to be sticking out from the shelf a little more than the others.

Search is solitary, discovery is social. Most of the factors behind buying a travel book are about interactions you’ve had with other people. Digg is one example of trying to emulate some of those traditional mechanisms for finding popular items. Facebook’s addictiveness is all about being tapped into the pulse of your social circle, not with a particular goal in mind, but just to keep up with the context and do the equivalent of picking fleas off each other’s backs (now that’s a Facebook App idea!). The power of me.dium comes from the injection of social context into browsing. It restores the cues we’re used to in the physical world, so we can judge locations by seeing where both our friends and strangers hang out.

Search is about universal answers, discovery is customized. Though Google talks about searching the sites your friends frequent, I think that functionality will be much more useful for discovery. It’s not likely that your social circle will be visiting the most authoritative sites on all the specific questions you’ll want to get definitive answers on; the sample size will just be too small. Instead, finding out which sites on general topics are popular amongst your circle would be a lot more interesting. Discovery is a lot more about your personal taste, and that’s something you likely share with your friends a lot more than with the general population.

Discovery is a background task. Often you’ve got some general interests that you want to know more about, but you’re not actively taking steps to find out more. Instead you’re keeping an ear to the ground while you get on with other activities. This is where applying some social network techniques to the workplace can be really interesting. Seeing updates on what your colleagues are up to will often trigger some thoughts or connections on topics you’re interested in, and lead to discussions you wouldn’t have had otherwise. That could be the real killer app for social networks in the enterprise.

For more thoughts on this, there was a fascinating discussion over on the Lightspeed blog a few months back.

Thor, Dog of Thunder


A few weeks ago, Liz and I decided to dip our toes in the water of dog ownership, and temporarily foster a dog for a local rescue charity. As our friends gleefully predicted, we didn’t have the willpower to hand over our first case and ended up permanently adopting him!

Thor is a 13 pound half Chihuahua, half maybe-Pinscher. We don’t know much of his history, but he’s about four years old, seems to have been well brought up judging from his behavior, and is very affectionate. Though I’ve been surrounded by cats for the last ten years, I grew up with dogs, but this is Liz’s first exposure. She’s smitten, helped by Thor’s surprising fastidiousness: he’s the only dog I’ve met who avoids muddy puddles. He’s also a very strong hiker, still pulling at the end of a 5 mile loop with serious elevation gain.

I have no idea where his name came from; I’d love to know how a half-Chihuahua ended up named after a Norse god. To my mind he’s more of a Loki, but there’s no point in giving him an identity crisis now. My parents are overjoyed. Understandably, they always preferred the dogs to us children, and now they have a furry grandson.

The only downside is the cats’ reaction: they are decidedly unimpressed by this new interloper. He’s very terrier-like, so we’ve had to train him not to chase the fun furry toys as they scamper away. Luckily they all seem to be slowly reaching an understanding, so maybe in a few more weeks they’ll relax a bit more.

A couple of tips if you’re setting up a new Dell server

If you’re buying from Dell, make sure you check the default configuration. I ended up with no CD or DVD drive, since I failed to spot that one wasn’t included. Funnily enough, a basic DVD drive actually costs nothing extra if you are more observant. Instead, I ended up making a quick trip to Best Buy and coming home $50 poorer.

The default user name for a Windows Small Business Server installation is ‘Administrator’, with a capital A. Since there are no name hints, I spent longer than you’d believe trying to figure that out. Ironically, I had the password I set up carefully written down.

That, together with some Javascript DOM brainteasers, trailwork on the Garapito Trail, helping a friend with Dire Maul and a hike around Towsley, was my weekend!

A search cloud for this site


ajax  api  bho  c  crossloop  crossloop review  cruz island  dom  dom in  dom in c  error  example  facebook  facebook api  facebook footprints  find self  firefox  heat load dlls find  hiking santa  hiking santa cruz  hiking santa cruz island  how  how to  ie  ie dom  ie dom in  ie dom in c  in  in c  jolla valley  jolla valley campground  la jolla  la jolla valley  la jolla valley campground  managedq  outlook  outlook api  review  santa cruz  santa cruz island  search  server  socket  socket server  the  to  valley campground  wix  wix heat load dlls find  write 


After using statcounter to track around 800 visits, I’ve put together a search cloud for this blog.

I’m still getting a lot of hits for my early review of Crossloop, thanks largely to the success that Mrinal and the gang have had with their free screen-sharing tool. I’ve seen a lot of people looking for more information on Facebook programming, and that shows up pretty clearly in the cloud too. I get the feeling that most developers are still fairly cagey about sharing information, or just plain too busy to post the collection of tutorials and tips that usually grows up around a platform.

For the Outlook API, it’s almost the opposite problem. There are loads of resources out there, many going back to the mid-90s, but almost nothing that gives a broad introduction to all the different technologies you can use to work with Exchange and Outlook.

It’s been great to see my local camping guides reaching so many people too. It’s a real long-tail endeavour, but it’s a good feeling to know that anyone looking for camping spots in the Santa Monicas can now find them on the web, whereas before the information just wasn’t there.

As I gather more statistics, it’ll be interesting to do some per-page clouds. Since the whole blog covers a range of subjects, it’s hard to judge from this cloud whether the search terms give good hints on its meaning. Looking at visits to particular pages that are more focused on a single topic would be a better test.

An XML format for email


To build a system that pulls information from large email stores, I need three processing stages. Capture pulls the information from the source, whether that’s using Exchange APIs to pull from a server, libgmailer or plain screen scraping. Analysis takes that data, pulls out things like the social network and tags the content. Presentation takes the information that the analysis produces, and displays it to the end users in a compelling form.

Most of the innovation is going to be in the analysis and presentation, but getting the capture right, whilst not ground-breaking, will be a lot of code. I need to decouple the analysis implementation from the capture technology, so the same code could be used for both web mail and Exchange for example. That requires a common interchange format for the capture stage to output and the analysis to read. I want a human-readable, text-based format for easy debugging and implementation in a variety of languages, something that will be flexible enough to cope with a lot of changes in structure and that has a lot of existing tool support. Those all argue for something XML based. Luckily there’s already a draft email XML standard I can build on.

Unfortunately it’s looking like it never made it past the draft stage and now seems abandoned, but it’s a good starting point for me to use. RFC822 is the source of most of the tag names, so it’s an easy conversion from either raw message text or the MAPI functions. It only deals with individual messages, rather than large sets as I need, but it’s possible to logically extend it to have a hierarchical folder structure.
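Here is a rough Python sketch of what the capture stage's output might look like, using only the standard library. The element names (`folder`, `message`, `body` and the lower-cased header names) are my own placeholders rather than the tags from the draft standard, and the handling of multipart bodies is deliberately simplified.

```python
import email
import xml.etree.ElementTree as ET

# RFC 822 style headers to copy across; an illustrative subset only.
HEADERS = ("From", "To", "Cc", "Subject", "Date", "Message-ID")

def message_to_element(raw_text):
    """Convert one raw message into a <message> element."""
    msg = email.message_from_string(raw_text)
    node = ET.Element("message")
    for name in HEADERS:
        value = msg.get(name)
        if value is not None:
            # Hypothetical tag naming: the header name in lower case.
            ET.SubElement(node, name.lower()).text = value
    body = msg.get_payload()
    # Multipart messages return a list of parts; this sketch skips them.
    ET.SubElement(node, "body").text = body if isinstance(body, str) else ""
    return node

def folder_to_element(name, raw_messages, subfolders=()):
    """Wrap messages and already-built child folders in a <folder> element."""
    folder = ET.Element("folder", {"name": name})
    for raw in raw_messages:
        folder.append(message_to_element(raw))
    for child in subfolders:
        folder.append(child)
    return folder

raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch?\n"
    "\n"
    "Noon at the usual place.\n"
)
archive = folder_to_element("Inbox", [raw], [folder_to_element("Projects", [])])
print(ET.tostring(archive, encoding="unicode"))
```

Because the analysis stage would only ever read this XML, the same analysis code could sit behind an Exchange capture tool or a webmail scraper without caring which one produced the file.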

Now you can try ManagedQ for yourself


My anonymous friends over at ManagedQ have left their private beta and opened their search service to everyone. I already covered how helpful their regular expression in-page searching can be, and they have a lot more to offer too, like their entity extraction and the most accurate thumbnails I’ve seen. You can see more reviews on AltSearchEngines and thenextweb.com.

I’ve been having some fun using the regular expressions I posted a few days ago with ManagedQ. To see their power, follow these steps:

1) Go to managedq.com and enter your main search terms (eg pete warden)
2) On the results page, start typing to bring up the inpage search box
3) Delete anything that’s already in there and enter the following regular expression:
/([0-9]{3})[^0-9a-z]*([0-9]{3})[^0-9a-z]*([0-9]{4})/

This should highlight any phone numbers in the results pages. I made the expression a bit more restrictive than my previous version to exclude letters as phone number separators.
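If you want to try the same expression outside ManagedQ, here is a small Python sketch. The helper name `find_phone_numbers` and the sample text are made up for the example, and the surrounding slashes are dropped because Python doesn't use them as delimiters.

```python
import re

# The expression from step 3: three digits, three digits, four digits,
# separated by anything that isn't a digit or a lower-case letter.
PHONE = re.compile(r"([0-9]{3})[^0-9a-z]*([0-9]{3})[^0-9a-z]*([0-9]{4})")

def find_phone_numbers(text):
    """Return every match, normalized to the form 555-123-4567."""
    return ["-".join(match.groups()) for match in PHONE.finditer(text)]

sample = "Call (555) 123-4567 or 555.987.6543, but not 555 and 123 and 4567."
print(find_phone_numbers(sample))
# prints ['555-123-4567', '555-987-6543']
```

One thing to watch: Python matches case-sensitively by default, so with this exact pattern upper-case letters still count as separators. Adding A-Z to the excluded set would close that gap.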


Why is web search so popular and mail mining so rare?

Looked at from a high level, they both take unstructured data and try to understand its meaning. A big practical difference is that web search tools are designed for the masses to use, whereas email mining is only used by a small number of professionals either doing litigation discovery or business intelligence work. Why is this?

There’s no obvious painful problem. With web search, the problem is "I need to find authoritative information on X". With mail, the question is more like "I need to find the discussion I was involved in on X", which can be solved locally by searching your inbox. This doesn’t need mining, just a search on your drive or personal webmail repository.

Email is private. Whilst technically your work email belongs to the company and they’re free to do whatever they like with it, a lot of people have sensitive personal information or discussions over their work account. Even leaving aside the ethical issues, you won’t get adoption unless employees feel comfortable about their privacy. A mass-use mining system needs to have privacy policies built-in from the start, which is a tricky balancing act because you also want to make as much available as possible.

Messages have no hyperlinks to each other. PageRank works because there’s a network of links between web pages. The closest equivalent to this for mail is the graph of who emails whom, and how often and quickly an email is replied to or forwarded. This is still a research topic though; it’s not a widely used or understood metric.
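As a sketch of what the "how often and quickly" part of that graph could look like in practice, here is some Python that computes reply counts and average reply delays between pairs of people. The message dictionaries and field names are assumptions; a real capture step would fill them in from the Message-ID, In-Reply-To and Date headers. This is just one possible metric, not an established ranking method.

```python
from collections import defaultdict
from datetime import datetime

def reply_stats(messages):
    """Summarize replies between pairs of people.

    Each message is a dict with the assumed keys 'id', 'sender',
    'sent' (a datetime) and 'in_reply_to' (an id or None).
    Returns {(replier, original_sender): (reply_count, mean_delay_seconds)}.
    """
    by_id = {m["id"]: m for m in messages}
    delays = defaultdict(list)
    for m in messages:
        parent = by_id.get(m["in_reply_to"])
        if parent is None:
            continue  # Not a reply, or the original isn't in this archive.
        delay = (m["sent"] - parent["sent"]).total_seconds()
        delays[(m["sender"], parent["sender"])].append(delay)
    return {pair: (len(d), sum(d) / len(d)) for pair, d in delays.items()}

# Made-up sample thread.
thread = [
    {"id": "m1", "sender": "alice", "sent": datetime(2007, 9, 1, 9, 0), "in_reply_to": None},
    {"id": "m2", "sender": "bob", "sent": datetime(2007, 9, 1, 9, 30), "in_reply_to": "m1"},
    {"id": "m3", "sender": "carol", "sent": datetime(2007, 9, 1, 11, 0), "in_reply_to": "m1"},
]
print(reply_stats(thread))
# prints {('bob', 'alice'): (1, 1800.0), ('carol', 'alice'): (1, 7200.0)}
```

A pair with many fast replies hints at a close working relationship, which is the sort of signal a mail equivalent of link analysis would need.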

This all sounds fairly downbeat, but what really excites me is that I think there are plenty of painful problems that can be solved with mail mining (eg find an expert, find contacts, collaboration); they’re just not as obvious. There are a lot of smart ideas on web search that can be applied to mail too. I also think there are some big advantages to email.

You know who your users are. Inside a company something like Active Directory gives you a wealth of information about who everyone is, what their formal relationship is, and allows you to easily authenticate identity to control access. The web is struggling towards this, but it’s still a long way off. Even for people outside the company, an email address is a good proxy for identity and usually comes with an alternate readable name too. Knowing about your users ahead of time also opens the door to doing a lot of pre-processing before they even try the service, so you can present them with useful information immediately, for example pre-building their social graph.

Time. Another great feature of email is that you’ve got data from a whole range of time, not just a snapshot of how the content looks right now. This opens up the door to a lot of time-based analysis techniques, such as measuring how metrics change over a year. The web has the Wayback Machine, which is an amazing feat but still a long way from the depth of mail.

See what Google thinks your site is about with a search cloud

If you want to know which search terms are most likely to find your site, I’ve uploaded a PHP library that creates search clouds from your logs. To use it, include searchcloud.php and call create_search_cloud(), passing in the location of your log file, the name of your site, the number of tags to produce and the min/max font sizes in percentages. You’ll be returned a string containing the HTML for the cloud. Here’s an example:

echo create_search_cloud("visitlogs_petewarden.txt", "petewarden.com", 50, 50, 250);

You can see it working on this example page based on statistics from my old open-source image processing site, which I’ve also included with the library for testing purposes.

Based on the examples I’ve tried, my hypothesis that the most frequent search terms are a good approximation for the meaning of the site holds up. If you take the top 8 terms from the petewarden.com cloud, you get "after effects", "plugins", "effects", "after", "how to", "install", "how to install", "petes plugins". Four of them would be good tags or taxonomy categories for the content, and on inspection, more sophisticated rejection of duplicates and stop words would help increase that ratio. I’ll be interested to hear how this works on some of your sites.
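For anyone curious about the general technique, here is a minimal sketch in Python rather than the library's PHP. It is not the code inside searchcloud.php: the function name `build_search_cloud`, the tiny stop-word list and the assumption that the search phrases have already been pulled out of the log are all mine.

```python
from collections import Counter
from html import escape

# A deliberately tiny, illustrative stop-word list.
STOP_WORDS = {"the", "to", "in", "how", "a", "of"}

def build_search_cloud(phrases, max_tags=50, min_size=50, max_size=250):
    """Return HTML for a cloud, with font sizes given as percentages."""
    counts = Counter(p.strip().lower() for p in phrases if p.strip())
    # Drop phrases that consist of nothing but a stop word.
    for word in STOP_WORDS:
        counts.pop(word, None)
    top = counts.most_common(max_tags)
    if not top:
        return ""
    high, low = top[0][1], top[-1][1]
    spread = max(high - low, 1)  # Avoid dividing by zero when all counts match.
    spans = []
    for term, count in sorted(top):  # Alphabetical order, like most tag clouds.
        size = min_size + (count - low) * (max_size - min_size) // spread
        spans.append('<span style="font-size:%d%%">%s</span>' % (size, escape(term)))
    return " ".join(spans)

print(build_search_cloud(["after effects", "After Effects", "plugins", "the"]))
# prints <span style="font-size:250%">after effects</span> <span style="font-size:50%">plugins</span>
```

The linear scaling between the least and most frequent terms is the simplest choice; a logarithmic scale would probably look better when one term dominates.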