Greg Berry just posted a very interesting comment, touching on a question I've wrestled with.
"…lots of business and life happens off the internet (hard to believe, I
know), but even within the digital confines, there are so many
different planes of communications to track."
Probably the best example of this is your significant other or business partner. If you're often in the same room as them, you probably won't send them as many emails as you send a direct report who's in another office. If you rely on communication frequency to measure closeness, you'll underrate those relationships. So how do you work around this problem?
Design your algorithms around the blind spot. Google's search results are nowhere near as good as what a dedicated human researcher could produce, but that doesn't matter. They narrow the field down to a couple of dozen sites you can check manually. A few bogus results or dubious rankings don't matter because they can easily be spotted and ignored. The equivalent for tools based on automated relationship analysis is giving users the option to edit the strength of relationships to correct the occasional mistake, and always giving people a chance to eyeball any decision before the system takes any action.
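To make that concrete, here's a rough sketch of what "editable strengths plus a human check" could look like. Everything in it is made up for illustration: the `RelationshipModel` class, the 0-to-1 strength scale, the 0.5 threshold and the contact names are all my assumptions, not a description of any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipModel:
    """Hypothetical model: guesses closeness from message counts,
    but lets the user correct any guess by hand."""

    message_counts: dict[str, int]
    overrides: dict[str, float] = field(default_factory=dict)

    def estimated_strength(self, contact: str) -> float:
        """Message count scaled to 0..1 against the busiest contact."""
        busiest = max(self.message_counts.values(), default=0)
        if busiest == 0:
            return 0.0
        return self.message_counts.get(contact, 0) / busiest

    def strength(self, contact: str) -> float:
        """A manual correction always wins over the automated estimate."""
        if contact in self.overrides:
            return self.overrides[contact]
        return self.estimated_strength(contact)

    def set_override(self, contact: str, value: float) -> None:
        """Record the user's own view of a relationship (0..1)."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("strength must be between 0 and 1")
        self.overrides[contact] = value

    def suggest_close_contacts(self, threshold: float = 0.5) -> list[str]:
        """Contacts at or above the (assumed) closeness threshold."""
        contacts = set(self.message_counts) | set(self.overrides)
        return sorted(c for c in contacts if self.strength(c) >= threshold)


def act_on_suggestions(suggestions, confirm):
    """Keep only the suggestions the user approved; nothing is automatic."""
    return [contact for contact in suggestions if confirm(contact)]


# Made-up counts: the spouse barely shows up in email.
model = RelationshipModel({"boss": 40, "bob_in_accounting": 4, "spouse": 2})
before = model.suggest_close_contacts()

# The user fixes the blind spot by hand.
model.set_override("spouse", 1.0)
after = model.suggest_close_contacts()

# The user gets to eyeball every suggestion before anything happens.
approved = act_on_suggestions(after, confirm=lambda contact: contact != "boss")
```

The point of the design is that the automated guess is only a starting position. The override is stored separately from the counts, so a correction sticks even as new mail arrives, and `act_on_suggestions` never does anything the user hasn't approved.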
Pick the right problem domain. I'm fascinated by applications in the business world because the relationships I needed the most help with are right in the sweet spot for email. I've sketched the graph above to show roughly the communication frequencies I've experienced. For different industries and generations the lines will shift and scale, but between Bob in accounting and your boss there are probably a lot of people you exchange a lot of mail with. Stick to problems related to those folks, and email frequency will be a good approximation to closeness.
Be realistic about the results. I think the Boulder Twits communication map is the best guide to the relationships in the local tech scene, but that's mostly because it's the only one. As Gregg says, different styles of communications heavily affect the results, even if you forget about the channels it's missing. Heavy Twitter users are far more likely to end up in the center of the graph than less prolific twits. Chris Wand is entirely missing because he's not on Twitter, even though he's heavily involved in the community. As we pull in more and more channels we'll be able to produce far better analysis, and do a lot of useful things, but we'll never capture all the fractal richness of relationships within our primate packs.