Lots of interesting mail/social graph buzz

As Brad says, it is pretty obvious once you connect the dots, but I was still interested to see the NY Times article about the big players looking at their email services, and figuring out they’re not that far from having their own social networks.

It was good to learn about a new site covering this, Email Dashboard, thanks to the comments section of Feld Thoughts. I’ll also need to write up what I learnt about Trampoline Systems and ClearContext at Defrag, but that’s for another post.

Defrag: Visualizing social media: principles and practice

Matthew Hurst, from Microsoft, gave the second Defrag talk on the topic of visualizing social media. He described JC Herz’s first talk as complementary to his, covering some of the same problems, but from a different angle. He started by laying out his basic thesis. Visualization is so useful because it’s a powerful way to present context to individual data points. It ties into the theme of the conference because while web 1.0 was a very linear experience, flicking through pages in some order, 2.0 is far more non-linear, and visualizations can help people understand the data they now have to deal with through placing it in a rich context.

He then ran through a series of examples, starting with the same blog map that he’d created, and JC had used as a negative example in her talk. He explained the context and significance of the images, as well as the fact they were stills from a dynamic system, but did agree that in general these network visualizations have too much data. He introduced a small ‘Homer’ icon that he added to any example that produced an ‘mmmm, shiny, pretty pictures’ reaction in most people, without necessarily communicating any useful information.

The next example was a graph of the blogosphere traffic on the Gonzales story, generated by BuzzMetrics. This was a good demonstration of how useful time can be in a visualization. After that came an impressive interlocked graph which, after giving the audience a few seconds to ooh and aah over it, he revealed to be a piece of 70s string art! A pure Homer-pleaser, with no information content.

The next picture was a visualization of the changes in Wikipedia’s evolution article over time. This was a really useful image, because you could see structures and patterns emerge in the editing that would be tough to see any other way. There’d been an edit war over the definition of evolution, and the picture made it clear exactly how the battle had been waged.

TwitterVision got a lot of attention, but isn’t much use for anything. It gives you information in a fun and compelling way, but unfortunately it’s not information that will lead you to take any action. To sum up the point of showing these visualizations, he wanted to get across that there’s a lot of techniques beyond network graphs.

He moved on to answering the question "What is visualization?". His reply is that the goal of visualization is insight, not graphics. Visualizations should answer questions we didn’t know we had. He returned to the blogosphere map example, to defend it in more detail. He explained how once you knew the context, the placement and linkages between the technology and political parts of the blogosphere were revealed as very important and influential, and how the density of the political blogosphere revealed the passion and importance of blogs on politics.

(Incidentally, this discussion about whether a visualization makes sense at first glance reminds me of the parallel endless arguments about whether a user interface is intuitive. A designer quote I’ve had beaten into me is ‘All interfaces are learnt, even the nipple’. The same goes for visualization, there always has to be some labelling, explanation, familiarity with the metaphors used and understanding of the real-world situation it represents to make sense of a picture. Maps are a visualization we all take for granted as immediately obvious, but basing maps on absolute measurements rather than travel time or symbolic and relative importance isn’t something most cultures in history would immediately understand.)

He also talked about some of Tufte’s principles, such as "Above all else, show the data". He laid out his own definition of the term visualization: it’s the projection of data for some purpose and some audience. There was a quick demonstration of some of the ‘hardware’ that people possess for image processing that visualizations can take advantage of. A quick display of two slides, each containing a scattering of identical squares, but one with a single small circle in place of a square, showed how quickly our brains can spot certain differences using pre-attentive visual processing.

A good question to ask before embarking on a visualization is whether a plain text list will accomplish the same job, since that can be both a lot simpler to create, and easier to understand if you just need to order your data in a single dimension. As a demonstration, he showed a table listing the ordering of the 9/11 terrorists in their social network based on four different ranking measures, such as closeness, and then presented a graph that made things a lot clearer.
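To give a flavor of what a measure like closeness actually computes, here is a minimal sketch of closeness centrality via breadth-first search; the toy graph is invented for illustration, not the network from his example.

```python
from collections import deque

def closeness(graph, node):
    # Breadth-first search to find the shortest-path distance from
    # 'node' to every reachable node in the graph.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    others = len(dist) - 1
    if others == 0:
        return 0.0
    # Closeness: reachable nodes divided by total distance to them,
    # so nodes 'near everyone' score higher.
    return others / sum(d for n, d in dist.items() if n != node)

# A made-up five-person network, as adjacency lists.
g = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b", "e"],
    "e": ["d"],
}
ranking = sorted(g, key=lambda n: closeness(g, n), reverse=True)
```

Running this ranks "b" first, since it sits closest to everyone else; the same table-versus-graph trade-off he described applies to how you then present such a ranking.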

He has prepared a formal model for the visualization process, with the following stages:

  • Phenomenon. Something that’s happening in the real world, which for our purposes includes activity out on the internet.
  • Acquisition. The use of some sensor to capture data about that activity.
  • Model/Storage. Placing that data in some accessible structure.
  • Preparation. Selection and organization of the data into some form.
  • Rendering. Taking that data, and displaying it in a visual way.
  • Interaction. The adjustment and exploration of different render settings, and any other changes that can be made to view the data differently.

There’s actually a cycle between the last three stages, where you refine and explore the possible visualizations by going back to the preparation stage to draw out more information from the data after a round of rendering has taught you more about it. You’re iteratively asking questions of the data and hoping to get interesting answers, and the goal of that iteration is finding the right questions to ask.
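The stages above can be sketched as a toy pipeline. The stage names come from the talk; the data and functions are invented purely for illustration.

```python
# A minimal sketch of the pipeline stages as plain functions.

def acquire():
    # Acquisition: a 'sensor' capturing some phenomenon --
    # here, made-up link counts between a few blogs.
    return [("blog_a", "blog_b", 12), ("blog_b", "blog_c", 3),
            ("blog_a", "blog_c", 7)]

def store(records):
    # Model/Storage: put the raw data into an accessible structure.
    return {(src, dst): count for src, dst, count in records}

def prepare(model, min_links=5):
    # Preparation: select and organize -- here, keep only strong links.
    return {k: v for k, v in model.items() if v >= min_links}

def render(prepared):
    # Rendering: display the prepared data (textually, in this sketch).
    return [f"{src} -> {dst} ({n} links)"
            for (src, dst), n in sorted(prepared.items())]

# Interaction: re-run preparation with different settings and re-render.
# This is the cycle between the last three stages described above.
model = store(acquire())
for threshold in (5, 10):
    print(f"min_links={threshold}:", render(prepare(model, threshold)))
```

Tightening `min_links` and re-rendering is the interaction loop in miniature: each pass answers one question and suggests the next one to ask.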

Web 2.0 makes visualizations a lot easier, since it’s a lot more dynamic than the static html that typified 1.0, but why is it so important? Swivel preview is a great example of what can be done once you’ve got data and visualizations out in front of a lot of eyes, as a social experience. The key separation that’s starting to happen is the distinction between algorithmic inference, where the underlying systems make decisions about importance and relationships of data to boil it down into a simple form, and visual inference, where more information is exposed to the user and they do more mental processing on it themselves. (This reminded me of one of the themes I think is crucial in search, the separation of the underlying index data and the presentation of it through the UI. I wish that we could see more innovative search UIs than the DOS-style text list of results in page-rank order, but I think Google is doing a good job of fetching the underlying data. What’s blocking innovation at the moment is that in order to try a new UI, you have to also try to catch up with Google’s massive head-start in indexing. That’s why I tried to reuse Google’s indexing with a different UI through Google Hot Keys.)

One question that came up was why search is so linear. Matt believes this can be laid squarely at the door of advertising: there’s a very strong incentive for search engines to keep people looking through the ads.

Defrag: Web 2.0 goes to work


Rod Smith, the IBM VP for emerging technology, had a lot to squeeze into a short time. I had trouble keeping my notes up with his pace, and I wish I had more time to look at his slides. They often seemed to have more in-depth information on the subjects he described, so I will contact him and see if they’re available online anywhere. (Edit- Rod sent them on, thanks! Download defrag_keynote.pdf. They are well worth looking through.)

He started off by outlining his mission in this presentation. He wanted to talk about the nuts-and-bolts issues of the technology behind 2.0, and why so many businesses are interested in it. The first question was why 2.0 apps are produced so much more quickly than traditional enterprise tools.

Part of the reason is that they tend to be a lot simpler, and more focused on solving a specialized problem for a small number of people, rather than tackling a general need for a wide audience. Being built on the network, they are naturally more collaborative, and support richer interactions between people. They also tend to be built around either open or de facto standards. Because they are comparatively light-weight, they can be altered to respond to change a lot more easily too.

DIY or shadow IT, technology developed outside of the IT department, has always been around. Business unit people have been writing applications as Excel macros for a long time. (On a personal note, Liz is an actuary with a large health insurer, and she’s been creating complex VBA and SAS applications for many years as part of her job.) What 2.0 brings to the table is a lot of interesting ways to link these isolated projects together, for example by outputting to an RSS feed, which can then be routed around the company. People in business units are now a lot more tech savvy than they used to be, which also really helps the adoption of these tools.
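As a sketch of the RSS-glue idea, here is how a shadow-IT script might wrap its output rows in a minimal RSS 2.0 feed using only the Python standard library. The feed title and row data are invented examples, not anything from the talk.

```python
import xml.etree.ElementTree as ET

def rows_to_rss(title, rows):
    # Wrap spreadsheet-style (heading, value) rows in a minimal
    # RSS 2.0 feed, so other tools around the company can subscribe
    # to this project's output instead of emailing files around.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for heading, value in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = heading
        ET.SubElement(item, "description").text = value
    return ET.tostring(rss, encoding="unicode")

feed = rows_to_rss("Monthly claims summary",
                   [("Total claims", "1,204"), ("Flagged for review", "37")])
```

The point is how little code it takes: a macro that already produces a table is only a few lines away from being a routable, subscribable feed.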

He moved on to talk about the practicalities of creating "five minute applications" or mashups. The biggest hurdle always seems to be getting easy access to the data: "I have all this data from years of doing business, how do I unlock it?"

As an example, he looked at how StrikeIron had created a location-based mashup of Dun and Bradstreet’s business information service, for establishing the legitimacy of a company you’re dealing with, or finding likely sales prospects. (I saw a screenshot of an actual map display, rather than a text summary, but I can’t locate that.)

Old companies have accumulated a lot of potentially very useful and valuable data, but there’s not much use being made of most of it. The question, as above, is how to make that data mashable. The term often used for this part of the process is ‘widget composition’, which covers a lot of different technologies, from Google gadgets to TypePad widgets.

There are of course some dangers with the brave new world of Web 2.0 in business. One of the strengths of traditional IT is that there’s accountability and responsibility for ensuring service availability and data accuracy. If a service created by a business unit member becomes widely popular, should they be the ones to maintain and update it, or is there a process to transfer that to IT? There’s little visibility from the CIO and IT manager level as to what’s going on with these shadow IT projects. It’s like the early days of internal web servers being installed across companies in an ad hoc way; we’re only just sorting out the tangle that resulted from that. There are also some unique issues with digital rights management and copyright once you’re sending data through feeds. It’s not so much like music DRM, where the problem is malicious actors trying to steal, as just allowing people to keep track of what the right attribution and correct uses of the data are.

Copyright.com has done some interesting work in this area, creating meta-tags to attach to data that allows automatic handling based on rules for different attributions.

Defrag: How taxonomies meet folksonomies; or the role of semantics on the web


Karen Schneider gave a talk drawing on the centuries of experience that the library community has in classifying and organizing information, and the relationship between those formal taxonomies and tagging approaches like del.icio.us. She’s made her slides available here.

She started by laying out the relevance of libraries, with a look at community college students library usage. They’re checking out roughly the same number of books as a decade ago, but they now check out many ebooks too, as well as accessing databases, so overall usage has actually increased. Community college students are 49% of the total undergraduate population in the US, and they’re poorer and work longer hours outside of college than average. They’re a very demanding audience, so their heavy usage demonstrates that libraries are providing an efficient and useful service.

She then tackled a few librarian stereotypes. I’ve been a library-blog lurker for years, so I didn’t need any convincing, but she got some laughs with Donna Reed as the spinster librarian from the alternate world in It’s a Wonderful Life. I was disappointed that Giles was missed out, but you can’t have everything.

The next point was a quick demonstration of some typical library software, and how awful it was. The presentation was essentially the same as a card catalog, very static and uninvolving. Doc Searls had talked the day before about data in general being trapped in disconnected silos, in hard-to-use formats, and library systems suffer from the exact same problems.

WorldCat is a universal library catalog that lets you find books and other items in libraries near you. It’s still based on the old card-index model of marked (edit- MARC, that makes more sense, thanks Karen!) data, but it’s a big step forward because it’s linking together a lot of different libraries’ data sources.

There are some issues that traditional taxonomies have been wrestling with for a long time, which are also problems for the newer technologies. Authority control is the process of sorting out terms which could be ambiguous, for example by adding a date or other suffix to a name to make it clear which person is referred to. Misspelling is another area where librarians have spent a lot of time developing methods to cope. Stemming is problematic enough in English, but she discussed Eastern European languages that have even tougher word constructions. Synonyms are another obstacle to finding the results you need. She showed a del.icio.us example where the tags covering the same wireless networking technology included "wifi", "wi-fi", "802.11" and "802.11b". Phrase searching is something that library data services have been handling for a lot longer than search engines. And finally, libraries have been around for long enough that anachronisms have become an issue, something that tagging systems have not had to cope with. Until the 90s, the Library of Congress resisted changing any of its authoritative terms, such as afro-american or water closet, even though they’d become seriously out-dated.
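A crude sketch of the authority-control idea applied to tags: map the synonym cluster she showed onto one canonical term before counting or searching. The choice of "wi-fi" as the canonical form is my own arbitrary assumption.

```python
# Hand-maintained synonym table, playing the role a cataloger's
# authority file plays in a library system.
SYNONYMS = {
    "wifi": "wi-fi",
    "wi-fi": "wi-fi",
    "802.11": "wi-fi",
    "802.11b": "wi-fi",
}

def canonical(tag):
    # Normalize case and whitespace, then collapse known synonyms
    # onto their canonical term; unknown tags pass through unchanged.
    t = tag.strip().lower()
    return SYNONYMS.get(t, t)

tags = ["WiFi", "802.11b", "wi-fi", "python"]
normalized = [canonical(t) for t in tags]  # three variants collapse to "wi-fi"
```

This is the simplest possible version; a real system would also need the misspelling and stemming handling she mentioned, which this table-lookup approach doesn't touch.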

Disambiguation or authority control is something that taxonomies are very good at. The creators of the system spot clashes, and figure out a resolution to them. Worldcat Identities is a good example of the power of this approach. Interestingly, Wikipedia is very good at this too, as the disambiguation page for ‘apple’ shows. She believes this is the result of a very strict, well-patrolled community where the naming is held to be extremely important, and believes the value of the naming is under-appreciated.

Another strong point of traditional cataloging approaches is the definitions. Wikipedia seems to have informally developed a convention where the first paragraph of any entry is actually a definition too.

Having somebody with in-depth expertise and authority on a subject do a centralized classification can be extremely efficient. She gave the example of a law library in California that has an excellent del.icio.us tagging scheme, but I couldn’t find the reference unfortunately. (edit- Here it is, from the Witkin State Law Library.)

Library catalogs have an excellent topic scheme: a good hierarchy for organizing their classifications, which is still something that folksonomies are trying to catch up with, using ideas like facets.

These are all areas where folksonomies can learn from taxonomies, but there’s plenty of ideas that should flow the other way too. One of the strengths of tagging is that it’s really easy to understand how to create and search with tags. The same can’t be said for the Dewey decimal system. Cataloging in a library involves following a very intimidating series of restrictions. Tagging doesn’t frighten off your workforce like this.

In the short term, tagging is satisficing, and trumps the ‘perfection’ of a traditional taxonomy. In 2006, the Library of Congress was proud to report they’d cataloged 350,000 items with 400 catalogers. That works out to only about 3.5 records per day!

Tagging is also about more than just description. It’s a method for discovery and rediscovery, use and reuse, with your own and other people’s bookmarks. Folksonomies produce good meta-data; some people seem concerned that 90% of flickr photos fall within six facets, but this is actually a good reflection of the real world.

It seems like library conferences are a lot more advanced than Defrag in handling tags, since there’s a formal declaration of a tag for each event in advance, and then that’s used by everybody involved. There wasn’t anything this well-publicized for Defrag, and the one chosen, ‘defragcon’, caused a sad shake of the head, since it’s not future-proof for next year’s conference, and we’ll end up having to change our tags to add a year suffix.

She also brought up the point that basic cataloging and classification techniques seem to be instinctive, and not restricted to a highly-trained elite of catalog librarians. We all tend to pick around four terms to classify items.

There is a common but useless critique of folksonomies: that personal tags pollute them. This is a useless criticism because it’s easy for systems to filter them out. A more real problem is the proliferation of tags over time, which ends up cluttering any results. There’s also the tricky balance between splitters and lumpers, where too finely divided categories give ‘onesie’ results where every item is unique, and overly broad classes mean the signal of the results you want is overwhelmed by the noise of irrelevant items.
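To show just how easy that filtering is, here's a toy sketch: treat any tag only ever used by a single person as personal and drop it from shared views. The data and threshold are invented for illustration.

```python
def filter_personal(tag_events, min_users=2):
    # tag_events: (user, tag) pairs from a shared bookmarking system.
    # Keep only tags applied by at least min_users distinct users,
    # which screens out private labels like 'toread'.
    users_per_tag = {}
    for user, tag in tag_events:
        users_per_tag.setdefault(tag, set()).add(user)
    return {tag for tag, users in users_per_tag.items()
            if len(users) >= min_users}

events = [("alice", "wifi"), ("bob", "wifi"),
          ("alice", "toread"), ("carol", "python"), ("dave", "python")]
shared = filter_personal(events)  # "toread" is filtered out
```

A handful of lines like this is why the 'personal tags pollute folksonomies' critique doesn't hold up; the splitter/lumper tension is the genuinely hard part, and no threshold tweak solves it.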

There’s some examples of ‘uber-folksonomies’, which take the raw power of distributed classification, and apply a layer of hierarchy on top. Wikipedia is the best-known example, and its greatest strength is how well-patrolled the system is. LibraryThing is a system that lets you enter and tag all the books in your personal library. The Danbury library actually uses the information people have entered in LT to recommend books for their patrons who search online, as well as using pre-vetted tags to indicate the categories each book belongs to. The Librarians Internet Index is another well-patrolled classification system for websites (though it looked surprisingly sparse when I checked it out). The Assumption College for Sisters has been using del.icio.us to classify its library. Karen pointed out that it’s hard to imagine anyone more trustworthy than a nun librarian! Thunder Bay Public Library has also been busy on del.icio.us.

A deep lesson from the success of folksonomies is that great things can be achieved if people want to get involved. We need to incentivize that activity, and she used the phrase ‘handprints and mirrors’. She didn’t expand on the mirrors part, but I took that to mean the enjoyment people took from looking at a reflection of themselves in their work. We all want to feel like we’ve left some kind of handprint on society, so any folk-based system should reflect that desire too.

She only took one question, asking how libraries are doing. She replied that they’re an example of the only great non-commercial third space. She also gave examples of how people like to be in that space when they’re dealing with information, even if they’re not there for the books.

Defrag: Visualizations of Social Media


JC Herz’s talk was originally "Visualization of Social Intelligence", but she felt that "Visualizations of Social Media" was a better description of what she was actually discussing. She started off in an iconoclastic vein, comparing current social visualizations to lava-lamps or snow globes; fascinating, but conveying absolutely no information. Pretty but useless.

To find a better way, she took us on a whistle-stop tour of the history of graphical data, or pictures that tell stories. Snow’s cholera map and Guerry’s suicide map of France were two classic examples. One common theme with all the classic visualizations is that they were snappy graphics for hidebound institutions. Because producing them involved a lot of manual effort and time, they were only created to address pressing problems.

She then moved on to an example of some work she’d done in this area. She had a brief to investigate how you can visualize coordination in a group, using data from the America’s Army game. She created a way of viewing a particular team’s communications over time, with a 3D graph showing communications as circles on a line, with a line for each team member and the distance along the line representing time. Using this tool, she was able to draw out statistical measures that could tell which teams were effective, based purely on their communication patterns.

One thing she thought important to emphasize is the value of time as a parameter in most visualizations. Current social graphs have no awareness of time.

Another important component of a successful visualization is that it should have consequences. She took the example of the study of dating patterns in a Midwest high school, and asked if anyone would have been interested in the same graph showing SMS patterns in that high school. Another example that speaks to the same question is a map showing what name people in different parts of the country use for pop or soda. There isn’t a big consequence to this map, but the minor light it sheds on the culture of the US is enough to get people interested.

The Trulia Hindsight visualization combines both something people are very interested in, real estate, and shows it over time. It tells a story, in a very compelling way.

Space is an important dimension for telling stories too, which is why maps showing political fundraising by area are so fascinating. Conflict or drama is the third ingredient to a compelling story, which is why the diagram showing book buying patterns by party affiliations was such a success.

A story isn’t just an automatic result of running data through an algorithm; to get insight, you have to engage in a dialog with the data. If you ask stupid questions, you get stupid answers.

To wrap up, she proposed some principles about what makes a visualization useful. There should be less information, but the right information. Not just a mish-mash of all the data you have, but a focused version that shows a selected subset of interesting or surprising information. All visualizations should tell a story, which requires notions of time, space, and something at stake. This is why so many popular visualizations are political, because people are trying to make important decisions. To be useful, the visualization also has to be sharable. You’re trying to tell a story to affect something in the real world, and the only way of affecting things is by getting other people involved.

A good test for whether a visualization is any good is asking if it has any consequences. It’s such a waste to go to all the effort of producing a diagram if it doesn’t matter. Any artifact you produce must be sharable to be effective.

The first question came from Matt Hurst, the author of the first diagram JC used in her rogues’ gallery of pretty but useless visualizations! He’s got an online response here too. He wondered if the America’s Army diagrams were any more intelligible. He also brought up the survivorship bias problem: you can’t know if you’ll get something compelling out of your data set before you start attacking it. You never know which question you ask of it will produce something compelling or surprising.

JC agreed with this, and thought the answer was to emphasize the analysis stage, rather than skipping it.

A lady, whose name I didn’t catch, (edit- Marti Hearst, the Berkeley professor, who was in the "Next level discovery" panel, thanks Matt and apologies Marti, you were out of line of sight for me) said that it’s very hard to make visualizations. The reason that the Amazon political preferences example is still used, years after it was created, is that it’s tough to create something that compelling, and it also needs some luck. She agreed with Matt that the army example was confusing.

JC’s response was that you do need a lot of domain knowledge to be effective.

Another point Marti brought up was that in the high school example, SMS patterns could be incredibly important if it was a forensic investigation after a school shooting.

As support for this idea, JC brought up the example of the correlation between Snickers sales in prison commissaries, and riots. It’s apparently the best method of predicting riots, much better than more obvious metrics like violent incidents, because the inmates know something is brewing, and want to stock up on food.

Mershad Setayesh from Collective Intellect said that visualizations only make sense if you can see a pattern. How do you do that? Is there a methodology to get patterns from your data?

JC suggested various methods, applying calculus, and bundling points up using natural analysis, to find things like cliques.

Defrag: Next generation disruptive technology open space


Despite the extra talk on OpenSocial there was still some time for open space discussions. The three topics were "OpenSocial vs ClosedPrivate" led by Kevin Marks, "What is Defrag?" with Jerry Michalski and "What are the next generation disruptive technologies?", proposed by Jeff Clavier. I chose to go to the last one, since I was interested to hear what people were expecting to see. People were throwing out interesting ideas and challenges incredibly fast, and I’ve tried to capture them here.

Matthew Hurst of Microsoft kicked off the discussion with his concern that lack of trust that a service will be there next week was hindering the adoption of new technologies. He doesn’t want us to be focusing on changing people’s behavior, he thinks the future will be in simpler clients.

David Cohen would just like his identity to be portable across all the systems. Matt was concerned that this would make information leakage a much bigger problem.

Jeff’s big question was about what the friend relationship means once you start doing that. The relationship is all about context, and your myspace friends won’t be the same as your linkedin contacts. There’s also no concept of strong ties versus weak ties.

Seth Levine wanted to know if anyone liked Facebook’s self-reporting mechanism, and gave the example of an acquaintance who marked his friend request with "We hooked up", not realizing the real meaning of the term!

I suggested email analysis as an approach to a better understanding of relationships. Kevin from the Land Grant Universities research access initiative was skeptical that his gmail inbox reflected his strong ties. I countered that I thought that was true for our general life, but that most people’s work inboxes do correspond fairly well with those ties, since email is still the primary communication tool in many companies.

Ben from Trampoline Systems injected a note of skepticism into the discussion, noting that no weighting information can be added without support from the platforms that own the social network, like Facebook and Linkedin, so it’s futile to discuss it before that’s a reality. A lady (whose name I didn’t catch) pointed out that one of the major things missing from networks is the recognition that relationships change over time.

Clarence Wooten, CEO of CollectiveX, was concerned that friend overload was becoming as bad as RSS. With so much undifferentiated data coming in, with the same mechanism for both childhood friends and chance acquaintances, it was becoming less useful. Some of the other folks proposed technical solutions for the RSS overload problem, including a couple sold by their own companies.

David Kottenkop(sp?) of Oracle laid down the challenge that computers should be able to figure out the ranking and classification of relationships based on communication patterns. As an editorial comment, this is something I’m convinced is the future, and is the idea behind a lot of what I’ve been working on.
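As a hypothetical sketch of what such automatic ranking could look like, one simple approach is to weight each past message by an exponential decay, so frequent and recent contact both raise a relationship's score. Nothing here comes from the session; the half-life is an arbitrary assumption.

```python
def tie_strength(message_times, now, half_life_days=90):
    # Weight each past message by an exponential decay, so a contact's
    # score combines frequency and recency. The 90-day half-life is an
    # invented parameter, not anything proposed at the session.
    day = 86400  # seconds in a day
    return sum(0.5 ** ((now - t) / (half_life_days * day))
               for t in message_times)

now = 1_700_000_000
frequent_but_stale = [now - 400 * 86400] * 10  # ten messages, over a year old
few_but_recent = [now - 86400] * 3             # three messages yesterday

# The recent contact outranks the stale one, despite fewer messages.
assert tie_strength(few_but_recent, now) > tie_strength(frequent_but_stale, now)
```

Sorting contacts by a score like this, computed over someone's sent mail, is one plausible first step toward the automatic relationship classification Oracle's challenge called for.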

Craig Huizenga of HiDat thought there were basically two options for organizing all this data; direct formal attributes or tagging, either something that’s tricky but has semantic meaning, or something more informal but easier to use. Across the table, someone suggested that what we really needed was another level of hierarchy for tags, so you could tag tags themselves.

Matt Hurst was concerned that a lot of the systems we were working on were susceptible to exploitation by users once they figured them out. For example, tagging is now being used to spoof search engines. The same thing is true for identities, people have very different behaviors on MySpace and LinkedIn, which friends they’ll accept, and what information they’ll reveal. Just as Google has trained us all to use keywords in search, we’ve been trained into certain behaviors by these services. An interesting practical example of a world where universal identities are starting to appear is message boards. Since there’s only a few different forum systems, it’s technically practical to hand-code interfaces to all of them, and allow users to avoid the hassle of repeated manual registrations with different sites.

Doruk pointed out that the social networks had to solve the problem of weighting friends, or die because they become useless. One term thrown around for this was relationship management.

Ben pointed out that we’re a really skewed sample to be talking about this stuff. MySpace is still massive, we’re not the mainstream users. There’s a generation gap here, where we’ve got very different perceptions than the kids of today. Matt mentioned as an example that Myspace users appreciate seeing ads on a site, because they know that means that it’s a free service, and they won’t be ambushed with any charges.

He wanted to know where the monetization of future services would come from. He doesn’t want more ad-led services; he wants to know where the painful problems are that people will pay us money to solve.

This led me to ask whether the enterprise was the answer. Was that where we could still sell services? Matt answered with a maybe, since there’s no possibility of ads there, and there are mechanisms to force people to use services if someone in the hierarchy decides it’s necessary.

Seth suggested that we all needed to bring our high-falutin’ visions down to something real and concrete. He described his first experience of working in a large corporation, and being astonished to discover that everyone left at 5pm. 90% of people want to go home, not wrestle with new technology in the hope it will eventually make them more productive. He thought NewsGator was a great example of a company taking our fancy technology, and turning it into solutions for everyday problems.

Ben agreed that no one wants to adopt this stuff in a corporation. Matt suggested that the only way forward was to throw some of this technology into firms, and see how people creatively decide to use it.

Beth Jefferson of BiblioCommons jumped in, suggesting that these technologies bring up a lot of tricky issues. People who aren’t friends-of-friends with a decent number of people end up isolated if there’s heavy use of social networks; you’re just magnifying the effects of cliques. She gave a great quote: "Search is a representative democracy with unfair elections". The same goes for blog postings; we pretend to egalitarian principles, but know that there’s a core oligarchy of highly-influential bloggers.

Jeff brought the discussion back to first principles by asking what the world really needs. Craig suggested a way to deal with the overload of information. Jeff thought that turning off your computer for a week, and seeing what you missed, was a good start.

Matt’s concern was how flat data was, and the lack of tools to deal with it in a meaningful way. Seth suggested sorting out some data formats, so we can visualize all this information. Christian of the CAF advisory council suggested that finding experts from amongst our circle of friends was a great unsolved problem.

Greg Cohn of Yahoo gave probably the best comment of the session when he suggested that the biggest problem facing the world is the lack of clean water for billions of people, not information overload. This led to a really interesting discussion in the session and afterwards about how we change policy to solve these real, people-are-dying problems. I’m going to need a post of its own to do this justice, but it’s something I have trouble forgetting: as a community we’re incredibly lucky, and are we really giving back enough to the rest of the world that’s trapped in poverty?

Defrag: OpenSocial vs ClosedPrivate


In response to the looming threat of ClosedPrivate, Kevin Marks of Google dropped in for a surprise talk on OpenSocial. He’s part of the NSA (Not Search or Apps) team, and wanted to give us an idea of what OpenSocial is all about. He started off by quickly running through the campfire presentation, stressing that the aim was a common social API across many sites. The goal is to bring a social context to applications, to personalize them based on social graphs. He wants to bring media, filtered by the app’s logic, so you can see what your friends are reading and what you might be interested in.

Talking about the problems of bad actors in social networks, he quoted Douglas Adams; "Of course you can’t ‘trust’ what people tell you on the web anymore than you can ‘trust’ what people tell you on megaphones, postcards or in restaurants. Working out the social politics of who you can trust and why is, quite literally, what a very large part of our brain has evolved to do."

There was an API overview, which Eric hurried him through to get to the meaty non-technical discussion. The first question was about where OpenSocial came from. Kevin’s answer was that it came from two sources: the desire to easily add features to Orkut without the pain of changing server code, and being inspired by what was possible in Google Gadgets.

The next was how mature Kevin thought the API was. His answer was that you can do things with it, but only just!

Another tough question was about how the security model worked. Kevin replied that this was currently defined by the container, but agreed this was non-ideal. He explained the dilemma Google has with sites asking users for their mail names and passwords; it’s a big security headache. The only solution going forward is to make sure that the secure method is easier to use than the insecure one, but it’s not clear how this will be done.

When asked about possible container services, such as message sending or common UI elements, Kevin thought they’d be a nice feature, but was noncommittal.

One of the audience wondered why any other social networks would want to sign up for OpenSocial. His reply was that supporting it would make it easier for users to get interesting features.

He was asked if the friends model extended to IM, and he thought it was simplistic enough at the moment to map. He also suggested avoiding an email address as a primary key, since most people have multiple email addresses. When asked about adding friends through the API, he replied that it was just a query mechanism on top of the other networks, since that was a lot easier to figure out. He agreed that security was an even bigger issue than normal, since you’re giving access to your friends’ personal information to any malicious code too.
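A minimal sketch of two of those points, assuming a hypothetical data model of my own (this is not the actual OpenSocial API): people get opaque, network-assigned IDs rather than email primary keys, and the friends API is a read-only query layer over an existing graph.

```python
# Hypothetical sketch, not the OpenSocial API: opaque user IDs instead of
# email primary keys, and friend access as a pure query over the graph.
from dataclasses import dataclass, field

@dataclass
class Person:
    user_id: str                       # opaque, network-assigned key
    display_name: str
    emails: set[str] = field(default_factory=set)  # one person, many emails

class SocialGraph:
    """Read-only friend queries layered over an existing network."""
    def __init__(self):
        self._people: dict[str, Person] = {}
        self._friends: dict[str, set[str]] = {}

    def add_person(self, person: Person):
        self._people[person.user_id] = person
        self._friends.setdefault(person.user_id, set())

    def befriend(self, a: str, b: str):
        self._friends[a].add(b)
        self._friends[b].add(a)

    def friends_of(self, user_id: str) -> list[Person]:
        # A pure query: the API reads the graph, it never mutates it,
        # matching Kevin's description of a query-only mechanism.
        return [self._people[f] for f in self._friends.get(user_id, set())]
```

Keying on `user_id` means a person with several email addresses is still one node, which is the pitfall Kevin was warning about.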

The question came up of what objections he’d had to overcome from the social networks, and whether fears of their whole graphs being downloaded were a problem? The biggest problem they’d run into was the user id namespaces filling up. On that topic, he suggested an important use might be the delegation of user registration and authentication to a third party social network, for services that don’t want to implement that infrastructure.

As a final point, Ross Mayfield asked whether the possibility of malicious containers made the problem of bad actors an order of magnitude worse.

Defrag: Next level discovery panel


Brad Horowitz of Yahoo led a panel looking at the topic of "Next level discovery". Marti Hearst, from UC Berkeley, had ‘cheated’ and prepared some slides, but they set the stage well for the following discussion.

She started off with an interesting distinction between the current search model of ‘navigating’, where you incrementally step towards the answer you’re looking for, and her hoped-for future of ‘say what you want’, where you teleport instantly to the result you want. The real-world analogy she used for navigating was orienteering, where you read a map and compass to move between landmarks.

Her hope was that a couple of trends were coming together to make this vision a reality. There are now massive collections of user information being analyzed to produce better algorithms than PageRank for matching results to particular searches. She singled out Ryen White’s paper on using popular destinations to predict the search results people wanted as an example of the way things would move in this direction. She also described eBay Express’s UI as a practical service using natural language processing, with the example of searching for ‘reebok womens shoes’ and getting back a page with useful categories listed, based on an understanding of the terms.
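The behavior-based ranking idea could be sketched like this — an illustrative toy, not Ryen White’s actual algorithm: blend each result’s base relevance with how often past searchers actually ended up at that destination.

```python
# Toy sketch (not the paper's actual method): re-rank results by mixing a
# base relevance score with normalized destination popularity.
def rerank(results, visit_counts, alpha=0.7):
    """results: list of (url, base_score); visit_counts: url -> past visits.
    alpha controls how much weight the base score keeps."""
    max_visits = max(visit_counts.values(), default=1) or 1
    def blended(item):
        url, base = item
        popularity = visit_counts.get(url, 0) / max_visits
        return alpha * base + (1 - alpha) * popularity
    return sorted(results, key=blended, reverse=True)
```

With a low `alpha`, a page that many searchers ended up at can outrank a page with a higher base score, which is the behavior-over-keywords shift Marti was describing.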

Lou Paglia of Factiva was the next speaker up. He described Factiva’s work as taking news sources from all over the world and normalizing them into a searchable form. He felt that search and discovery were the same thing, just different points on the same spectrum. His big wish was for a real semantic web.

Jeremie Miller, creator of Jabber and currently with Wikia, laid out his view of the way the web was organized. At the moment, as Doc Searls and others discussed yesterday, information is held in walled-off corporate silos; there are few truly open sources of data. To look into the future, it’s useful to look at the history of the internet. The first level was heterogeneous networks connected by common protocols like TCP/IP and email. Second was the web, also connected by open protocols. He’s not sure what the third level will look like, but my inference was that he was betting on open protocols being an important part. As an editorial remark, I’m not so sure that the second level was as open as the first. You can definitely use open protocols as a client to access servers, but there are few server-to-server open protocols in the web world, unlike the pre-web universe of usenet and email. That’s one reason I’m hoping client-side approaches will help return some of what we lost in interoperability.

Next was Steve Larsen of Krugle. His argument was that there was a big difference between PageRank search and indexing. Indexing is like Yahoo’s original directory, where there’s some knowledge of what terms mean, and is tied to the idea of the semantic web. He used the analogy of network news compared to cable in the early 90’s, with vertical search engines as the small, focused competitors that were able to eat their established rivals’ lunches. The example he threw out was that Krugle always knows that python is a language, not a reptile.

Brad finished the introductions by talking about his desire to ban the word ‘user’, since it was so cold and technical. We need to treat our customers as people, and give them the tools to participate in cataloging the world. His example was Flickr’s interestingness algorithm, which uses existing data about people’s interactions with the site to figure out a good approximation of a photo’s popularity. Like Marti, he’s betting that looking at people’s behavior is going to be the next big advance in search technology.
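To make the interestingness idea concrete, here is a toy scorer with weights I invented for illustration — Flickr’s real formula is unpublished, and this is only the general shape of the approach: combine interaction signals, weighting higher-effort actions more heavily.

```python
# Invented weights for illustration only; not Flickr's actual (secret)
# interestingness formula. Higher-effort actions count for more.
def interestingness(views, favorites, comments):
    return 1.0 * views + 10.0 * favorites + 5.0 * comments

def rank_photos(photos):
    """photos: dict photo_id -> (views, favorites, comments).
    Returns photo ids, most interesting first."""
    return sorted(photos, key=lambda p: interestingness(*photos[p]),
                  reverse=True)
```

The point of the weighting is that a photo with a few favorites can beat one with many passive views, which is what makes behavior data richer than raw traffic counts.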

The first question was "Does search suck?". Steve thought it was worse than it had been. Marti thought that the reliance on keywords made internet searching suck. Lou had a more fine-grained answer, that searching for detailed, specialized information was really hard, but that it worked quite well for general questions. The other problem in the enterprise is the hidden, unsearchable web.

Brad’s answer drew a lot on his experiences with Yahoo Answers. In Korea, there was a real shortage of native-language results for a lot of queries, and so the service became very popular, as either knowledgeable people, or those with the language skills to translate foreign results could supply the content as needed. He described what emerged as a kind of blogging-on-demand.

Brad then threw a hardball to Lou, and asked him whether there was a future in paying for search services. Lou admitted this was a hot topic around the water cooler at Factiva, and that going free was always one of the options for the future, though their business model was working well at the moment.

Next, the discussion turned to Krugle’s model, and how they created value by extracting data from code. Brad wanted to know if it changed the way people coded. Steve’s answer was that it turned engineering more into finding the prior art that was already owned by the company, and by extension finding the person who originally checked it in, and might be an expert.

Jeremie had the interesting observation that the dark web inside enterprises will wither as search tools are more widely used, and anything non-searchable becomes unused.

I jumped in with my own question about when I’d be able to search based on a personalized ranking, in essence providing a personal vertical search engine for everyone. This is something I strongly feel is an obvious extension of where we’re going; I have an idea of the sites that I and my friends/colleagues visit already, and most of the results I want come from that comparatively small subset of the web, or pages directly linked from them. I wanted to know if the panel agreed, and if they knew anyone who could give me what I wanted.

Marti thought that using other sorts of information, such as your calendar, might be a good start. Brad said that del.icio.us was moving in that direction. Steve suggested I check out me.dium, which is close in that it does show me social information about the sites my friends go to, but doesn’t let me search.
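The personal vertical engine I was asking for could be sketched as a simple boost over ordinary results — a hedged toy of my own, not any of the products mentioned: results from the small set of sites my circle already visits get promoted.

```python
# A personal-vertical-search toy of my own: boost results whose host is in
# the searcher's set of trusted sites (the ones they and friends visit).
from urllib.parse import urlparse

def personalize(results, trusted_hosts, boost=2.0):
    """results: list of (url, score); trusted_hosts: set of hostnames.
    Results from trusted hosts have their score multiplied by `boost`."""
    def score(item):
        url, s = item
        host = urlparse(url).netloc
        return s * boost if host in trusted_hosts else s
    return sorted(results, key=score, reverse=True)
```

In practice the hard part is the one I was complaining about: nobody yet collects and exposes that per-person `trusted_hosts` set in a searchable way.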

Doc Searls wanted to know how we could integrate time into search. Jeremie thanked the lord that Brewster Kahle had the foresight to capture the early web with the Wayback Machine, giving us a data-set we can use for this sort of thing. Lou agreed that looking back in time was a key desire of searchers, and that it was underserved at the moment. Marti thought that 90% of people didn’t care, and this was a long tail thing for specialized niches. She suggested looking at Google’s news archives for an interesting example of analysis over time.

Esther Dyson wanted to know if the panel expected the structure of search results to change. Steve thought that the answer lay in visualizations that focused on a specialized audience. Marti suggested that classic good presentation would be the answer. Brad professed amazement at how ossified result presentation is, how it’s barely changed from the early days of the web. This was obviously music to my ears, even though my attempts to improve this are still incremental changes.

Defrag: The theory behind 2.0 tool adoption in enterprises


Andrew McAfee started off the day talking about the management theory behind web 2.0 technology adoption in large enterprises. His foundation was the bullseye diagram shown above, where a knowledge worker’s colleagues are divided into concentric circles based on their relationship. Strong ties bind the worker to the people she works with every day; there’s constant communication. She has weak ties with colleagues she only speaks to occasionally, and then there’s a large pool of ‘potentials’: colleagues she could benefit from communicating with, but doesn’t. Outside of that is the rest of the company, with whom there’d be no business reason to talk.
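The bullseye model could be sketched as a simple classifier over communication frequency — the thresholds here are my own invented assumptions, not Andrew McAfee’s:

```python
# Toy version of the bullseye model; the tier thresholds are invented
# for illustration, not taken from the talk.
def tie_tier(messages_per_month):
    if messages_per_month >= 20:
        return "strong"       # the everyday team
    if messages_per_month >= 1:
        return "weak"         # occasional contacts
    return "potential"        # never communicated, but could benefit

def bucket(contacts):
    """contacts: dict name -> messages_per_month; returns tier -> names."""
    tiers = {"strong": [], "weak": [], "potential": []}
    for name, freq in contacts.items():
        tiers[tie_tier(freq)].append(name)
    return tiers
```

The value of the model is less in the exact cut-offs than in forcing the question of which tier a given tool is meant to serve, which is exactly how Andrew organized the rest of the talk.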

Using this system, Andrew outlined his thoughts on what 2.0 tools were useful for each tier. For the strong ties, the team you already work closely with, he suggested that wikis were the most useful technology. People are already engaged with each other, and a wiki offered the obvious benefits of productivity and agility in collaboration.

For those you have weak ties with, he offered the less obvious suggestion of social networks. He explained the initial negative reaction most decision-makers have to the idea of a social network in their organization, where they imagine it will just be used to organize friday-night happy hours. In fact, social networks can give people with weak ties to each other the ability to keep in touch with little effort, and discover important information about each other’s activities. This is crucially important because these weak ties are the ones who have access to radically different pools of information than your close team. They have access to non-redundant information, and can act as bridges to other networks. Using social networks, useful information emerges that would otherwise have been hidden.

The least attractive technology to executives is blogging, but this is the most useful one for reaching out to the large sphere of potential colleagues. He described the role of brokers in networks, people who act as bridges between otherwise isolated sub-groups within the organization. I always like to imagine these people as similar to village gossips, and have to admit I sometimes enjoy that role within my team. Once you get these uncommon but prolific people blogging internally, you start to see unexpected connections being made within the organization. The benefits are innovation, serendipity and network bridging, and what you start to see is teams emerging from shared interests.

As a non-tech example, he picked IntraWest, a company that builds resorts. They have an intranet that includes the ability to blog, and one of their employees posted his discovery of how to save $500,000 with a new technique for pouring heated concrete flooring. For a technology company, he pointed to Avenue A/Razorfish, a web design firm that heavily uses internal blogging and RSS feeds.

It seems like the ‘no connection’ people in an organization shouldn’t be useful to a knowledge worker, but Andrew brought up the interesting example of prediction markets. Real-world stock markets are the way that strangers who never talk to each other arrive at accurate valuations. The benefits are that you tap into the collective intelligence of the company, and answers emerge. He discussed how traditional models fail to predict movie opening-day takes, but the Hollywood Stock Exchange gave startlingly accurate assessments.
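One standard mechanism behind such markets is the logarithmic market scoring rule (LMSR) — my choice of example, not one Andrew named — where the current price of a YES contract can be read as the crowd’s consensus probability of the event:

```python
# LMSR instantaneous price for a two-outcome market. This is a standard
# mechanism chosen for illustration, not one named in the talk.
import math

def lmsr_price(yes_shares, no_shares, b=100.0):
    """Price of the YES contract given total shares sold of each outcome.
    b is the liquidity parameter; the price is the market's consensus
    probability of the event."""
    ey = math.exp(yes_shares / b)
    en = math.exp(no_shares / b)
    return ey / (ey + en)
```

When no one has traded, the price sits at 0.5; as strangers independently buy the side they believe in, the price drifts towards the crowd’s aggregate estimate, which is how a market extracts value from people who never talk to each other.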

Using this model of ties gives us a whole lot of benefits. You can conceptualize and articulate the value that the technologies bring. It helps decision-makers to choose the tools to match their goals. It can also be useful for drawing borders around which tools should be used in which places, for example whether a wiki is appropriate for a group with weak ties. The model also gives some clues and suggestions for how people can adopt and exploit the tools optimally.

He also warned the audience not to expect these tools to make all the ties in an organization equal, but to hope that they help build some new ones.