Defrag: How taxonomies meet folksonomies; or the role of semantics on the web


Karen Schneider gave a talk drawing on the centuries of experience that the library community has in classifying and organizing information, and the relationship between those formal taxonomies and tagging approaches like del.icio.us. She’s made her slides available here.

She started by laying out the relevance of libraries, with a look at community college students’ library usage. They’re checking out roughly the same number of books as a decade ago, but they now check out many ebooks too, as well as accessing databases, so overall usage has actually increased. Community college students are 49% of the total undergraduate population in the US, and they’re poorer and work longer hours outside of college than average. They’re a very demanding audience, so their heavy usage demonstrates that libraries are providing an efficient and useful service.

She then tackled a few librarian stereotypes. I’ve been a library-blog lurker for years, so I didn’t need any convincing, but she got some laughs with Donna Reed the spinster librarian from the alternate world in It’s a Wonderful Life. I was disappointed that Giles was left out, but you can’t have everything.

The next point was a quick demonstration of some typical library software, and how awful it was. The presentation was essentially the same as a card catalog, very static and uninvolving. Doc Searls had talked the day before about data in general being trapped in disconnected silos, in hard-to-use formats, and library systems suffer from the exact same problems.

WorldCat is a universal library catalog that lets you find books and other items in libraries near you. It’s still based on the old card-index model of marked (edit- MARC, that makes more sense, thanks Karen!) data, but it’s a big step forward because it’s linking together a lot of different libraries’ data sources.

There are some issues that traditional taxonomies have been wrestling with for a long time, that are also problems for the newer technologies. Authority control is the process of sorting out terms which could be ambiguous, for example by adding a date or other suffix to a name to make it clear which person is referred to. Misspelling is another area that librarians have spent a lot of time developing methods to cope with. Stemming is problematic enough in English, but she discussed Eastern European languages that have even tougher word constructions. Synonyms are another obstacle to finding the results you need. She showed a del.icio.us example where the tags covering the same wireless networking technology included "wifi", "wi-fi", "802.11" and "802.11b". Phrase searching is something that library data services have been handling for a lot longer than search engines. And finally, libraries have been around for long enough that anachronisms have become an issue, something that tagging systems have not had to cope with. Until the ’90s, the Library of Congress resisted changing any of its authoritative terms, such as afro-american or water closet, even though they’d become seriously outdated.
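To make the synonym problem concrete, here’s a minimal sketch of what tag-level "authority control" might look like: collapsing the variant wifi tags she showed into one canonical term. The synonym table is invented for illustration, not anything del.icio.us actually does.

```python
# Hypothetical synonym table mapping variant tags to one canonical form.
SYNONYMS = {
    "wi-fi": "wifi",
    "802.11": "wifi",
    "802.11b": "wifi",
}

def canonicalize(tag: str) -> str:
    """Lowercase a tag and map known synonyms to a single canonical term."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)

tags = ["WiFi", "wi-fi", "802.11b", "libraries"]
print(sorted({canonicalize(t) for t in tags}))  # ['libraries', 'wifi']
```

A real system would need the kind of curated, well-patrolled vocabulary librarians maintain; the hard part is building and policing the table, not applying it.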

Disambiguation or authority control is something that taxonomies are very good at. The creators of the system spot clashes, and figure out a resolution to them. WorldCat Identities is a good example of the power of this approach. Interestingly, Wikipedia is very good at this too, as the disambiguation page for ‘apple’ shows. She believes this is the result of a very strict, well-patrolled community where the naming is held to be extremely important, and believes the value of the naming is under-appreciated.

Another strong point of traditional cataloging approaches is the definitions. Wikipedia seems to have informally developed a convention where the first paragraph of any entry is actually a definition too.

Having somebody with in-depth expertise and authority on a subject do a centralized classification can be extremely efficient. She gave the example of a law library in California that has an excellent del.icio.us tagging scheme, but I couldn’t find the reference unfortunately. (edit- Here it is, from the Witkin State Law Library ).

Library catalogs have an excellent topic scheme and a good hierarchy for organizing their classifications, which is still something that folksonomies are trying to catch up with, using ideas like facets.

These are all areas where folksonomies can learn from taxonomies, but there are plenty of ideas that should flow the other way too. One of the strengths of tagging is that it’s really easy to understand how to create and search with tags. The same can’t be said for the Dewey decimal system. Cataloging in a library involves following a very intimidating series of restrictions. Tagging doesn’t frighten off your workforce like this.

In the short term, tagging is satisficing, and trumps the ‘perfection’ of a traditional taxonomy. In 2006, the Library of Congress was proud to report they’d cataloged 350,000 items with 400 catalogers. That works out to roughly 875 records per cataloger a year, or only about 3.5 per working day!

Tagging is also about more than just description. It’s a method for discovery and rediscovery, use and reuse, with your own and other people’s bookmarks. Folksonomies produce good meta-data; some people seem concerned that 90% of Flickr photos fall within six facets, but this is actually a good reflection of the real world.

It seems like library conferences are a lot more advanced than Defrag in handling tags, since there’s a formal declaration of a tag for each event in advance, and then that’s used by everybody involved. There wasn’t anything this well-publicized for Defrag, and the one chosen, ‘defragcon’, caused a sad shake of the head, since that’s not future-proof for next year’s conference, and we’ll end up having to change our tags to add a year suffix.

She also brought up the point that basic cataloging and classification techniques seem to be instinctive, and not restricted to a highly-trained elite of catalog librarians. We all tend to pick around four terms to classify items.

There is a common but useless critique of folksonomies: that personal tags pollute them. It’s useless because it’s easy for systems to filter them out. A more real problem is the proliferation of tags over time, which ends up cluttering any results. There’s also the tricky balance between splitters and lumpers, where too finely divided categories give ‘onesie’ results where every item is unique, while overly broad classes mean the signal of the results you want is overwhelmed by the noise of irrelevant items.
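Filtering out personal tags really is that simple in principle. Here’s a toy version, with an invented blocklist of the self-directed tags (‘toread’, ‘todo’ and friends) that only matter to the person who applied them:

```python
# Hypothetical blocklist of "personal" tags that shouldn't appear
# in shared folksonomy views.
PERSONAL = {"toread", "todo", "me", "mine", "later"}

def public_tags(tags):
    """Keep only the tags that describe the item, not the tagger's plans."""
    return [t for t in tags if t.lower() not in PERSONAL]

print(public_tags(["wifi", "toread", "libraries", "TODO"]))
# ['wifi', 'libraries']
```

The splitter/lumper balance is the genuinely hard part; no blocklist fixes a vocabulary that’s too fine or too coarse.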

There’s some examples of ‘uber-folksonomies’, which take the raw power of distributed classification, and apply a layer of hierarchy on top. Wikipedia is the best-known example, and its greatest strength is how well-patrolled the system is. LibraryThing is a system that lets you enter and tag all the books in your personal library. The Danbury library actually uses the information people have entered in LT to recommend books for their patrons who search online, as well as using pre-vetted tags to indicate the categories each book belongs to. The Librarians’ Internet Index is another well-patrolled classification system for websites (though it looked surprisingly sparse when I checked it out). The Assumption College for Sisters has been using del.icio.us to classify its library. Karen pointed out that it’s hard to imagine anyone more trustworthy than a nun librarian! Thunder Bay Public Library has also been busy on del.icio.us.

A deep lesson from the success of folksonomies is that great things can be achieved if people want to get involved. We need to incentivize that activity, and she used the phrase ‘handprints and mirrors’. She didn’t expand on the mirrors part, but I took that to mean the enjoyment people took from looking at a reflection of themselves in their work. We all want to feel like we’ve left some kind of handprint on society, so any folk-based system should reflect that desire too.

She only took one question, asking how libraries are doing. She replied that they’re an example of the only great non-commercial third space. She also gave examples about how people like to be in that space when they’re dealing with information, even if they’re not there for the books.

Defrag: Visualizations of Social Media


JC Herz’s talk was originally "Visualization of Social Intelligence", but she felt that "Visualizations of Social Media" was a better description of what she was actually discussing. She started off in an iconoclastic vein, comparing current social visualizations to lava-lamps or snow globes; fascinating, but conveying absolutely no information. Pretty but useless.

To find a better way, she took us on a whistle-stop tour of the history of graphical data, or pictures that tell stories. Snow’s cholera map and Guerry’s suicide map of France were two classic examples. One common theme with all the classic visualizations is that they were snappy graphics for hidebound institutions. Because producing them involved a lot of manual effort and time, they were only created to address pressing problems.

She then moved on to an example of some work she’d done in this area. She had a brief to investigate how you can visualize coordination in a group, using data from the America’s Army game. She created a way of viewing a particular team’s communications over time, with a 3D graph showing communications as circles on a line, with a line for each team member and the distance along the line representing time. Using this tool, she was able to draw out statistical measures that could tell which teams were effective, based purely on their communication patterns.
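Her real analysis was of course far richer, but the idea of pulling a statistic out of communication timing can be sketched very simply. This toy metric (my invention, not hers) measures how evenly a team’s messages are spread over time, on the hunch that steady chatter and long-silence-then-burst patterns distinguish teams:

```python
# Toy "communication regularity" statistic: the coefficient of
# variation of the gaps between message timestamps.
# Lower values mean steadier communication.
from statistics import mean, pstdev

def burstiness(timestamps):
    """Timestamps must be sorted; returns pstdev/mean of the gaps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) / mean(gaps)

steady = [0, 10, 20, 30, 40]   # regular chatter
bursty = [0, 1, 2, 3, 40]      # long silence, then a burst
print(burstiness(steady) < burstiness(bursty))  # True
```

Whether this particular number correlates with team effectiveness is exactly the kind of question her America’s Army data let her test.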

One thing she thought important to emphasize is the value of time as a parameter in most visualizations. Current social graphs have no awareness of time.

Another important component of a successful visualization is that it should have consequences. She took the example of the study of dating patterns in a midwestern high school, and asked if anyone would have been interested in the same graph showing SMS patterns in that high school. Another example that speaks to the same question is a map showing what name people in different parts of the country use for pop or soda. There isn’t a big consequence to this map, but the minor light it sheds on the culture of the US is enough to get people interested.

The Trulia Hindsight visualization combines both something people are very interested in, real estate, and shows it over time. It tells a story, in a very compelling way.

Space is an important dimension for telling stories too, which is why maps showing political fundraising by area are so fascinating. Conflict or drama is the third ingredient to a compelling story, which is why the diagram showing book buying patterns by party affiliations was such a success.

A story isn’t just an automatic result of running data through an algorithm; to get insight, you have to engage in a dialog with the data. If you ask stupid questions, you get stupid answers.

To wrap up, she proposed some principles about what makes a visualization useful. There should be less information, but the right information. Not just a mish-mash of all the data you have, but a focused version that shows a selected subset of interesting or surprising information. All visualizations should tell a story, which requires notions of time, space, and something at stake. This is why so many popular visualizations are political, because people are trying to make important decisions. To be useful, the visualization also has to be sharable. You’re trying to tell a story to affect something in the real world, and the only way of affecting things is by getting other people involved.

A good test for whether a visualization is any good is asking if it has any consequences. It’s such a waste to go to all the effort of producing a diagram if it doesn’t matter. Any artifact you produce must be sharable to be effective.

The first question came from Matt Hurst, the author of the first diagram JC used in her rogues gallery of pretty but useless visualizations! He’s got an online response here too. He wondered if the America’s Army diagrams were any more intelligible? He also brought up the survivorship bias problem; you can’t know if you’ll get something compelling out of your data set before you start attacking it. You never know which question you ask of it will produce something compelling or surprising.

JC agreed with this, and thought the answer was to emphasize the analysis stage, rather than skipping it.

A lady, whose name I didn’t catch, (edit- Marti Hearst, the Berkeley professor, who was in the "Next level discovery" panel, thanks Matt and apologies Marti, you were out of line of sight for me) said that it’s very hard to make visualizations. The reason that the Amazon political preferences example is still used, years after it was created, is that it’s tough to create something that compelling, and it also needs some luck. She agreed with Matt that the army example was confusing.

JC’s response was that you do need a lot of domain knowledge to be effective.

Another point Marti brought up was that in the high school example, SMS patterns could be incredibly important if it was a forensic investigation after a school shooting.

As support for this idea, JC brought up the example of the correlation between Snickers sales in prison commissaries, and riots. It’s apparently the best method of predicting riots, much better than more obvious metrics like violent incidents, because the inmates know something is brewing, and want to stock up on food.

Mershad Setayesh from Collective Intellect said that visualizations only make sense if you can see a pattern. How do you do that? Is there a methodology to get patterns from your data?

JC suggested various methods, applying calculus, and bundling points up using natural analysis, to find things like cliques.

Defrag: Next generation disruptive technology open space


Despite the extra talk on OpenSocial, there was still some time for open space discussions. The three topics were "OpenSocial vs ClosedPrivate" led by Kevin Marks, "What is Defrag?" with Jerry Michalski and "What are the next generation disruptive technologies?", proposed by Jeff Clavier. I chose to go to the last one, since I was interested to hear what people were expecting to see. People were throwing out interesting ideas and challenges incredibly fast, and I’ve tried to capture them here.

Matthew Hurst of Microsoft kicked off the discussion with his concern that lack of trust that a service will be there next week was hindering the adoption of new technologies. He doesn’t want us to be focusing on changing people’s behavior, he thinks the future will be in simpler clients.

David Cohen would just like his identity to be portable across all the systems. Matt was concerned that this would make information leakage a much bigger problem.

Jeff’s big question was about what the friend relationship means once you start doing that? The relationship is all about context, and your MySpace friends won’t be the same as your LinkedIn contacts. There’s also no concept of strong ties versus weak ties.

Seth Levine wanted to know if anyone liked Facebook’s self-reporting mechanism, and gave the example of an acquaintance who marked his friend request with "We hooked up", not realizing the real meaning of the term!

I suggested email analysis as an approach to a better understanding of relationships. Kevin from the Land Grant Universities research access initiative was sceptical that his Gmail inbox reflected his strong ties. I countered that I thought that was true for our general life, but that most people’s work inboxes did correspond fairly well with those ties, since email is still the primary communication tool in many companies.

Ben from Trampoline Systems injected a note of skepticism into the discussion, noting that no weighting information could be added without support from the platforms that own the social network, like Facebook and LinkedIn, so it’s futile to discuss it before that’s a reality. A lady (whose name I didn’t catch) pointed out one of the major things missing from networks is the recognition that relationships change over time.

Clarence Wooten, CEO of CollectiveX, was concerned that friend overload was becoming as bad as RSS. With so much undifferentiated data coming in, with the same mechanism for both childhood friends and chance acquaintances, it was becoming less useful. Some of the other folks proposed technical solutions for the RSS overload problem, including a couple sold by their own companies.

David Kottenkop(sp?) of Oracle laid down the challenge that computers should be able to figure out the ranking and classification of relationships based on communication patterns. As an editorial comment, this is something I’m convinced is the future, and is the idea behind a lot of what I’ve been working on.
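As a starting point for that challenge, the simplest signals are frequency and recency of contact. This is a naive sketch of my own, not anything from the discussion: score each contact by summing their messages, with each message decaying under a (hypothetical) 30-day half-life.

```python
# Naive tie-strength score: recent messages count for more,
# decaying with an invented 30-day half-life.
import time

HALF_LIFE = 30 * 24 * 3600  # 30 days, in seconds

def tie_strength(message_times, now=None):
    """message_times: unix timestamps of messages with one contact."""
    now = now if now is not None else time.time()
    return sum(0.5 ** ((now - t) / HALF_LIFE) for t in message_times)
```

Ranking everyone in your inbox by this score gives a crude strong-tie/weak-tie ordering; a real system would also weight replies, directionality, and who else was on the thread.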

Craig Huizenga of HiDat thought there were basically two options for organizing all this data; direct formal attributes or tagging, either something that’s tricky but has semantic meaning, or something more informal but easier to use. Across the table, someone suggested that what we really needed was another level of hierarchy for tags, so you could tag tags themselves.

Matt Hurst was concerned that a lot of the systems we were working on were susceptible to exploitation by users once they figured them out. For example, tagging is now being used to spoof search engines. The same thing is true for identities, people have very different behaviors on MySpace and LinkedIn, which friends they’ll accept, and what information they’ll reveal. Just as Google has trained us all to use keywords in search, we’ve been trained into certain behaviors by these services. An interesting practical example of a world where universal identities are starting to appear is message boards. Since there’s only a few different forum systems, it’s technically practical to hand-code interfaces to all of them, and allow users to avoid the hassle of repeated manual registrations with different sites.

Doruk pointed out that the social networks had to solve the problem of weighting friends, or die because they become useless. One term thrown around for this was relationship management.

Ben pointed out that we’re a really skewed sample to be talking about this stuff. MySpace is still massive, we’re not the mainstream users. There’s a generation gap here, where we’ve got very different perceptions than the kids of today. Matt mentioned as an example that MySpace users appreciate seeing ads on a site, because they know that means that it’s a free service, and they won’t be ambushed with any charges.

He wanted to know where the monetization of future services would come from. He doesn’t want more ad-led services; he wants to know where the painful problems are that people will pay us money to solve.

This led me to ask whether the enterprise was the answer? Was that where we could still sell services? Matt answered with a maybe, since there’s no possibility of ads there, and there are mechanisms to force people to use services if someone in the hierarchy decides it’s necessary.

Seth suggested that we all needed to bring our high-falutin’ visions down to something real and concrete. He described his first experience of working in a large corporation, and being astonished to discover that everyone left at 5pm. 90% of people want to go home, not wrestle with new technology in the hope it will eventually make them more productive. He thought NewsGator was a great example of a company taking our fancy technology, and turning it into solutions for everyday problems.

Ben agreed that no one wants to adopt this stuff in a corporation. Matt suggested that the only way forward was to throw some of this technology into firms, and see how people creatively decide to use it.

Beth Jefferson of BiblioCommons jumped in, suggesting that these technologies brought up a lot of tricky issues. People who aren’t friends-of-friends with a decent number of people end up isolated if there’s heavy use of social networks, you’re just magnifying the effects of cliques. She gave a great quote; "Search is a representative democracy with unfair elections". The same goes for blog postings, we pretend to egalitarian principles, but know that there’s a core oligarchy of highly-influential bloggers.

Jeff brought the discussion back to first principles by asking what the world really needs? Craig suggested a way to deal with the overload of information. Jeff thought that turning off your computer for a week, and seeing what you missed, was a good start.

Matt’s concern was how flat data was, and the lack of tools to deal with it in a meaningful way. Seth suggested sorting out some data formats, so we can visualize all this information. Christian of the CAF advisory council suggested that finding experts from amongst our circle of friends was a great unsolved problem.

Greg Cohn of Yahoo gave probably the best comment of the session when he suggested that the biggest problem that faces the world is the lack of clean water for billions of people, not information overload. This led to a really interesting discussion in the session and afterwards about how we change policy to solve these real, people-are-dying, problems. It’ll need a post of its own to do it justice, but it’s something I have trouble forgetting: as a community we’re incredibly lucky, and are we really giving back enough to the rest of the world that’s trapped in poverty?

Defrag: Next level discovery panel


Brad Horowitz of Yahoo led a panel that was looking at the topic of "Next level discovery". Marti Hearst, from UC Berkeley had ‘cheated’ and prepared some slides, but they all set the stage for the following discussion.

She started off with an interesting distinction between the current search model of ‘navigating’, where you incrementally step towards the answer you’re looking for, and what she’s hoping for in the future: ‘say what you want’, where you teleport instantly to the result you want. The real-world analogy she used for navigating was orienteering, where you’re reading a map and compass to move between landmarks.

Her hope was that a couple of trends were coming together to make this vision a reality. There’s now massive collections of user information that are being analyzed to produce better algorithms than PageRank to match results to particular searches. She singled out Ryen White’s paper on using popular destinations to predict the search results people wanted as an example of the way things would move in this direction. She also described eBay Express’s UI as a practical service using natural language processing, with the example of searching for "reebok womens shoes", and getting back a page with useful categories listed based on an understanding of the terms.
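To make the query-understanding example more concrete, here’s a toy version of mapping the tokens of that search onto facets. The facet vocabularies are entirely invented for illustration; they bear no relation to eBay’s actual data or algorithms.

```python
# Hypothetical facet vocabularies for a toy shopping-query parser.
BRANDS = {"reebok", "nike", "adidas"}
GENDERS = {"womens": "women", "mens": "men"}
CATEGORIES = {"shoes", "shirts", "watches"}

def parse_query(q):
    """Map recognized query tokens onto named facets."""
    facets = {}
    for token in q.lower().split():
        if token in BRANDS:
            facets["brand"] = token
        elif token in GENDERS:
            facets["gender"] = GENDERS[token]
        elif token in CATEGORIES:
            facets["category"] = token
    return facets

print(parse_query("reebok womens shoes"))
# {'brand': 'reebok', 'gender': 'women', 'category': 'shoes'}
```

Real NLP has to cope with ambiguity, misspellings and open vocabularies, which is where the hard work lives; table lookup only shows the shape of the output.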

Lou Paglia of Factiva was the next speaker up. He described Factiva’s work as taking news sources from all over the world, and normalizing them into a searchable form. He felt that search and discovery were the same thing, just different aspects of the same spectrum. His big wish was for a real semantic web.

Jeremie Miller, creator of Jabber and currently with Wikia, laid out his view of the way the web was organized. At the moment, as Doc Searls and others discussed yesterday, information is held in walled-off corporate silos; there are few truly open sources of data. To look into the future, it’s useful to look at the history of the internet. The first level was heterogeneous networks connected by common protocols like TCP/IP and email. Second was the web, also connected by open protocols. He’s not sure what the third level will look like, but my inference was that he was betting on open protocols being an important part. As an editorial remark, I’m not so sure that the second level was as open as the first. You can definitely use open protocols as a client to access servers, but there are few server-to-server open protocols in the web world, unlike the pre-web universe of usenet and email. That’s one reason I’m hoping client-side approaches will help return some of what we lost in interoperability.

Next was Steve Larsen of Krugle. His argument was that there was a big difference between PageRank search and indexing. Indexing is like Yahoo’s original directory, where there’s some knowledge of what terms mean, and is tied to the idea of the semantic web. He used the analogy of network news compared to cable in the early 90’s, with vertical search engines as the small, focused competitors that were able to eat their established rivals’ lunches. The example he threw out was that Krugle always knows that python is a language, not a reptile.

Brad finished the introductions by talking about his desire to ban the word ‘user’, since it was so cold and technical. We need to treat our customers as people, give them the tools to participate in cataloging the world. His example was Flickr’s interestingness algorithm, that uses existing data about people’s interactions with the site to figure out a good approximation of a photo’s popularity. Like Marti, he’s betting that looking at people’s behavior is going to be the next big advance in search technology.

The first question was "Does search suck?". Steve thought it was worse than it had been. Marti thought that the reliance on keywords made internet searching suck. Lou had a more fine-grained answer, that searching for detailed, specialized information was really hard, but that it worked quite well for general questions. The other problem in the enterprise is the hidden, unsearchable web.

Brad’s answer drew a lot on his experiences with Yahoo Answers. In Korea, there was a real shortage of native-language results for a lot of queries, and so the service became very popular, as either knowledgeable people, or those with the language skills to translate foreign results could supply the content as needed. He described what emerged as a kind of blogging-on-demand.

Brad then threw a hardball to Lou, and asked him whether there was a future in paying for search services. Lou admitted this was a hot topic around the water cooler at Factiva, and that going free was always one of the options for the future, though their business model was working well at the moment.

Next, the discussion turned to Krugle’s model, and how they created value by extracting data from code. Brad wanted to know if it changed the way people coded? Steve’s answer was that it turned engineering more into finding the prior art that was already owned by the company, and by extension finding the person who originally checked it in, and might be an expert.

Jeremie had the interesting observation that the dark web inside enterprises will wither as search tools are more widely used, and anything non-searchable becomes unused.

I jumped in with my own question about when I’d be able to search based on a personalized ranking, in essence providing a personal vertical search engine for everyone. This is something I strongly feel is an obvious extension of where we’re going; I have an idea of the sites that I and my friends/colleagues visit already, and most of the results I want come from that comparatively small subset of the web, or those directly linked from them. I wanted to know if the panel agreed, and if they knew anyone who could give me what I wanted?
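The simplest version of what I was asking for is just a reranking layer on top of an ordinary search engine. This sketch is my own illustration, with an invented list of sites I and my friends visit and made-up base scores: results from familiar domains get boosted above the rest.

```python
# Personal reranking sketch: boost results whose domain appears in
# the (hypothetical) set of sites my friends and I already visit.
from urllib.parse import urlparse

VISITED = {"example.org", "friendblog.net"}

def rerank(results, boost=2.0):
    """results: list of (url, base_score) pairs; returns them reordered."""
    def score(item):
        url, base = item
        domain = urlparse(url).netloc
        return base * (boost if domain in VISITED else 1.0)
    return sorted(results, key=score, reverse=True)

results = [("http://bigsite.com/page", 1.0),
           ("http://example.org/post", 0.7)]
print(rerank(results))  # example.org floats to the top
```

Building the VISITED set for everyone, and keeping it private, is of course the actual product.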

Marti thought that using other sorts of information, such as your calendar, might be a good start. Brad said that del.icio.us was moving in that direction. Steve suggested I check out me.dium, which is close in that it does show me social information about the sites my friends go to, but doesn’t let me search.

Doc Searls wanted to know how we could integrate time into search? Jeremie thanked the lord that Brewster Kahle had the foresight to capture the early web with the Wayback Machine, to give us a data-set we can use for this sort of thing. Lou agreed that looking back in time was a key desire of searchers, and it was underserved at the moment. Marti thought that 90% of people didn’t care, and this was a long tail thing for specialized niches. She suggested looking at Google’s news archives for an interesting example of analysis over time.

Esther Dyson wanted to know if the panel expected the structure of search results to change? Steve thought that the answer lay in visualizations that focused on a specialized audience. Marti suggested that classic good presentation would be the answer. Brad professed amazement at how ossified result presentation is, how it’s barely changed from the early days of the web. This was obviously music to my ears, even though my attempts to improve this are still incremental changes.

Defrag: The theory behind 2.0 tool adoption in enterprises


Andrew McAfee started off the day talking about the management theory behind web 2.0 technology adoption in large enterprises. His foundation was the bullseye diagram shown above, where a knowledge worker’s colleagues are divided into concentric circles based on their relationship. Strong ties bind the worker with the people she works with every day, there’s constant communication. She has weak ties with colleagues that she only speaks to occasionally, and then there’s a large pool of potential colleagues who she could benefit from communicating with, but doesn’t, the ‘potentials’. Outside of that is the rest of the company, who there’d be no business reason to talk to.

Using this system, Andrew outlined his thoughts on what 2.0 tools were useful for each tier. For the strong ties, the team you already work closely with, he suggested that wikis were the most useful technology. People are already engaged with each other, and a wiki offered the obvious benefits of productivity and agility in collaboration.

For those you have weak ties with, he offered the less obvious suggestion of social networks. He explained the initial negative reaction most decision-makers have to the idea of a social network in their organization, where they imagine it will just be used to organize friday-night happy hours. In fact, social networks can give people with weak ties to each other the ability to keep in touch with little effort, and discover important information about each other’s activities. This is crucially important because these weak ties are the ones who have access to radically different pools of information than your close team. They have access to non-redundant information, and can act as bridges to other networks. Using social networks, useful information emerges that would otherwise have been hidden.

The least attractive technology to executives is blogging, but this is the most useful one for reaching out to the large sphere of potential colleagues. He described the role of brokers in networks, people who act as bridges between otherwise isolated sub-groups within the organization. I always like to imagine these people as similar to village gossips, and have to admit I sometimes enjoy that role within my team. Once you get these uncommon but prolific people blogging internally, you start to see unexpected connections being made within the organization. The benefits are innovation, serendipity and network bridging, and what you start to see is teams emerging from shared interests.

As a non-tech example, he picked IntraWest, a company that builds resorts. They have an intranet that includes the ability to blog, and one of their employees posted his discovery of how to save $500,000 with a new technique for pouring heated concrete flooring. For a technology company, he pointed to Avenue A/Razorfish, a web design firm that heavily uses internal blogging and RSS feeds.

It seems like the ‘no connection’ people in an organization shouldn’t be useful to a knowledge worker, but Andrew brought up the interesting example of prediction markets. In real-world stock markets, they’re the way that strangers who never talk to each other arrive at accurate valuations. The benefits are that you tap into the collective intelligence of the company, and answers emerge. He discussed how traditional models fail to predict movie opening-day takes, but the Hollywood Stock Exchange gave startlingly accurate assessments.

Using this model of ties gives us a whole lot of benefits. You can conceptualize and articulate the value that the technologies bring. It helps decision-makers to choose the tools to match their goals. It can also be useful for drawing borders around which tools should be used in which places, for example whether a wiki is appropriate for a group with weak ties. The model also gives some clues and suggestions for how people can adopt and exploit the tools optimally.

He also warned the audience not to expect all the ties in an organization to be made equal, but to hope for these tools to help build some new ties.

Defrag’s first day

Ink

I’m just wrapping up the first day of Defrag, and it’s been exhilarating. It really does feel like several days at most conferences packed into one. You can see some great in-depth coverage of everything on AltSearchEngines; Charles was live-blogging in front of me in the morning.

First up was David Weinberger, billed as "Everything is miscellaneous", but actually renamed on the fly to "What’s unspoken between us". He’s posted his own notes from the night before, but what really struck me was how useful it can be to step back from the technological bubble I live in, and use the wider world to get a fresh perspective on what we’re doing. He drew on the poetry of Rilke and Rabbinic teachings to explain both how much meaning is implicit in our world, and the hope that computers will gain a soul, through their close association with us.

Clay Shirky couldn’t make it to give the next keynote, so instead there was a round-table discussion loosely based around "Social intelligence". Jerry Michalski moderated, with Joshua Schachter of del.icio.us fame, JB Holston from Newsgator and JP Rangaswami, the CIO of BT. It was unstructured, but full of interesting nuggets, such as Joshua never using RSS, or the idea of deliberately avoiding automatic spam blocking so that the community steps in. What I found most relevant to my work was the description of the existing ‘attachment culture’ in most companies, where collaboration is done by emailing around Word, Excel and Powerpoint files. Mailing lists are another ubiquitous collaboration tool in the same vein.

The general tone among these tech visionaries was pretty derisive about such approaches, but I’m a contrarian on this one. I think if you can’t understand why people are resisting moving to the latest techniques, you’re probably overlooking some important advantages of the traditional tools. I developed some of these thoughts in a later open space discussion about user acceptance. Compared to a wiki, emails have a much simpler and more explicit security model. You make a decision about exactly who sees what you send out, with a clear chain of accountability if one of those recipients decides to make it more widely available. On a wiki, the visibility is determined by somebody else in an opaque way, and it’s a lot harder to understand who’s to blame if something does get wider exposure than it should.

Michael Barrett from PayPal then gave a mini-talk, "A message of warning". He’s worried that we’re merrily building tools with little thought to security, and that we’ll end up like telnet, impossible to use in any situation that requires secrecy. It’s hard to argue with that idea, but it’s also hard to see what the solution is. I’ve yet to see a startup wow its investors with a security demo, and there’s never an easy time to devote resources to securing your software. It took years of bad publicity before even MS moved away from fudging security in favor of user-facing features.

I’d never been involved with an UnConference-style open space, and it was hard to choose amongst all the topics. I picked the user acceptance theme, and to my surprise found myself the only one without a collar. It seems like the number one cause of failure of new tools is lack of user acceptance, so I was expecting a lot more techies and small companies. Instead it seemed dominated by people looking from the business side of large companies and trying to work out how to sell their employees on these ideas.

Andrew McAfee of HBS was one of the leading voices in the discussion, and one of his most interesting claims was that the security argument used to block internal use of wikis is hogwash. I’m no stranger to bogus security concerns being used to veto change, but as I said above, I think there’s at least a kernel of truth in this case. One of the unspoken ideas behind Defrag is that ‘information wants to be free’, and that a corporation will be a better and more productive place if only we can enable wider sharing.

This is probably true on the macro scale, but on the front lines there are both winners and losers. Someone whose power is based around being the holder of arcane knowledge will fight tooth and nail against this. As a more benign example, a line manager may not want other departments taking up his star engineer’s time answering questions, if it’s to the detriment of the project he’s responsible for. Fundamentally, people’s performance is usually judged by the progress they make on their own tasks, not the overall benefit they provide to the company as a whole, since that’s a lot harder to measure. It would be nice to see this change, but until it does, there’s going to be resistance to collaboration, and the security argument is a useful tool in that resistance.

At lunch-time, I ended up sharing a table with Doc Searls, JP Rangaswami and Andrew, which was pretty heady company.

I had another tough choice to make after eating, since the conference split into two tracks. I wanted to see the panel with Adam Gross and John Crupi, especially after yesterday’s discussion with John about how to do something better with email, but Charles Armstrong from Trampoline Systems was speaking in the other room. They’re a really interesting company living in the same space I’m interested in: analyzing email to automatically figure out relationships within an organization, and then doing something useful with that. Dawn Foster and Aaron Fulkerson from MindTouch joined Charles on a panel themed "Social networking the enterprise".

Charles started with a description of his background as an ethnographer, and the inspiration for his work coming from his study of the communication techniques used by small communities. He’s using the insights he gained to write tools that analyze email and IM data, and use it to find experts within a company, or visualize the way people actually communicate within the organization. These are both areas that I’m really interested in, and it was great validation of the opportunities in this area to see how many customers Trampoline had picked up.
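Trampoline’s actual techniques aren’t something I know, but the basic idea of finding experts from communication data can be sketched very simply: score each sender by how often their messages touch a topic. Everything below is invented for illustration (the addresses, the message bodies, and the crude term-counting heuristic); a real system would parse mail headers and weight things like recency and replies received.

```python
import re
from collections import Counter

def rank_experts(messages, topic_terms):
    """Rank senders by how often their messages mention a topic.

    `messages` is a list of (sender, body) pairs. This toy version
    just counts case-insensitive hits of any topic term per sender.
    """
    pattern = re.compile(
        "|".join(re.escape(t) for t in topic_terms), re.IGNORECASE
    )
    scores = Counter()
    for sender, body in messages:
        scores[sender] += len(pattern.findall(body))
    # most_common() returns (sender, score) pairs, highest score first.
    return scores.most_common()

# Invented sample traffic.
mail = [
    ("ann@example.com", "The LDAP schema change broke single sign-on."),
    ("bob@example.com", "Lunch on Friday?"),
    ("ann@example.com", "I've patched the LDAP bind logic, try again."),
]
experts = rank_experts(mail, ["ldap", "single sign-on"])
```

Even this crude counting surfaces the person who actually discusses a subject, which is the kernel of the expert-location idea.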

Alex Iskold from AdaptiveBlue/ReadWriteWeb then gave a talk on "A look at structured attention". He focused on the benefit to users of being able to control and share their own activity streams. As a practical example, if Netflix could access your Amazon buying history, it could provide much better recommendations. He was proposing a model with some central, company-agnostic data store that all the services contributed to and pulled from. I’ve long been convinced that this would be a big leap forward, and would allow startups without their own user-activity logs to do really interesting things, but I have a hard time understanding how to persuade Amazon to give up their competitive advantage.

Alex asked whether the big players would open up, and in the discussion at the end, I pressed him about what his answer was, and what we should do if they do keep saying no. He seemed cautiously optimistic that it’s possible to produce some client-side approaches instead, which is something I’m betting on too.

Dick Hardt gave hands-down the most entertaining talk of the day, packing 450 slides into 12 minutes, on the topic of "Defragging identity". I really need to try something similar for one of my corporate presentations, just to keep everyone awake! He’d obviously practiced like crazy to get the talk spot-on; it was a virtuoso performance. The content was good too, a primer on the history and evolution of trust, and how it is all based on past behavior predicting what someone will do in the future. He took us from the village, where you knew everyone’s past first-hand, to cities, where you had to trust strangers. After urbanization, people turned to third-party institutions to provide certificates indicating past behavior, for example a doctor’s qualifications. In the same way, online we want identities to which we can attach tokens demonstrating past behavior. These may not form a single, monolithic identity for everyone; we may use different identities in different situations, such as online games.

After this quick talk, Esther Dyson stepped up to the plate with "Discussing attention". It was really useful to hear her perspective on targeting advertising using individual consumers’ behavior, as someone who’d been involved on the marketing side. She had just returned from some FTC hearings on the same topic, and proposed a solution that had definite resonance with Alex’s ideas. The proposal was that users get access to the composite profile information that services generate from the raw click-streams and buying habits, as they can do with credit reports. This would allow consumers to escape from being labeled in incorrect or insulting ways, the "My Tivo thinks I’m gay" problem. Esther didn’t have a fully-formed proposal, but it was an interesting approach, and she was looking for feedback and improvements from the audience at Defrag. It raised some questions about who actually owns that data, the user or the company that captured it. With my client-based bias, I’m still pretty convinced that we’ll never persuade those firms to open up, and we’ll need to run on the user’s own machine to give them more control of that information.

In "Customer reach versus vendor grasp", Doc Searls was on very similar territory. He’s rebelling against the constant obsessive measuring and pigeon-holing that’s behind personalized marketing. He asked "Who here wants to be better targeted?", and only one brave soul stepped up and said they did. Doc used a Walt Whitman poem to drive home the uniqueness and irreducibility of every human out there. That led to the idea that we should be able to control how companies see us, with him using the term "Vendor Relationship Management" to describe his approach, in opposition to the traditional customer relationship management that’s run by the vendors. He’s taken up the challenge of actually creating something like this with Project VRM, aimed at producing some practical software and standards to implement this vision. One of the compelling ideas he threw out is reversing the usual passive data model, where vendors pull information about users’ desires, and instead allowing people to broadcast something they need, and see who can come up with a product that matches those requirements.

As somebody said in the discussion at the end, I’d love to lock Doc, Esther and Alex up in a room for a few hours, and see where their visions of the future match, and where they clash.

Ross Mayfield of SocialText gave the final talk of the day, and I was intrigued by the title, "Things to do in Denver when your corporation’s dead". Unfortunately he switched to "Made of people" instead, so I’ll never know how he would have lived up to the original heading. Ross’s talk covered a lot of ground, touching on the Radiohead album sale model and how that approach could be used by other businesses, SocialText’s search for a new CEO, and pulling in your customers’ expertise. The common thread in all of these is the active engagement of people all over the world in achieving your goal.

This just covers the formal talks, but of course some of the most interesting conversations happened in the hallway. I had a great chat with JC Herz about the work I’m doing on graph visualization, gave a demo to Robert Reich from me.dium, and received a demo of HiveLive from Greg Schneider.

I’m looking forward to another great day tomorrow, especially JC’s and Matthew Hurst’s talks on social visualization.

Funhouse Photo User Count: 2,099 total, 75 active.

Event Connector User Count: 105 total, 5 active.

CollectiveX and JackBe

Collectivexlogo
After dinner, I popped down to the hotel bar to peruse their selection of martinis. I met Clarence and Joe, the CEO and CTO of CollectiveX.com. I hadn’t seen it before, but it’s an interesting service for anyone who’s collaborating. It provides a central location for group organization: you can email group members, have a shared calendar, and host forums, all in one place. I chatted a bit about the ideas behind Event Connector, and how they tied in.
Jackbelogo

I also met John and Jessica from JackBe. I was very interested to hear from John about his company’s experiences selling to large organizations. Jessica also gave me a (probably well-deserved) earful about bringing a book along to read at the bar. I think the key adjective was ‘pretentious’, and my only defense was that I’m English, and I read the back of cereal boxes if I don’t have anything better to do. I also received some definitely well-deserved flak about the amount of rubbish I keep in my wallet, which I know Liz will second!

Updated with corrections to the unusual dual-CEO structure I implied (maybe a RAID approach to corporate management?), and my deepest apologies to Jessica, whose name I completely misremembered! All in all, an argument against posting after returning from cocktail hour. As penance, here’s the photo of my wallet that Jessica requested I add, to prove her point:

Wallet

Defrag has arrived!

Defragbanner

I flew into Denver this morning, and even though Defrag doesn’t officially start until tomorrow, I’ve already had a couple of early meet-ups with some of the local folks. It was fun seeing Rob and Josh from EventVue in the flesh for the first time, and hearing about all their hard work. They’ve been running at full steam since May; I hope they get a chance for a break soon.

I also made an interesting discovery; Denver has two Hyatt hotels just a couple of blocks from each other, the Grand Hyatt and the Hyatt Regency. I only found this out after I’d dropped my car at the Grand’s valet parking and tried to check in! Luckily I was able to make it to the right one without further misadventure.

Funhouse Photo User Count: 2,097 total, 111 active. Ticking up gradually, with some good weekend active numbers.

Event Connector User Count: 109 total, 4 active.

Beautiful data

Datavisualization

With the mass of raw data I’m getting from a couple of years of my own email, I’m looking around for a good way to turn that into information. A simple ranking of my closest contacts is a good start, but I also want to see how many of the real-life groupings between others can be revealed. I’m working on a basic force-directed graph implementation, but that still leaves a lot of display choices.
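For anyone curious what a basic force-directed layout involves, here’s a minimal sketch in the classic Fruchterman-Reingold style: connected nodes attract like springs, all pairs repel, and a cooling "temperature" caps movement each step. The contact names are invented, and a real implementation over thousands of email contacts would need a smarter neighbor search than this O(n²) loop.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, k=1.0, seed=42):
    """Minimal Fruchterman-Reingold-style 2D layout.

    Every pair of nodes repels (force k^2/d); every edge attracts
    (force d^2/k), giving an equilibrium edge length of roughly k.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    temp = 0.1  # caps per-step movement, decays each iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between all pairs of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f; disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f; disp[b][1] -= dy / d * f
        # Spring-like attraction along edges.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f; disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f; disp[b][1] += dy / d * f
        # Move each node, limited by the current temperature.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temp *= 0.98  # cool down so the layout settles
    return pos

# A tight clique of three contacts bridged to two others (names invented).
nodes = ["alice", "bob", "carol", "dave", "erin"]
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
         ("carol", "dave"), ("dave", "erin")]
layout = force_directed_layout(nodes, edges)
```

The appealing property for my purposes is that cliques in the email graph naturally settle into visual clusters, while weakly connected people drift to the periphery.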

VisualComplexity.com is one of my favorite places to find inspiration. They’ve done a great job collecting some of the most striking methods of presenting graph data visually. I also enjoy the Data Mining blog. Matthew’s a great resource and he’s good at reminding me to focus on getting something useful from my visualizations, not just pretty pictures. He’s headed to Defrag, so I hope I’ll get a chance to say hello.

Funhouse Photo User Count: 2,042 total, 52 active. Steady growth, but a low active count.

Event Connector User Count: 106 total, 13 active. A miniature growth spurt over the last day or two, with a comparatively large number of engaged users.