Defrag: Next level discovery panel


Brad Horowitz of Yahoo led a panel looking at the topic of "Next level discovery". Marti Hearst, from UC Berkeley, had ‘cheated’ and prepared some slides, but they set the stage for the discussion that followed.

She started off with an interesting distinction between two search models. The current one is ‘navigating’, where you incrementally step towards the answer you’re looking for. The real-world analogy she used was orienteering, where you’re reading a map and compass to move between landmarks. What she’s hoping for in the future is ‘say what you want’, where you teleport instantly to the result you want.

Her hope was that a couple of trends were coming together to make this vision a reality. There are now massive collections of user information that are being analyzed to produce better algorithms than PageRank for matching results to particular searches. She singled out Ryen White’s paper on using popular destinations to predict the search results people wanted as an example of how things would move in this direction. She also described eBay Express’s UI as a practical service using natural language processing, with the example of searching for ‘reebok womens shoes’ and getting back a page with useful categories listed, based on an understanding of the terms.

Lou Paglia of Factiva was the next speaker up. He described Factiva’s work as taking news sources from all over the world and normalizing them into a searchable form. He felt that search and discovery were the same thing, just different points on the same spectrum. His big wish was for a real semantic web.

Jeremie Miller, creator of Jabber and currently with Wikia, laid out his view of the way the web was organized. At the moment, as Doc Searls and others discussed yesterday, information is held in walled-off corporate silos, and there are few truly open sources of data. To look into the future, it’s useful to look at the history of the internet. The first level was heterogeneous networks connected by common protocols like TCP/IP and email. Second was the web, also connected by open protocols. He’s not sure what the third level will look like, but my inference was that he was betting on open protocols being an important part. As an editorial remark, I’m not so sure that the second level was as open as the first. You can definitely use open protocols as a client to access servers, but there are few server-to-server open protocols in the web world, unlike the pre-web universe of Usenet and email. That’s one reason I’m hoping client-side approaches will help return some of what we lost in interoperability.

Next was Steve Larsen of Krugle. His argument was that there was a big difference between PageRank search and indexing. Indexing is like Yahoo’s original directory, where there’s some knowledge of what terms mean, and it is tied to the idea of the semantic web. He used the analogy of network news compared to cable in the early ’90s, with vertical search engines as the small, focused competitors that were able to eat their established rivals’ lunches. The example he threw out was that Krugle always knows that python is a language, not a reptile.

Brad finished the introductions by talking about his desire to ban the word ‘user’, since it was so cold and technical. We need to treat our customers as people, and give them the tools to participate in cataloging the world. His example was Flickr’s interestingness algorithm, which uses existing data about people’s interactions with the site to figure out a good approximation of a photo’s popularity. Like Marti, he’s betting that looking at people’s behavior is going to be the next big advance in search technology.

The first question was "Does search suck?". Steve thought it was worse than it had been. Marti thought that the reliance on keywords made internet searching suck. Lou had a more fine-grained answer, that searching for detailed, specialized information was really hard, but that it worked quite well for general questions. The other problem in the enterprise is the hidden, unsearchable web.

Brad’s answer drew a lot on his experiences with Yahoo Answers. In Korea, there was a real shortage of native-language results for a lot of queries, and so the service became very popular, as either knowledgeable people or those with the language skills to translate foreign results could supply the content as needed. He described what emerged as a kind of blogging-on-demand.

Brad then threw a hardball to Lou, and asked him whether there was a future in paying for search services. Lou admitted this was a hot topic around the water cooler at Factiva, and that going free was always one of the options for the future, though their business model was working well at the moment.

Next, the discussion turned to Krugle’s model, and how they created value by extracting data from code. Brad wanted to know if it changed the way people coded. Steve’s answer was that it turned engineering more into finding the prior art that was already owned by the company, and by extension finding the person who originally checked it in, who might be an expert.

Jeremie had the interesting observation that the dark web inside enterprises will wither as search tools are more widely used, and anything non-searchable becomes unused.

I jumped in with my own question about when I’d be able to search based on a personalized ranking, in essence providing a personal vertical search engine for everyone. This is something I strongly feel is an obvious extension of where we’re going. I have an idea of the sites that I and my friends/colleagues visit already, and most of the results I want come from that comparatively small subset of the web, or from pages directly linked from it. I wanted to know if the panel agreed, and if they knew anyone who could give me what I wanted.

Marti thought that using other sorts of information, such as your calendar, might be a good start. Brad said that was moving in that direction. Steve suggested I check out me.dium, which is close in that it does show me social information about the sites my friends go to, but doesn’t let me search.
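
To make the personalized ranking idea a bit more concrete, here is a minimal Python sketch of one way it could work: take ordinary search results and boost anything that comes from the sites my friends and I already visit. Everything in it is an assumption for illustration, not something any of the panelists described. The function name `personalized_rank`, the `trusted_domains` set, the `boost` factor and the example URLs are all hypothetical, and it ignores the "directly linked from them" part of the idea entirely.

```python
from urllib.parse import urlparse


def personalized_rank(results, trusted_domains, boost=2.0):
    """Re-rank (url, score) pairs, favouring results from trusted domains.

    results         -- list of (url, base_score) pairs from a general search
    trusted_domains -- hypothetical set of domains I and my friends visit
    boost           -- hypothetical multiplier applied to trusted results
    """
    def adjusted_score(item):
        url, score = item
        host = urlparse(url).hostname or ""
        # Treat www.example.com and example.com as the same site.
        if host.startswith("www."):
            host = host[4:]
        return score * boost if host in trusted_domains else score

    ranked = sorted(results, key=adjusted_score, reverse=True)
    return [url for url, _ in ranked]


# Made-up results and scores, purely for illustration.
results = [
    ("http://example.com/a", 1.0),
    ("http://www.friend-blog.example/post", 0.6),
    ("http://other.example/x", 0.9),
]
ranking = personalized_rank(results, {"friend-blog.example"})
```

With these made-up numbers, the friend's blog post moves from last place to first, because its boosted score of 1.2 beats the unboosted 1.0 and 0.9. A real version would presumably need a much softer weighting than a fixed multiplier.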

Doc Searls wanted to know how we could integrate time into search. Jeremie thanked the lord that Brewster Kahle had the foresight to capture the early web with the Wayback Machine, to give us a data-set we can use for this sort of thing. Lou agreed that looking back in time was a key desire of searchers, and that it was underserved at the moment. Marti thought that 90% of people didn’t care, and this was a long tail thing for specialized niches. She suggested looking at Google’s news archives for an interesting example of analysis over time.

Esther Dyson wanted to know if the panel expected the structure of search results to change. Steve thought that the answer lay in visualizations that focused on a specialized audience. Marti suggested that classic good presentation would be the answer. Brad professed amazement at how ossified result presentation is, and how it’s barely changed from the early days of the web. This was obviously music to my ears, even though my attempts to improve this are still incremental changes.

One response

  1. Pete: Great to see your coverage of the panel discussion. It was one where I think we were only able to scratch the surface. There are so many issues around discovery it is tough to make a lot of headway in 45 minutes.
    Building on some of your points above, content will always be a critical factor in the search and discovery conversation, whether free, fee, implicit, explicit or derived via semantic web (such as the Dapper guys discussed later).
    Commercially, I think it is about “value creation” and solving the problems for the user and that there needs to be renewed focus on the user, what are they trying to accomplish. The content is a huge piece of the formula. The other piece is the technologies used and how they are deployed to provide valuable search and discovery solutions in the way the user needs it. Current awareness search and the ability to answer hard questions are two search paradigms. I do not believe that they will be solved through the same methods but do believe that search and discovery can be used to solve both.
    Furthermore, where there is value created in a service, people are willing to use it and pay. In certain cases, ad solutions are the way to go where advertisers are paying and in others, payment will remain in the subscription service. I see strong parallels with other pay services such as ERP, CRM and enterprise search, all will face this challenge so the value has to be there.
    I’m glad you recapped the time conversation. I loved the fact that time came up. It is something I don’t think is thought of as much as it should be when we think about value on the web. But time is becoming critically important as the flow of web information becomes faster and faster. I work for Dow Jones/Factiva, and in the Factiva business we’ve understood the importance of time since the beginning. That is why we have a 30-year archive of relevant business news and information. Recently we’ve coupled that with emerging discovery technologies where users can drill into relevant results via time. This was something I could not let slip by and had to mention because the ability is out there…
    Anyway, just some thoughts. Defrag was a fantastic conference, really great set of people working through some of the tough issues that we face today and some issues that will be created as we continue to move the technologies forward.
