Enjoy a taste of Britain with Tikka Masala


There’s not much food I miss from the UK, but life isn’t worth living without an occasional Chicken Tikka Masala. Invented in the 70’s, it’s a combination of traditional Indian spices and the British love of sauces. Tender chunks of chicken are slowly cooked in a creamy tomato and yogurt sauce and served on rice with a side of naan bread. I’ll show you how to make your own at home.

Ingredients

Serves 2-3

1/4 stick of butter
1/2 teaspoon cumin seeds
1 cinnamon stick
8 cardamom pods
12 cloves
2 bay leaves
1 medium onion
6 cloves of garlic
1 piece of ginger root
1 tablespoon of powdered cumin
1 tablespoon of powdered coriander
1 14 oz can of diced tomatoes
1/2 teaspoon cayenne pepper
3/4 teaspoon salt
1 pair of chicken breasts (about 1lb)
2 tablespoons plain full-fat yogurt or sour cream

Preparation

The cooking itself takes about an hour, but you’ll need to prepare the ingredients before that starts. That usually takes me around 45 minutes, and I recommend a small glass of Kingfisher beer to help you along and add an authentic curry-house feel.

Garam Masala

The spice base used for many Indian dishes is known as garam masala. I’ve tried a lot of prepared mixes, but I’ve never found one that hits the spot. To make your own, take the cumin seeds, cinnamon, cardamoms, cloves and bay leaves, and place them on a small plate. Later you’ll be throwing them in hot oil to bring out the flavor.


Vegetable stock

Onion, garlic and ginger make up the stock base for the dish. Chop the onion finely and set it on one side for later. Crush the garlic into a small bowl or glass, then grate the ginger and add it to the mixture along with a little water. I love the smell of garlic and ginger together; I’ve found this mix works great for any dish that needs a rich vegetable stock.


Chicken

This is my least favorite part of the recipe since I hate handling raw chicken. One tip I’ve found is that you can leave the frozen chicken a little under-defrosted and the slicing will be a lot simpler.

Take the breasts and slice them into roughly 1 inch cubes. I’m pretty fussy and remove any fatty or stringy sections so there’s just the pure white meat. Sprinkle the salt and cayenne pepper over the cubes, adding more than suggested if you like it hot, and leave to marinate for a few minutes.


Cooking

Now all the components are ready, melt the butter in a large, deep frying pan over a high heat. You need to get it hot, but not so hot that the butter separates or smokes. My usual test is to drop a single cumin seed into the oil. If it sizzles and pops within a few seconds, it’s hot enough.

Once up to the right temperature, add the plate of garam masala spices and stir for around a minute. You should start to smell the aroma of the spices as they mix with the oil.

Leaving the heat on high, add the onion and leave cooking for around 3 minutes, stirring frequently. The onion should be translucent and a little browned by the end.

Now add the cumin and coriander, along with the garlic/ginger mix. Mix well in the pan, and if it looks too dry add a little more water. Cook for another minute or so.

Add the canned tomatoes, mix again and leave for another minute.

Throw in the chicken and give another good stir. Now turn the heat down to low and place a cover on the pan. After a few minutes have a peek and it should be bubbling very gently. Keep stirring every few minutes, and it should be ready in around 45 minutes. If the sauce looks too watery, leave the lid off the pan for the last 15 minutes to let it reduce. A few minutes before the end, mix in the yogurt or sour cream.


Rice

Curry-house rice is usually basmati, and cooked quite dry compared to the American standard. I recommend a cup of rice to two cups of water and a medium-sized saucepan with a good lid.

Take the rice and soak in a large bowl of water for half an hour before you cook it. Then drain the water, and add the rice to the two cups of boiling water in the pan. Stirring is the enemy, since you’ll break up the grains and add sticky starch to the mixture, so just stir once when you add the rice, and then once again when it’s up to boiling. Once it’s reached that point put on the lid, turn down the heat and don’t peek for ten minutes since the steam’s doing a lot of the cooking and you don’t want to let it escape.

After ten minutes turn off the heat and fluff the rice with a fork, and then cover again until everything else is ready. Put the rice on plates as a base and add the curry on top.

Naan

You really need proper stretchy, sweet naan bread for the full Indian experience. Making that yourself is a whole different article, but as a poor substitute you can try pitta bread in a pinch, since that’s a lot easier to find.

Is PageRank the ultimate implicit algorithm?


PageRank has to be one of the most successful algorithms ever. I’m wary of stretching the implicit web definition until it breaks, but it shares a lot of similarities with the algorithms we need to use.

  • Unintended information. It processes data for a radically different purpose than the content’s creators had in mind. Links were meant to simply be a way of referencing related material; nobody thought of them as indicators of authority. This is the definition of implicit data for me: it’s the information that you get from reading between the lines of the explicit content.
  • Completely automatic. No manual intervention means it can scale up to massive sets of data without a corresponding increase in the number of users or employees you need. This means it’s easy to be comprehensive, covering everything.
  • Hard to fake. When someone links to another page, they’re putting a small part of their reputation on the line. If the reader is disappointed in the destination, their opinion of the referrer drops, and this natural cost keeps the measure correlated with authority. This makes the measure very robust against manipulation.
  • Unreliable. PageRank is only a very crude measure of authority, and I’d imagine that a human-based system would come up with different rankings for a lot of sites.
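
Since it’s an algorithm most people know by reputation rather than by code, here’s a minimal sketch of the core idea: rank flows along links, weighted by a damping factor. This is my own toy illustration in Python, not Google’s implementation; the link graph, damping factor and iteration count are all made-up assumptions.

# Minimal power-iteration PageRank over a toy link graph.
# The graph, damping factor and iteration count are illustrative only.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # dangling pages share rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")

Nobody in that toy web explicitly voted for anything; the scores fall straight out of links that were created for a completely different reason, which is the whole point of the list above.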

As a contrast, consider the recipe behind a social site like Digg that aims to rank content in order of interest.

  • Explicit information. Every Digg vote is cast in the knowledge that it will be used to rank stories on the site.
  • Human-driven. It relies completely on users rating the content.
  • Easy to fake. The voting itself is simple to game, so account creation and other measures are required to weed out bad players.
  • Reliable. The stories at the top of its rankings are generally ones a lot of people have found interesting; it seems good at avoiding boring content, though of course there’s plenty that doesn’t match my tastes.

A lot of work seems to be fixated on reliability, but this is short-sighted. Most implicit data algorithms can only ever produce a partial match between the output and the quality you’re trying to measure. Where they shine is their comprehensiveness and robustness. PageRank shows you can design your system around fuzzy reliability and reap the benefits of fully automatic and unfakeable measures.

A secret open-source MAPI example, from Microsoft!


Microsoft’s Messaging API has been the core interface to data held on their mail systems since the early 90’s. For Exchange 2007, it’s deprecated in favor of their new web service protocol but it’s still the language that Outlook speaks, and is the most comprehensive interface even for Exchange.

The underlying technology holding the mail data has changed massively over the years, and so the API has grown to be massive, inconsistent and obscure. It can’t be used with .Net, it requires C++ or a similar old-school language, and its behavior varies significantly between different versions of Outlook and Exchange. There’s some documentation and examples available, but what you really need is the source to a complete, battle-tested application. Surprisingly, that’s where a grassroots effort from Microsoft’s Stephen Griffin comes in!

He’s the author of MAPI Editor, an administrator tool for Exchange that lets you view the complete contents and properties of your mail store. It also offers a wealth of other features, like the ability to export individual messages or entire folders as XML. Even better, he’s made it a personal mission to keep the source available. I know how tricky getting that sort of approval can be in a large company, and I’m very glad he succeeded; it’s been an invaluable reference for my work. I just wish it was given more prominence in the official Microsoft documentation; I had been working with the API for some time before I heard about it. That might be a reflection of its history, since it started off as a learning project, and evolved from being used as ad-hoc example code, to being documented in an official technical note, to shipping as part of the Exchange tools.

Another resource Stephen led me to is the MAPI mailing list. The archives are very useful, packed full of answers to both frequently and infrequently asked questions. It’s not often that you see an active technical mailing list that’s been going since 1994 either.

If implicit data’s so great, why did DirectHit die?


DirectHit was a technology that aimed to improve search results by promoting links that people both clicked on, and spent time looking through. These days we’d probably describe it as an attention data algorithm, which places it firmly in the implicit web universe. It was launched to great excitement in the late 90’s, but it never achieved its promise. There was some talk of it lingering on in Ask’s technology, but if so it’s a very minor and unpromoted part. If the implicit web is the wave of the future, why did DirectHit fail?

Feedback loops. People will click on the top result three or four times more often than the second one. That means that even a minor difference in the original ranking system between the top result and the others will be massively exaggerated if you weight by clicks. This is a place where the click rate is driven by the external factor of result ranking, rather than the content quality that you’re hoping to rate. It’s a systematic error that’s common whenever you present the user with an ordered list of choices. For example, I’d bet that people at the top of a list of Facebook friends in a drop-down menu are more likely to be chosen than those further down. Unless you randomize the order you show lists in, which is pretty user-unfriendly, it’s hard to avoid this problem.
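
To make the feedback problem concrete, here’s a rough sketch of the kind of correction you’d need before clicks could mean anything: discount each click by how often that result position gets clicked regardless of quality. This is purely my own illustration, not anything DirectHit actually did, and the baseline click-through rates are invented numbers.

# Illustration only: normalize raw clicks by the expected click-through
# rate for each result position. The baseline CTRs are invented numbers.
BASELINE_CTR = {1: 0.30, 2: 0.10, 3: 0.07, 4: 0.05, 5: 0.04}

def debiased_score(clicks, impressions, position):
    """Clicks per impression, relative to what that slot gets anyway."""
    if impressions == 0:
        return 0.0
    observed_ctr = clicks / impressions
    expected_ctr = BASELINE_CTR.get(position, 0.03)
    return observed_ctr / expected_ctr

# The top result gets twice the raw clicks, but after normalizing,
# the second result is the one people actually prefer.
print(debiased_score(clicks=300, impressions=1000, position=1))  # 1.0
print(debiased_score(clicks=150, impressions=1000, position=2))  # 1.5

Even with a correction like this, the baseline rates themselves drift as the ranking changes, so the loop never entirely goes away.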

Click fraud. Anonymous user actions are easy to fake. There’s an underground industry devoted to clever ways of pretending to be a user clicking on an ad. The same technology (random IP addresses, spoofed user agents) could easily be redirected to create faked attention data. In my mind, the only way to avoid this is to have some kind of trusted user identification associated with the attention data. That’s why Amazon’s recommendations are so hard to fake: you need to not only be logged in securely but spend money to influence them. It’s the same reason that Facebook is pushing so hard for their Beacon project; they’re able to generate attention data that’s linked to a verified person.

It’s a bad predictor of quality. Related to the feedback loop problem, whether someone clicks on a result link and how much time they spend there don’t have a strong enough relationship to whether the page is relevant. I’ll often spend a lot of time scrolling down through the many screens of ads on Experts Exchange on the off-chance they have something relevant (though at least they no longer serve up different results to Google). If I do that first and fail to get anything, and then immediately find the information I need on the next result link I click, should the time spent there be seen as a sign of quality, or just of deliberately poor page design? This is something to keep in mind when evaluating attention data algorithms everywhere. You want to use unreliable data as an indicator and helper (e.g. in this case you could show a small bar next to results showing the attention score, rather than affecting the ranking), not as the primary controlling metric.
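
Here’s a small sketch of that indicator-not-ranking idea: compute an attention score from clicks and dwell time, cap the dwell time so ad-stuffed pages don’t win, and render it as a little bar beside each result while leaving the ordering alone. The 50/50 weighting and the 60-second cap are arbitrary choices of mine, not a recommendation.

# Sketch: surface attention data as a hint next to results, without
# letting it reorder them. The weights and the dwell cap are arbitrary.
def attention_bar(clicks, impressions, avg_dwell_seconds, width=10):
    click_part = clicks / impressions if impressions else 0.0
    dwell_part = min(avg_dwell_seconds, 60) / 60  # cap dwell at a minute
    score = 0.5 * click_part + 0.5 * dwell_part
    filled = round(score * width)
    return "[" + "#" * filled + "-" * (width - filled) + "]"

# The results keep their original order; the bar is just extra context.
for title, stats in [("Result A", (120, 1000, 45)), ("Result B", (300, 1000, 5))]:
    print(f"{attention_bar(*stats)} {title}")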

SEO Theory has an in-depth article on the state of click management that I’d recommend if you’re interested in the details of the fraud that went on when DirectHit was still alive.

An easy way to create your own search plugin for any site

AltSearchEngines recently explained how to find over a thousand plugins that add new engines to the search box in the top right of your browser. If the one you want isn’t there, or you need one for your own site, I’m going to show how you can create your own search plugin for Firefox. You don’t have to write any code; all you need is an example URL.

I recently installed Lijit on my blog, and I’d like to offer a search box plugin for searching on my site. The first hurdle is finding an example URL to base the plugin on. Lijit usually displays its results in an overlay, with no change in the address bar, but I spotted the permalink button that goes to a normal web page.


For the search engine you’re using, do a search for a single term (in my case "camping"), and make a note of the full URL given for the result page. In the case of my Lijit blog search, the permalink version is

http://www.lijit.com/pvs/petewarden?q=camping&pvssearchtype=site&preserved_referer=http%3A%2F%2Fpetewarden.typepad.com

To start creating your plugin, go to the Mycroft Projects Search Plugin Generator. You’ll see a form with a series of fields to fill out. Luckily you will be able to ignore most of these, and I’ll show you what you need to do for the others. Once all the right information is in there, submitting the form will write the plugin code for you!


The most crucial part of the form is the top "Query URL". This tells Firefox how to generate the right address for the search engine you’re using. The generator takes an example search engine URL, and works out how to build links that search for any keywords.

The generator needs to know where the search terms are supposed to be in the URLs for this engine, so in the Query box below I tell it the term I was looking for, "camping".
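
All the generator really has to work out from this is where your example term sits in the URL, so it can swap in any future search terms. Here’s a rough sketch of that principle in Python, using my Lijit URL; it’s not the generator’s actual code, just an illustration of what it’s doing for you.

# Sketch of the generator's job: replace the known example term with a
# placeholder, then reuse the template to build a URL for any query.
from urllib.parse import quote_plus

example_url = ("http://www.lijit.com/pvs/petewarden?q=camping"
               "&pvssearchtype=site"
               "&preserved_referer=http%3A%2F%2Fpetewarden.typepad.com")
example_term = "camping"

template = example_url.replace(example_term, "{terms}", 1)

def search_url(query):
    return template.format(terms=quote_plus(query))

print(search_url("mountain lion"))
# http://www.lijit.com/pvs/petewarden?q=mountain+lion&pvssearchtype=site&...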


Below that, enter a URL for the home page of the search engine you’re using.


Leave the CharSet entry as None, and leave the Categories section blank. The next section, Results, is tricky. Some obscure parts of Firefox want to extract the search result links using the information from here, but all we want to do is direct the user to the right page. We should be able to leave this blank, but unfortunately the generator fails if we do. Instead, fill in the first four boxes with "Dummy Entry", so the generator has some entries to work with.


You can leave the remaining entries in the Results section blank. Moving down to the Plugin part, there are three final boxes you need to fill.


You should enter your name and email address in angle brackets, since you’re the author. The name is what appears in the drop-down menu for the search box, and the description should be a short explanation of what the plugin is for.

That’s all the information you need to enter, so hit the "Send" button at the bottom of the form. Mozilla then analyzes the information you’ve submitted, and tries to create the right code for your plugin. You should see a couple of new sections appear at the bottom of the page. The first box is the HTML that the engine returned for your example search, which isn’t that interesting. The crucial part is the lower section of text, titled Plugin Source.


This contains the actual code you need for your plugin. I’ve uploaded the example that the generator creates for searching this blog with Lijit here. To create your own file, cut and paste everything that’s in a typewriter font inside the light grey box into your favorite text editor, like Notepad or TextEdit. Make sure you’re in plain text mode if it supports fonts or colors. Save the file as the name of your search engine, with the .src extension, for example petesearch.src.

Now you have two choices for how to install the plugin. If you just want to use it on your own machine, you can copy it to the directories described on this page. On Linux it’s /usr/lib/Mozilla/searchplugins, for OS X use /Applications/Mozilla.app/Contents/MacOS/Search Plugins/, and on Windows it’s C:\Program Files\Mozilla.org\Mozilla\searchplugins\.

If you want to put it on a website for other people to install, you’ll need a small section of JavaScript. Here’s a cut-down version that will install it when it’s clicked on.


<a href="http://petewarden.typepad.com/searchbrowser/files/petesearch.src" onclick="window.sidebar.addSearchEngine(this.href, '', 'PeteSearch', ''); return false;">Install</a>

This will work fine on Firefox, but if you want to gracefully fail on other browsers you’ll need some more complex code to detect if the plugin format is supported. Here’s a page from Mozilla that explains what you’ll need to do. Alternatively you can just label the link as Firefox-only.

This guide shows how to create a Sherlock plugin which will work with all versions of Firefox. There’s also a new standard called OpenSearch which works with Firefox 2 and Internet Explorer. It has some nifty features like being able to add your plugin to the search box whenever a user is visiting a site, but no user-friendly generator.

Want to see a fresh approach to automated social analysis?


I recently discovered Johannes Landstorfer’s blog after he linked to some of my articles. He’s a European researcher working on his thesis on "socially aware computers", exploring the new realms that are opened up once you have automated analysis of your social relationships based on your communications. There are some fascinating finds, like using phone bills to visualize your graph, or reflecting the uncertainty in the results of all this analysis by using deliberately vaguely posed avatars. His own work is intriguing too; he’s got a very visual approach to the field, which generates some interesting user-interface ideas. I’m looking forward to seeing more of what he’s up to.

A sea of Shooting Stars


I’ve just got back from a day in the mountains with Liz. We were lucky enough to find a whole meadow full of Shooting Stars near the end of the Mishe Mokwa trail. It was a trail maintenance trip with the SMMTC, so we hiked four miles carrying tools onto the top of the Chamberlain Trail and then spent a few hours working before heading back. We found the flowers on the way back, and it was a real stroke of luck since Liz was already planning on profiling them on the trails council site. Here’s a sneak preview of one of her photos:

[Photo: Shooting Star close-up]

They’re fascinating plants; they always remind me of a wasp with purple wings. The maintenance work gave me my drain-building fix. There’s nothing quite like playing in the dirt with a pick-ax to clear your mind. It’s so nice to be able to stand back after an hour’s work and see what you’ve accomplished. It started to rain towards the end, so I was even able to see the drains in action!


What can you learn from traditional indexing?


I’m a firm believer in studying the techniques developed over centuries by librarians and other traditional information workers. One of the most misunderstood and underrated processes is indexing a book. Anybody who’s spent time trying to extract information from a reference book knows that a good index is crucial, but it’s not obvious how much work goes into creating one.

I’m very interested in that process, since a lot of my content analysis work, and search in general, can be looked at as trying to generate a useful index with no human intervention. That makes professional indexers’ views on automatic indexing software very relevant. Understandably they’re a little defensive, since most people don’t appreciate the skill it takes to create an index and being compared to software is never fun, but their critiques of automated analysis apply more generally to all automated keyword and search tools (the toy indexer sketched after this list shows what they mean).

  • Flat. There’s no grouping of concepts into categories and subheadings.
  • Missing concepts. Only words that are mentioned in the text are included; there’s no reading between the lines to spot ideas that are implicit.
  • Lacking priorities. Software can’t tell which words are important, and which are incidental.
  • No anticipation. A good index focuses on the terms that a reader is likely to search for. Software has no way of telling this (though my work on extracting common search terms that lead to a page does provide some of this information).
  • Can’t link. Cross-referencing related ideas makes the navigation of an index much easier, but this requires semantic knowledge.
  • Duplication. Again, spotting which words are synonyms requires linguistic analysis, and isn’t handled well by software. This leads to confusing double entries for keywords.
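
To see why those critiques ring true, here’s a deliberately naive automatic indexer of the kind being criticized, as a toy Python sketch of my own (the stopword list and sample pages are invented). It can only record where words literally appear, so every flaw in the list above shows up in its output.

# A deliberately naive back-of-book indexer: it records every non-trivial
# word against the pages it appears on, and nothing more.
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "by"}

def build_index(pages):
    """pages is a list of page texts; page numbers start at 1."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                index[word].add(page_number)
    return index

pages = [
    "The index is compiled by the indexer.",
    "A good index anticipates the reader.",
]
index = build_index(pages)
for word in sorted(index):
    print(word, sorted(index[word]))
# 'index' and 'indexer' come out as separate entries, every word gets
# equal weight, and nothing is grouped under broader concepts.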

It’s a wild, wild web

While browsing my visitor logs, I came across viewfour.com. It’s an interesting site: it does something similar to my old SearchMash Java applet and ManagedQ’s much more advanced engine, displaying live previews of search results. Unfortunately it does suffer from a problem with frame-busting sites; for example, this search for Pete Warden winds up with the toolfarm preview taking over the parent frame. That was a big reason why you either need some decent script-blocking code, or need to deploy the previews as a browser extension, where you can prevent child frames from taking control.

I was curious to discover that there weren’t any organic reviews for the site that I could find, and the copyright was 2005. Most of the Google results pointed to download pages. It also includes a link to ViewSmart, a spyware/malware blocker, which seemed like an odd combination to go with a search engine. In fact, the only user-created review I found in the first few pages was this negative one from a spyware information site. I don’t recommend paying too much attention to anonymous posters, but if you do try out the search site, it would be prudent to avoid the additional download until I can find out more about it. I’ll see if I can get more information directly from the author, SSHGuru.

How do you access Exchange server data?


Like standards, the wonderful thing about Exchange APIs is that there’s so many to choose from. This page from Microsoft is designed to help you figure out which one you should use, and I count over 20 alternatives!

I need something that’s server based, not a client API, so that does help narrow down the selection a little. MAPI is a venerable interface, and still used by Outlook to communicate with the server, but unfortunately MS has dropped server-side support for it on Exchange 2007. It is possible to download an extension to enable it, but using a deprecated technology doesn’t feel like a long-term solution. CDOEx is another interface that’s been around for a while, and it’s designed for server code, but it too is deprecated.

Microsoft’s current recommendation is to switch all development to their new web service API. This looks intriguing, since it makes the physical location of the code that interfaces with the server irrelevant, but I’m wary that it will hit performance problems when accessing the large amounts of data that I typically work with. It seems mostly designed with clients in mind, and they typically have an incremental access pattern where they’re only touching small amounts of data at a time. Another issue is adoption of Exchange 2007. My anecdotal evidence is that many organizations are still running older versions, and even Microsoft’s Small Business Server package still uses 2003. Since the old Exchange versions are likely to be around for a while, it’s tricky to rely on an interface that’s only supported in the very latest update.
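
For a flavor of what the web service route looks like, here’s a hedged sketch of a single FindItem request against the Exchange 2007 /EWS/Exchange.asmx endpoint, paging through the inbox a hundred items at a time. The host and credentials are placeholders, real servers usually insist on NTLM or Kerberos rather than basic auth, and I’ve only shown one page of results.

# Hedged sketch of an Exchange Web Services FindItem call (Exchange 2007).
# Host, credentials and auth scheme are placeholders; production servers
# typically require NTLM/Kerberos and proper certificate handling.
import requests

EWS_URL = "https://exchange.example.com/EWS/Exchange.asmx"  # placeholder host

FIND_ITEM = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:m="http://schemas.microsoft.com/exchange/services/2006/messages"
               xmlns:t="http://schemas.microsoft.com/exchange/services/2006/types">
  <soap:Body>
    <m:FindItem Traversal="Shallow">
      <m:ItemShape>
        <t:BaseShape>IdOnly</t:BaseShape>
      </m:ItemShape>
      <!-- Incremental access: 100 item ids per request, starting at offset 0 -->
      <m:IndexedPageItemView MaxEntriesReturned="100" Offset="0" BasePoint="Beginning"/>
      <m:ParentFolderIds>
        <t:DistinguishedFolderId Id="inbox"/>
      </m:ParentFolderIds>
    </m:FindItem>
  </soap:Body>
</soap:Envelope>"""

response = requests.post(
    EWS_URL,
    data=FIND_ITEM,
    headers={"Content-Type": "text/xml; charset=utf-8"},
    auth=("DOMAIN\\user", "password"),  # placeholder credentials
    timeout=30,
)
print(response.status_code)
print(response.text[:500])  # SOAP response listing the first page of item ids

A full mailbox crawl means a long series of round trips like that one, which is exactly the incremental pattern I’m worried about for bulk server-side work.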