An online guide to hiking and biking trails near Los Angeles

October 20, 2008 By Pete Warden in Uncategorized

Photo by KnaPix

I was out doing trail work at the annual COSCA event on Saturday, and met up with another regular, Steve Clark. He runs a website that's a fantastic resource for anyone who likes to get out in the mountains west of LA, venturacountytrails.org. I've put up my own guides to a few of the local trails and campgrounds, but Steve's assembled a comprehensive set of maps and descriptions covering most routes in the area.

He's got GPS-based topo maps for all the places he covers, together with elevation charts and descriptions. I still always recommend having a good traditional map of any area you're exploring, but he's done a fantastic job of writing guides that make it very hard to get lost! Here are links to his coverage of some of my favorite areas:

Mount Pinos
Circle X
Rocky Peak
Wildwood
Cheeseboro

Great job Steve, you're really helping people discover the wonderful wilderness we've got on our doorstep.

How to access all that lovely Exchange data

October 19, 2008 By Pete Warden in Uncategorized

Photo by Photobunny

Any company's Exchange server holds a wealth of information that could be used to build some very compelling tools, if only it were easily accessible. I've made it my mission to crack that silo and build some useful services, which has meant researching every way I can find of drilling in. Here are the approaches I've implemented or looked at:

Local MAPI

Microsoft has reused the acronym for Messaging API in multiple different projects. This version refers to the interface for accessing mail data held on the same machine. In use since 1992(!), it's supported by all configurations of Exchange, but it's deprecated and requires an optional component in Exchange 2007.

It offers fast access to all tasks, contacts, emails and calendars for every user. The biggest obstacle is that it requires server software installation, and most Exchange admins are deathly afraid of touching a working configuration. There are also other same-computer access APIs like ExOleDB that work very similarly. Mailana implements a server-side MAPI downloader as one of its entry points.

Exchange Web Services

EWS is the officially recommended replacement for local MAPI, but unfortunately it's only supported by Exchange 2007. Like MAPI, you can see all users' information as long as you have correctly set up the admin access account. It's a network interface, so there's no scary installation required. I've not implemented this yet.

MAPI/RPC

The other protocol named MAPI, this one is the network interface that Outlook uses to communicate with the Exchange server. It's not officially supported as an API, but third parties like OpenMAPI have reverse-engineered it. The great promise of this approach is that you could implement a proxy server that sits in between Exchange and its Outlook clients, picking out all the information you want as it flows across the network with almost no change to the existing infrastructure. The downside is that you only see information as it's sent back and forth, which means no access to historical data. You can see all the types of information supported by Outlook. This isn't implemented in Mailana, but it's definitely very attractive and I hope to work on it soon.

IMAP

IMAP is the standard internet protocol that clients use to retrieve mail from servers. Many corporate networks have IMAP journaling enabled for Exchange to allow back-up services to pull down mail items for their archives. It gives you access to all users' emails, but you can't see other items like contacts or meetings. This requires the admin to set up journaling on the server, but that's pretty low-impact. I've got IMAP support in Mailana, which as a bonus also lets me import Gmail accounts.
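To give a feel for how little code the IMAP route needs, here's a minimal sketch using Python's standard imaplib and email modules. It is not Mailana's importer, just an illustration: the function name fetch_subjects, the host and the credentials in the usage comment are all made up, and it assumes you've already been given an account that can read the mailbox you care about.

```python
import email
from email.header import decode_header, make_header


def fetch_subjects(conn, mailbox="INBOX"):
    """Return (sender, subject) pairs for every message in a mailbox.

    `conn` is assumed to be an already logged-in imaplib.IMAP4 or
    imaplib.IMAP4_SSL connection (the name and shape of this helper
    are illustrative, not part of any existing tool).
    """
    # Open read-only so that browsing doesn't change any message flags.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("could not open mailbox %r" % mailbox)

    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    results = []
    for num in data[0].split():
        # BODY.PEEK[HEADER] fetches only the headers and leaves the
        # message unread on the server.
        status, parts = conn.fetch(num, "(BODY.PEEK[HEADER])")
        if status != "OK":
            continue
        raw_headers = parts[0][1]
        msg = email.message_from_bytes(raw_headers)
        # Subjects may be RFC 2047 encoded, so decode them to plain text.
        subject = str(make_header(decode_header(msg.get("Subject", ""))))
        results.append((msg.get("From", ""), subject))
    return results


# Usage (hypothetical host and credentials):
#   import imaplib
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("user@example.com", "password")
#   for sender, subject in fetch_subjects(conn):
#       print(sender, subject)
```

Note that this only ever sees mail, which matches the limitation above: contacts and meetings never show up over IMAP.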

Outlook Add-in

From a client-side plugin to Outlook, all of a user's information is readable through the Outlook Object Model. You only have access to information that's been kept on the user's local machine. The really nasty bit is that this approach requires installation on every client machine connected to the server, and you have to jump through some hoops on older Outlook versions to avoid annoying security warnings. I've got this implemented in Mailana.

ActiveSync

Like MAPI/RPC, ActiveSync is designed to allow the transfer of tasks, emails and contacts between a device and the server. This means you only have access to a single user's information. It doesn't require any installation, and is enabled and supported on most servers. I've not implemented this.

How to implement video capture on the mac

October 18, 2008 By Pete Warden in Uncategorized

Photo by pt

If you want to capture the data from an iSight (or any other video camera) on OS X, figuring out where to start is tough. Video input and output are both controlled by QuickTime, an amazingly successful framework, but as a long-lived interface to rapidly changing hardware it has accumulated an impenetrable thicket of APIs. That means there's no obvious StartVideoCapture() function; instead you have to use some odd legacy calls.

Here's the official SGDataProcSample code demonstrating video capture to a data buffer. The actual source you want is in MiniMung.c, appropriately named after the joke acronym Mung Until No Good. Don't try to make too much sense of the actual functions, just accept that these are the magical incantations you need to mutter to get it working.

If you want an example of uploading the captured data to an OpenGL texture, you can download the source code to my Live Feed video plugin for Motion and look in LiveFeed.mm for the code. I have a separate thread running capturing the video and downloading it to a buffer, while the rendering thread constantly uploads a texture from it. There's a risk of tearing, but it keeps the logic simple and doesn't require any blocking.
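The two-thread arrangement is easier to see stripped of the QuickTime and OpenGL calls. The sketch below is not the code from LiveFeed.mm; it's a toy Python stand-in where a "frame" is just 16 bytes holding the frame number, and every name in it (shared_frame, capture, render, run_demo) is invented for illustration. The capture thread overwrites the shared buffer in place, and the render thread copies whatever is there without ever taking a lock.

```python
import threading

FRAME_SIZE = 16                       # bytes per pretend frame
shared_frame = bytearray(FRAME_SIZE)  # written by capture, read by render
done = threading.Event()


def capture(frame_count):
    """Stand-in for the capture thread: overwrite the shared buffer in place."""
    for frame_id in range(1, frame_count + 1):
        for i in range(FRAME_SIZE):
            # No lock here, so a reader may see a half-written frame.
            shared_frame[i] = frame_id
    done.set()


def render(snapshots):
    """Stand-in for the render thread: copy whatever is in the buffer now."""
    while not done.is_set():
        snapshots.append(bytes(shared_frame))  # the 'texture upload'
    snapshots.append(bytes(shared_frame))      # last frame, after capture stops


def run_demo(frame_count=50):
    """Run both threads and count snapshots that mix two different frames."""
    snapshots = []
    done.clear()
    threads = [
        threading.Thread(target=capture, args=(frame_count,)),
        threading.Thread(target=render, args=(snapshots,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    torn = sum(1 for s in snapshots if len(set(s)) > 1)
    return snapshots, torn
```

Any snapshot containing two different frame numbers is a torn frame. How often that happens depends entirely on thread scheduling, which is the trade-off: neither thread ever waits on the other, at the cost of an occasional mixed frame.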

An engineer’s guide to demos

October 17, 2008 By Pete Warden in Uncategorized

Photo by RazZiel

I met up with some friends last night and did an off-the-cuff show and tell. I left feeling I'd failed to get across what's so interesting about Mailana, reminding me that in my natural state I give terrible demos. Since they're a crucial part of selling an idea, I've had to work hard to fix that. I know I share that affliction with almost every engineer I know, so here are some tips that have helped me.

Accept that it's important

In most engineering situations, if I know something interesting and you don't, you're expected to make an effort to learn it. That's completely reversed when you're trying to sell your idea. You may be certain that it's the best thing since sliced bread, but you're the one who has to make the effort to communicate that to investors, customers or journalists. They have massive numbers of people trying to persuade them to take action, so they can only spend a small amount of time and thought on each proposal. That means you have to spend a lot of time and effort crafting your demo.

Rehearse relentlessly

Stop coding at least a couple of days before, turn on your web cam, start recording and practice what you're going to say. Watch it back every time, and then do it again. My rule of thumb is that I need to do it at least 25 times before I start sounding natural, ironically. This also comes in handy if you want to produce a web video of your presentation; the one I'm still proudest of is my pitch for SearchMash from a few years back. It sounded crazy to me at first to spend so much time on it, but if you don't believe me, just read about the days of prep Steve Jobs puts in for his keynotes.

Show, don't tell

Jason Calacanis's demo guide is spot on. People don't want to hear your life story, just show them your product within the first 30 seconds, preferably doing something awesome. My Achilles heel is going into all of the really interesting technical details of how it works. That's like having a car commercial with the hood popped just showing them the engine. They want to see what it does for them, not how it does it.

And if you want to know how this all works out in practice, come along to Defrag to see me in action!

Where can you find all the Javascript answers?

October 16, 2008 By Pete Warden in Uncategorized

Photo by _tomanthony

Javascript is the unsung hero of the last few years. Originally designed as a lightweight scripting language, it's taken the weight of building complex browser applications onto its shoulders. There are definitely occasional creaks under the strain though, and if you're hitting problems, the first place to look is Quirksmode.

It's essentially a collection of Peter-Paul Koch's notes from his own work using JS for web development, but the breadth and depth of the coverage is amazing. Whether you need to restyle file upload buttons, communicate between windows, detect keystrokes or discover which events work in different browsers, you'll find the definitive answer in a clear, well-written page.

If you're a Javascript developer, bookmark Quirksmode and buy his book. If you're not, hire him if you need any JS work. I want to make sure he keeps saving me vast amounts of debugging!

An XML Format for Email

October 15, 2008 By Pete Warden in Uncategorized

Photo by Photobunny

Breaking down information silos is the key to making better tools. Email stores are the biggest and most interesting silos out there, and one reason for the lack of progress is the lack of interchange standards between mail systems. Sure, there's IMAP/POP, and RFCs galore, but they're all either connection-oriented transport protocols or, like MIME, hard to decode with modern tools. For my own work I'm taking mails from diverse sources like Gmail through IMAP, Outlook through OOM and Exchange through MAPI and converting them into XML so that I can write the rest of my pipeline once and ignore where the mails came from.

Seeing Tim O'Reilly asking Postbox about their XML use reminded me that an agreed standard for email in XML would help everyone. XMTP is an effort based on RFCs, but a simple duplication of headers into XML tags is not much different than parsing the original raw text. What I needed was something that had a layered approach, hiding details like the exact type of a recipient to allow easy dumping of everybody who received it, rather than having to separately collate the to, cc and bcc headers. And nobody should ever have to deal with MIME's multi-part implementation ever again.

Here's some information on my format, with a DTD and an example encoded message. It's aimed at my need to pass around messages within a data analysis pipeline, so it skips a lot of less-used headers, but it captures what I need. I'll put together a minimal expat-based PHP parser in the future. Contact me if you're using any other email XML formats, I want to understand what else is out there.

In style I've completely avoided attributes, putting everything within the data section of a tag. This makes parsing simpler, and also brings it closer to JSON style notation for easy data interchange using map arrays in languages like PHP.
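To illustrate why the attribute-free style maps so easily onto JSON-like structures, here's a minimal sketch in Python rather than the PHP parser mentioned above. The sample document is invented for this example (it is not examplemessage.xml), and element_to_data and LIST_TAGS are hypothetical names. Because every value lives in element text, a short recursive function is enough to turn a document into nested dicts, lists and strings.

```python
import xml.etree.ElementTree as ET

# Tags the format describes as lists of repeated children.
LIST_TAGS = {"messagelist", "recipients", "attachments"}


def element_to_data(elem):
    """Convert an attribute-free element into nested dicts, lists and strings."""
    children = list(elem)
    if elem.tag in LIST_TAGS:
        return [element_to_data(child) for child in children]
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_data(child) for child in children}


# An invented sample document using a few of the format's tags.
SAMPLE = """<messagelist><message>
  <subject>Lunch?</subject>
  <fromaddress>alice@example.com</fromaddress>
  <recipients>
    <recipient><address>bob@example.com</address><role>to</role></recipient>
    <recipient><address>carol@example.com</address><role>cc</role></recipient>
  </recipients>
</message></messagelist>"""

messages = element_to_data(ET.fromstring(SAMPLE))
```

The result is a plain list of dicts, so it can be handed straight to json.dumps or walked with ordinary indexing such as messages[0]["recipients"][1]["role"].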

Download message.dtd

Download examplemessage.xml

The example message demonstrates the format, containing a plain text and HTML body, along with a single image attachment. Here's an explanation of the tag types:

<messagelist> This surrounds an unordered list of <message> objects

<message> Contains all the data for a message

<messageuid> A globally unique ID for the message (eg a UUID)

<sourceuid> Some ID that uniquely identifies the message at the location where it originated (eg an EntryID in Outlook). This is different from the <messageuid> because different copies of the same message may be present in the pipeline.

<subject> The subject line of the email

<fromaddress> The email address of the sender

<fromdisplay> The display name of the sender

<deliverytime> The time of arrival for the message in the recipient's inbox. Stored in Y-m-d H:i:s format (will need time-zone added, but currently assuming GMT).

<recipients> Surrounds an unordered list of <recipient> objects

<recipient> Contains information about an individual recipient

<address> The email address for a recipient

<display> The display name for a recipient

<role> The type of recipient, either 'to', 'cc' or 'bcc'

<contenttext> The plain text version of the message body or an attachment. My tools take .doc, .pdf, and .xls attachments and convert them into both text and HTML versions for easy searching, analysis and viewing.

<contenthtml> The HTML version of the message body or attachment.

<sourcefolder> Somewhat misnamed, this actually indicates whether the mail was 'sent' or 'received'

<attachments> Surrounds an unordered list of <attachment> objects

<attachment> Begins an individual attachment

<attachmentuid> A globally unique identifier to refer to the attachment

<filename> The full filename of the attachment

<filetype> The MIME type of the attached file

<filedata64> The actual data for the attachment, base64 encoded into a text form
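As a rough illustration of how the tags above fit together, here's a minimal Python sketch that builds a single message and then lists everyone who received it without caring about to, cc or bcc. The helper names (add_text, build_message, all_recipient_addresses) and the sample addresses are invented, and the element order hasn't been checked against message.dtd, so treat it as a sketch of the idea rather than a validated writer.

```python
import base64
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime


def add_text(parent, tag, text):
    """Append <tag>text</tag> to parent; values always go in element text."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def build_message(subject, sender, recipients, body_text, delivered,
                  attachments=()):
    """Build a <message> element.

    sender:      (address, display) tuple
    recipients:  iterable of (address, display, role) tuples
    delivered:   datetime, assumed to already be in GMT
    attachments: iterable of (filename, mime_type, raw_bytes) tuples
    """
    message = ET.Element("message")
    add_text(message, "messageuid", str(uuid.uuid4()))
    add_text(message, "subject", subject)
    add_text(message, "fromaddress", sender[0])
    add_text(message, "fromdisplay", sender[1])
    # PHP's Y-m-d H:i:s corresponds to this strftime pattern.
    add_text(message, "deliverytime", delivered.strftime("%Y-%m-%d %H:%M:%S"))

    recipient_list = ET.SubElement(message, "recipients")
    for address, display, role in recipients:
        recipient = ET.SubElement(recipient_list, "recipient")
        add_text(recipient, "address", address)
        add_text(recipient, "display", display)
        add_text(recipient, "role", role)

    add_text(message, "contenttext", body_text)

    attachment_list = ET.SubElement(message, "attachments")
    for filename, mime_type, raw in attachments:
        attachment = ET.SubElement(attachment_list, "attachment")
        add_text(attachment, "attachmentuid", str(uuid.uuid4()))
        add_text(attachment, "filename", filename)
        add_text(attachment, "filetype", mime_type)
        add_text(attachment, "filedata64",
                 base64.b64encode(raw).decode("ascii"))
    return message


def all_recipient_addresses(message):
    """Everyone who received the message, whatever their role."""
    return [r.findtext("address") for r in message.iter("recipient")]


msg = build_message(
    "Lunch?",
    ("alice@example.com", "Alice"),
    [("bob@example.com", "Bob", "to"), ("carol@example.com", "Carol", "bcc")],
    "Noon at the usual place?",
    datetime(2008, 10, 15, 12, 30, 0),
    attachments=[("note.txt", "text/plain", b"hi")],
)
```

Calling ET.tostring(msg, encoding="unicode") gives the serialized XML. The recipient helper is the layered approach in miniature: the role is still stored, but code that only wants the full audience never has to look at it.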

The plight of abandoned mascots

October 14, 2008 By Pete Warden in Uncategorized

I enjoy Jonathan Salem Baskin's vigorous assaults on the cargo-cult of 'branding' over at Dim Bulb, and in his new book Branding Only Works on Cattle, so I got a kick out of this musical number he put together. Reminiscent of South Park's Island of Misfit Mascots, he's drafted the Pets.com sock puppet to lament the harsh realities of life after the commercials stop airing. I could definitely handle more business books if they were in musical form.

Make the most of your email with Postbox

October 14, 2008 By Pete Warden in Uncategorized

I recently heard from Sherman Dickman at Postbox. They're building a very interesting mail client, implementing a lot of the tools I think will be essential for working more effectively with your email. Their focus on tagging, search and organization is spot-on; the web has raised the bar for interacting with large data sets. Why can you search Google in 0.02 seconds, but your mail can take minutes? They have some strong tools for quickly previewing all the content in attachments too, another big opportunity that conventional clients are missing.

They're implementing a Mac client initially, which might be a smart move considering their main competition in the professional market is the sadly-neglected Entourage. It's not released yet, but you can learn a bit more from this TechCrunch coverage.

I'm really pleased to see them moving forward with some innovative solutions, and look forward to downloading it once there's a version available.

Automatically tagging using Wikipedia

October 13, 2008 By Pete Warden in Uncategorized

Here’s my new tag cloud generator that uses a list of all the Wikipedia article titles to produce a visualization of the concepts on a web page. You can download the source PHP code here, or enter a URL in the box below to get a cloud:

It’s an extension of the standard tag cloud technique of counting word frequencies. I’ve included a white list of all the Wikipedia article names, as an approximation of ‘interesting concepts’. Only phrases that appear amongst the million titles are included in the cloud. I’ve weeded out the top 10,000 most commonly used words to reduce the noise. An extension would be comparing the expected average frequency of a word with its actual frequency to produce statistically improbable phrases, as Amazon does.
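The counting step can be sketched in a few lines. This is a Python illustration of the idea, not the PHP source offered for download: the function name tag_counts, the three-word phrase limit and the tiny title and common-word sets are all assumptions made for the example, standing in for the full Wikipedia title list and the 10,000 common words.

```python
import re
from collections import Counter


def tag_counts(text, titles, common_words, max_words=3):
    """Count phrases in `text` that match a whitelist of article titles.

    titles and common_words are sets of lowercase strings; max_words caps
    how many consecutive words are tried as a single phrase.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            # Keep only whitelisted phrases that aren't everyday words.
            if phrase in titles and phrase not in common_words:
                counts[phrase] += 1
    return counts


# Toy stand-ins for the real lists.
titles = {"tag cloud", "email", "wikipedia", "the"}
common_words = {"the"}
page_text = ("The tag cloud uses Wikipedia. A tag cloud summarizes email; "
             "the email is noisy.")

cloud = tag_counts(page_text, titles, common_words)
```

Here cloud ends up with 'tag cloud' and 'email' counted twice and 'wikipedia' once, while 'the' is dropped even though it's a valid title. Turning the counts into font sizes is then just a matter of scaling.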

This is a by-product of some of my email analysis work. Tag clouds just based on the number of times a word appears in a piece of text often generate surprisingly good summaries. People tolerate the noise of incorrect words in a way they wouldn’t with a bullet-point list.

The underlying technology of semantic analysis is making very slow progress, so I’m picking applications and interfaces that are extremely tolerant of bad input, where the broad coverage you get from automating the analysis wins out over its poor quality.

One example of this is creating a profile for someone based on the contents of the emails they send. In a large company you’d have a white-list of skill and project keywords, similar to the Wikipedia titles. The people who mention those words most often in their emails would have them added to their expertise list in a searchable employee directory. The consequences of some incorrect entries aren’t too painful. As long as there’s a white list, no private or embarrassing terms will appear there, and the profile can be hand-edited by the user to fix anything glaringly wrong.
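A rough sketch of that profile builder, again in Python and again with invented names and data: expertise_profiles, the sample senders and the keyword set are purely illustrative, and for brevity it only handles single-word keywords.

```python
import re
from collections import Counter, defaultdict


def expertise_profiles(sent_emails, keywords, top_n=3):
    """Map each sender to the whitelisted keywords they mention most.

    sent_emails: iterable of (sender, body) pairs
    keywords:    set of lowercase single-word skill or project names
    """
    mentions = defaultdict(Counter)
    for sender, body in sent_emails:
        for word in re.findall(r"[a-z0-9+#]+", body.lower()):
            # Anything outside the white list is ignored entirely, so
            # private or embarrassing terms can never reach a profile.
            if word in keywords:
                mentions[sender][word] += 1
    return {
        sender: [word for word, _ in counts.most_common(top_n)]
        for sender, counts in mentions.items()
    }


emails = [
    ("alice", "The MAPI importer and the IMAP importer both need MAPI fixes"),
    ("bob", "Salary talk and IMAP journaling"),
    ("alice", "More MAPI notes"),
]
keywords = {"mapi", "imap", "journaling"}

profiles = expertise_profiles(emails, keywords)
```

In this toy data alice comes out with 'mapi' ahead of 'imap', and bob's mention of salary never appears because it isn't on the list. A real directory would still want the hand-editing step described above.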

DBAs Gone Wild!