Easy user authentication for Windows with PHP

Photo by Richard Parmiter

The internet is slowly groping towards a single user identity system through the OpenID initiative, but one of the nice things about working inside a corporate firewall is that there’s already a directory of user names and passwords. In the dominant Microsoft world, you rely on Active Directory to keep track of all that information. The ‘Active’ prefix usually strikes fear into anyone integrating non-MS technology into a Windows world, since it often translates to ‘proprietary’, but they’ve actually done a really good job of making the directory information available through the LDAP open standard.

If you want to try converting your PHP-based internet app to intranet authentication, check out this tutorial on using LDAP from PHP with an Exchange server. If you’re interested in the details of using LDAP with PHP in general, such as how to install the LDAP module if it isn’t included by default in your PHP installation, check out this two-part guide.

Get a real-time view of the Apache error log

Photo by CR

I spend a lot of time developing on a remote server through an SSH connection, and I’ve found it tough to keep an eye on the error log file. Typically I’ve been running the tail unix command to look at the last 10 lines, but this only gives you a snapshot of the errors at that instant. If I wanted to see more, I had to run tail again. I knew there had to be a better way, something I was missing, since I couldn’t imagine unix wizards putting up with this. Luckily I was right. If you pass the -f or --follow option to tail, it will update continuously, so you can see errors in real time as the lines are written to the log file. This is perfect: I can see at a glance what’s going on.

To do the same, just open an SSH session to your remote server, and then type in the following command:

tail -f /var/log/httpd/error_log

The log file location varies between flavors of Linux. If you have access problems, make sure the logged-in user has permission to read the file.
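
If you’d like to see the difference between a snapshot and a live view without touching a real server, here’s a rough sketch. It uses a temporary file as a stand-in for the error log (the file and the fake error line are made up for the demo), takes a one-off snapshot with tail -n, and then follows the file in the background so that a newly appended line shows up:

```shell
# A temporary file stands in for the real error log (hypothetical demo data).
log=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 > "$log"

# One-off snapshot: only the last 2 lines (the default would be 10).
snapshot=$(tail -n 2 "$log")
echo "$snapshot"

# Live view: follow the file in the background, starting with no old lines,
# and capture whatever tail prints into a second file.
tail -n 0 -f "$log" > "$log.out" &
tail_pid=$!
sleep 1

# Simulate the web server writing a new error while tail is watching.
echo '[error] something broke' >> "$log"
sleep 1

# Stop following and show what the live view picked up.
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true
followed=$(cat "$log.out")
echo "$followed"
```

On a real server you would just leave tail -f running in the foreground and press Ctrl-C when you’re done. Adding -n 50 starts the view from the last 50 lines. On Debian and Ubuntu the Apache log usually lives at /var/log/apache2/error.log instead.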

How to find new ways to visualize information


It’s hard to imagine a more codified visualization than a calendar. You have a grid of cells, with each row representing a week, and that’s pretty much it. That’s what my good friend Kent Oberheu set out to change. Over the past 5 years he’s produced 60 monthly calendars that show time contorting in strange paths, all embedded in his art. The regularity of a traditional calendar is lost; instead you really see the flow of time. It helps me remember that every day is unique, just like his visualizations.

I don’t think these are going to replace standard calendars, but it’s a demonstration that looking at even the most mundane information in unusual ways can start you thinking in new directions. Kent’s better known as Semafore in the design world, and after years as a designer with Apple he’s just moved to a new position at Industrial Light and Magic. All the time I’ve known him he’s been pushing into new visual territory, finding magic combinations that tickle your aesthetic nerve and manage to get something across at the same time. He’s the first person I turn to when I need to talk about designing visualizations, and the combination of my engineering skills and his design direction has worked very well.

As I’ve gone deeper into the subterranean world of information that’s held on companies’ Exchange servers, it’s become obvious that part of the toolkit for navigating it has to be new ways of looking at that data. Some of that can be borrowed from the recent explosion of web visualizations, such as animated tag clouds for your email content, but some of the challenges are unique. How, for example, can you show a decent view of the different versions of your attachments over time?

That’s where the Information Aesthetics blog comes in. It was the first place Kent sent me when I started to ask him for inspiration, and it’s full of the latest and most beautiful visualizations. Some of my recent finds from there include Wordle, a Java applet that lets you create very good-looking word clouds; the Information Design Patterns Cookbook; and the Mount Fear physical sculptures representing London crime statistics. I don’t know if I’m going to cut up cardboard to build an art piece from the Enron emails, but all of this starts trains of thought on how to adapt these innovations to my problem space. If you’re looking for inspiration too, I’d highly recommend subscribing.

LA’s guilty pleasure

Photo by mxlanderos

If there’s one thing that unites Angelenos, it’s a fascination with car chases. The main news shows will completely shut down their regular coverage for the whole hour if there’s a live chase happening, no matter how uneventful it is. The anchors turn into sports commentators, with lots of informed speculation about the exact tactics police will use, when the PIT maneuver is safe, and whether the CHP or the sheriffs have jurisdiction at each point. KTLA even has a helicopter pilot with the perfect name of Johnny McCool to cover it all.

Liz asked me last night, "Is that morally correct?" and the answer has to be "No, but I still can’t look away". It’s glorifying criminals who are putting a lot of innocent people in danger for the sake of entertainment, and feels like a disorganized version of The Running Man.

Still, to LA residents who spend a significant portion of their lives at a frustrated standstill in traffic, the sight of someone breaking free and using every trick to speed along the freeways is mesmerizing and vicariously liberating. The fact that they almost always get caught at the end provides a moral alibi, but the real payoff is seeing them in flight.

I had dinner last night with a friend who’s been collecting the most gripping examples on his blog, and we talked a lot about this local obsession. This New Yorker article is the best exploration I’ve seen, with Sheriff Baca blaming the large number of chases on a shortage of cops and lots of "highly mobile idiots", but it never manages to really explain their popularity. It looks like the number of local car chases is actually declining since the peak in 2004, but I’m betting LA stays way ahead of the rest of the country for a long time to come.

How to easily search and replace with sed

Photo by wai:ti

If you’ve used any flavor of unix for programming, you’re probably familiar with grep, the tool for locating patterns in text files. That’s great if you just want to search for a string, but what if you want to replace it?

Sed, the stream editor, is the answer, but that also brings up a new question: how on earth do I use it? It’s probably one of the most obscure interfaces ever invented; its syntax makes obfuscated perl look like a model of clarity. Usually with a new tool I start off looking at a series of examples, like these sed one-liners, to get a rough mental model of how it works, and then dive into the documentation on specific points. That didn’t work with sed; I was still baffled even after checking those out. The man page didn’t help either. I could read the words, but they didn’t make any sense.

Finally I came across my salvation, Bruce Barnett’s introduction and tutorial for sed. He hooked me with his first section on The Awful Truth about sed, with its reassurance that it’s not my fault that I’m struggling to make head or tail of anything. He then goes through all the capabilities of sed in the order he learnt them. It’s a massive list, but even if you only get a few pages in you’ll know how to do a simple search and replace. Sed is a very powerful tool, and it’s worth persevering with the rest so you can discover some of the advanced options, like replacing only between certain tags in a file (e.g. changing the content text of a particular XML tag) and working from line numbers. Bruce is an entertaining companion for your journey too. He has great fun with some asides demonstrating how to make the syntax even harder to read, just for kicks.
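
To give a flavor of where the tutorial gets you, here’s a small sketch of the kinds of substitution mentioned above, run against a throwaway file. The file contents and the old/new strings are invented for illustration, and this is only one way to write each command:

```shell
# Scratch file so the demo never touches real data (contents are made up).
demo=$(mktemp)
printf '%s\n' '<title>' 'old name, old name' '</title>' \
              '<body>' 'old name' '</body>' > "$demo"

# Simple search and replace: the first match on each line.
first_only=$(sed 's/old/new/' "$demo")

# The g flag replaces every match on each line.
every_match=$(sed 's/old/new/g' "$demo")

# An address range limits the substitution to the lines
# from <title> through </title>; the <body> text is left alone.
title_only=$(sed '/<title>/,/<\/title>/s/old/new/g' "$demo")

# A line number limits the substitution to that one line (line 5 here).
line_five=$(sed '5s/old/new/' "$demo")

# Edit the file itself, keeping the original as a .bak backup.
sed -i.bak 's/old/new/g' "$demo"

printf '%s\n' "$title_only"
```

Without -i, sed only prints the result and leaves the file untouched, which makes it safe to experiment. The -i.bak form, as accepted by GNU and BSD sed, rewrites the file and keeps the original alongside it with a .bak suffix, a cheap safety net while you’re still learning the syntax.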

Why massive datasets beat clever algorithms

Photo by H Wren

Jeremy Liew recently posted some hints, tips and cheats to better datamining. The main thrust, based on Anand Rajaraman’s class at Stanford, is that finding more data is a surer way to improve your results than tweaking the algorithm. This matches both my own experience trying to do datamining and what I’ve seen with other companies’ technologies. Engineers have a bias towards making algorithms more complex, because that’s the direction that gets the most respect from your peers and offers the most intellectual challenge.

That makes us blind to the advantages of what the Jargon File calls wall-followers, after the Harvey Wallbanger robot that simply kept one hand on a wall at all times to complete a maze, and gave far better results than the sophisticated competitors using complex route-finding. Google’s PageRank is a great example, almost zen in its simplicity, with no semantic knowledge at all.

One hidden advantage to this simple-mindedness is very predictable behavior, since the simplicity means there are far fewer variables that affect the outcome. This is why machine learning is so scary a change for Google: there’s no guarantee that some untested combination of inputs won’t produce very broken results.

Another obvious benefit is speed of both development and processing. This lets you get up and running very fast and get through a lot of data, which gives you a lot more coverage. Yahoo’s directory wasn’t beaten because Google ranked pages more accurately than humans, but because Yahoo could only cover a tiny fraction of what was out there.

On the face of it, this doesn’t sound good for a lot of the alternative search engines that are competing out there. If it’s hard to beat a simple ranking algorithm, should they all pack up and go home? No, I think there’s an immense amount that can be improved, both on the presentation side and by gathering novel data sets. For example, why can’t I pull a personal page rank based on my friends’ and friends-of-friends’ preferences for sites? What about looking at my clickstream and theirs to get an idea of those preferences?

Getting people to listen

Photo by Paulgi

I’ve often sat spellbound listening to a great speaker like Steve Jobs or Al Gore and wondered what makes it different from the majority of talks I have trouble paying attention to. Some of it is their passion bubbling up, part of it is sheer practice that ironically lets them relax and sound natural (I never seem as impromptu as when I’ve rehearsed a talk 25 times). What I didn’t understand until I saw these videos by Ira Glass is that they’re using classic story-telling techniques too, mental hacks that grab the audience’s attention.

There’s lots of good advice in there, and I recommend checking them all out, but the most valuable for me was the anecdote/reflection structure. We’re always taught to lay out our thoughts with the "Say what you’re going to say, say it, and then say what you said" model, where you present your argument’s conclusion, then back that up with facts, and then revisit the conclusion. This is a great way of presenting a mathematical proof, but a terrible way of engaging the interest of a human being. We’re all wired to love stories, and the basic structure of a story is a series of connected events that raise some questions, which are then answered by the conclusion.

The terms Ira uses are anecdote, for a sequence of things happening, and reflection, for the bit afterwards that tells the audience why those events were worth describing and answers the questions they implicitly pose. The example he uses goes something like this:

"The man woke up, and it was silent. He got out of bed and looked around, and there was still no noise. He walked downstairs, and the house was completely quiet."

On the face of it, it’s the most boring set of facts imaginable, but your mind expects that you’re being told this for some reason, and anticipates the question being answered. Is it silent because the man has gone deaf? Has the world ended? This keeps the audience listening for clues, and gives them a payoff at the end, during the reflection, when you explain the significance of the details you told them. After hearing this explained, I realized that both Steve’s keynotes and Al’s presentation use this structure masterfully. They build up questions with an anecdote, and then tell you what the conclusion was.

We all end up having to persuade others to take action, and the first step is actually getting them to listen. Give this a try with your own speaking and writing. I’ve been surprised by how much it’s helped me.