How to set up Linux for web development through Parallels on OS X

[Image: Redhat. Photo by CarlosLuis]

I wanted a setup for web development that matched my production server, but let me do local development on my MacBook Pro. I’m a big fan of Parallels for running Windows, so I set out to get Red Hat Fedora running as a server too. My requirements were that I should easily be able to install extensions that aren’t standard in the OS X build of PHP, like IMAP and GD, and that I could save files to my local drive and immediately run them through the server without having to copy anything.

Getting started with Fedora was painless thanks to this ready-made Parallels disk image. I downloaded that and it loaded immediately with no setup required. I then ran yum to update the system to the latest patches, and I was in business. I had to remove yum-updatesd before I could open any of the desktop software installer applications, but once that was working I could run the Add Software application. There I chose the parts I needed, like MySQL, PHP and other web development additions.

After that was going, I created a test.php file containing <?php phpinfo(); ?> and placed it in /var/www/html/. Loading up Firefox inside Fedora and pointing it to http://localhost/test.php gave me the expected information dump. Everything was going smoothly, so I should have known there was trouble in store.

The only remaining part was adding a link back to my OS X filesystem within Linux, so that Apache could access my files without any copying. Parallels offers a great bridge between the Windows and Mac file systems, but I couldn’t find anything that easy for Fedora. What I did run across was sshfs, which uses fuse and ssh to create a virtual folder inside Linux that points back to a directory on a system accessed over the network.

I went through all the steps to get that set up, but spent a very long time getting 403 Permission Denied errors every time I tried to access OS X files through Apache. After a lot of hair-pulling, I figured out how to make everything play nicely together. It involves loosening the permissions model, so I don’t recommend doing this on a production server for security reasons, but it should be fine for local development. Here are the steps, with the full command sequence collected after the list:

  • On Linux, make sure you have sshfs, and you’ve added your current Linux user to the fuse group with su -c 'gpasswd -a <username> fuse'
  • Again on Fedora, get fuse running with service fuse start and add it to the startup sequence with echo 'service fuse start' >> /etc/rc.local
  • On OS X, go to System Preferences->Sharing and turn on Remote Login. Make a note of the IP address it displays in that window.
  • To test the remote login, in a Linux terminal window do ssh <mac user name>@<mac IP address>, e.g. ssh petewarden@10.0.1.196. If this doesn’t work, you’ll need to stop and check the IP address and your Parallels network setup.
  • Now you can try to set up the filesystem connection through SSH. In your Fedora terminal, type sshfs -o allow_other,default_permissions <mac user name>@<mac IP address>:<path to your mac folder> /var/www/html/testbed, e.g. sshfs -o allow_other,default_permissions petewarden@10.0.1.196:/Users/petewarden/Sites/testbed /var/www/html/testbed. The magic bit here is the extra options that allow other users to access the folder. Without them, the apache user won’t be able to read the files and you’ll get the 403 errors. Be warned though: they’re necessary to fix those errors, but not sufficient, and you’ll need to follow the steps below to remove the remaining permission problems.
  • Because the files show up with a user ID and group ID that’s unknown to the Linux filesystem, Apache’s strict security settings will refuse to display them. To get around this, you need to make the security settings less restrictive. First you’ll need to disable Security Enhanced Linux, aka SELinux. To do this through the GUI, go to the Security preferences and click the SELinux tab. Then choose "Disabled" from the dropdown menu. You could also do this on a per-program basis, but I wanted to keep it simple.
  • Next you need to remove suEXEC, another security feature of Apache. To do this, just move the file itself, on my system at /usr/sbin/suexec, to another location, e.g. mv /usr/sbin/suexec /usr/bin/suexec_disabled
  • Finally restart Apache with service httpd restart and try navigating to one of your pages. With any luck you’ll now be able to save files from OS X and immediately see the changes in Firefox within Fedora.

One of my main reasons for this setup was to easily install extensions. After going through these steps I was able to just run yum install php-gd to get the GD graphics library, a project that had previously taken me hours of fiddling on OS X, even with Fink.

Update: I’m now using the Parallels Linux instance directly from OS X. To do that I had to enable HTTP in the firewall settings on Fedora, and then ran ifconfig to work out the IP address that it had acquired. After that I can navigate to http://x.x.x.x/ in my OS X copy of Firefox and access my files, without ever having to switch to the Linux desktop.

Kinesis Keyboards – I wish I could quit you

[Image: Kinesis keyboard]
I’m stuck in an abusive relationship with my keyboard. The Kinesis model I use is an RSI sufferer’s dream, and I’ve been using one for the last 4 years. Unfortunately, I’ve just had to order my fourth, since they’re built to shop-class project standards.

The design is perfect, with an amazing amount of programmability and a QWERTY layout where you never need to stretch or strain your fingers from their home positions. There’s even a programmable foot switch, which I have mapped to shift, control and command, since pressing the modifiers with my little fingers seemed to be at the root of a lot of my pain.

They’re pricey, starting at around $300, but look and feel cheap. The silver paint rubs off pretty easily, the cut plastic is uneven and they use a telephone connector to join up the pedal with the main unit. More seriously, the lack of quality shows up in the number of times they go wrong. I’ve had to open up and fiddle with all of my keyboards and footswitches. They’re surprisingly low-tech inside, with a small PCB and chips for the keyboard and old-school analog sensors for the foot switch. Because of this I’ve sometimes had luck when the problem turned out to be a loose connector or dirty contacts, but I’ve never had one last more than 18 months before it’s declared dead. Before you blame the victim, I’m pretty careful with my equipment, so I don’t think I’m putting them through anything unusual. They have a 2 year warranty, but I’ve not wanted to lose my keyboard for the time that would take, and I’ve usually hacked mine so much by the time I give up that I don’t feel like I could return it.

I still recommend them to anyone who wants a hand-friendly keyboard (Liz now has two), but go into the relationship with your eyes open. I’d love to hear alternative suggestions from anyone who’s found something better.

Off the grid on Santa Cruz Island

[Image: Anacapa. Photo by Kevin Sarayba]

Tomorrow morning I’m off for a four-day camping trip to Santa Cruz Island, where we’ll be working with the NPS rangers to fix up some of the hiking trails. It’s like a trip back to the 19th century, with no phones, cars or planes, and no permanent inhabitants on a 100-square-mile island. I can’t imagine any other way of escaping from my compulsion to check my iPhone and RSS reader, and it’s one of the most beautiful places on earth to boot. All that, and it’s just an hour’s boat ride from LA!

To keep you busy while I’m away, I recommend checking out the Bombay TV video mashup site. It’s very simple, just placing subtitles on some old Bollywood movies, but the clips are perfectly chosen. I guarantee that you’ll wake up everyone in the room if you use this for your next presentation.

Why aren’t we using humans as robots?

[Image: Robot. Photo by Regolare]

Yesterday I had lunch with Stan James of Lijit fame, and it was a blast. One of the topics that’s fascinated both of us is breaking down the walls that companies put up around your data. In the ’90s it was undocumented file formats, and this decade it’s EULAs on web services like Facebook. The intent is to keep your data locked into a service, so that you’ll remain a customer, but what’s interesting is that they don’t have any legal way of enforcing exactly that. Instead they forbid processing the data with automated scripts and giving out your account information to third-party services. It’s pretty simple to detect when somebody’s using a robot to walk your site, so this is easy to enforce.

The approach I took with Google Hot Keys was to rely on users themselves to visit sites and view pages. I was then able to analyze and extract semantic information on the client side, as a post-processing step using a browser extension. It would be pretty straightforward to do the same thing on Facebook, sucking down your friends’ information every time you visited their profiles. I Am Not A Lawyer, but this sort of approach is both impossible to detect from the server side and seems hard to EULA out of existence. You’re inherently running an automated script on the pages you receive just to display them, unless you only read the raw HTTP/HTML responses.

So why isn’t this approach more popular? One thing Stan and I agreed on is that getting browser plugins distributed is really, really hard. Some days the majority of Google’s site ads seem to be for their very useful toolbar, but based on my experience only a tiny fraction of users have it installed. If Google’s marketing machine can’t persuade people to install client software, it’s obvious you need a very compelling proposition before you can get a lot of uptake.

Illegal characters in PHP XML parsing

[Image: Kanji. Photo by Cattoo]

If you hit the error "Invalid character" while using PHP’s built-in XML parser, and you don’t see the usual "<" or "&" characters in the input, you might be running into the same control code problems I’ve been hitting. I’d always assumed, and most sites state, that you can put anything within a CDATA block apart from < and &. I’m wrapping the bodies of email messages in XML, within CDATA sections, but I was still seeing parser failures like these. I also tried using various escaping methods instead, like htmlspecialchars(), but still hit the failure.

Digging into it was tricky, since the parser doesn’t give you the actual character value it’s choking on. In one case I tracked it down to "\x99", which looks like a Microsoft variant of the trademark character. That got me wondering exactly what character set was being used, so I tried specifying ISO-8859-1 explicitly when I created the parser, but still hit the same error.

Then I realized I was cutting some corners by skipping the <?xml?> declaration at the start of all of the strings I was creating. That’s where you can specify the character set for the file, and sure enough prefixing it with
<?xml version="1.0" encoding="ISO-8859-1"?>
got me past that first error. I thought I was home free, but looking at my test logs, it looks like it failed again overnight after going through 1300 more emails. I shall have to dig into that further and see what the issue was there.

It does seem like a design flaw that the parser chokes and dies on unrecognized characters, rather than shrugging its shoulders and carrying on. It may well be outside the spec to have control characters that aren’t legal in the current character set, but it seems both possible and helpful to have a mode that either ignores or demotes those characters when they’re found, rather than throwing up its hands and refusing to parse any further. It has the same smell of enforcing elegance at the expense of utility that infuriated me with bondage-and-discipline languages like Pascal.

You need pictures

[Image: Rabid poodles. Photo by The Pack]

I’m a very visual person, and I love plastering photos over anything I can. Jud mentioned he got a kick out of some of them on here, so I’d better confess and acknowledge my sources. Thanks to the internet and a wonderful community of artists, you can spice up your own documents, presentations and blog posts with some stunning pictures, all for no money down and zero monthly payments.

Flickr users have made a lot of beautiful photos available under Creative Commons licenses. If you do a search like this:
http://flickr.com/search/?q=the&s=int&l=3
you’ll get around 3 million CC attribution/non-derivative/non-commercial licensed pictures that contain "the" in their description, sorted most-interesting first. Alter the search term if you want to explore something more specific. Make sure that you include proper attribution for the photo if you do use it, and respect the licensing. Be careful though: sometimes I end up spending more time browsing for photos than actually writing the post!

Once you’ve got one you like, my preferred way of getting them for a blog post is to screen-grab from the thumbnail shown on the main page for the photo. This is about the right size, has been downsampled well, and lets me do any cropping I want to do, all very quickly without ever having to load up a photo editing program. On the Mac you press Command-Shift-4 to bring up the cross-hairs, and then the result is saved as Picture X.png on your desktop. On Vista, load up the "Snipping Tool" from accessories and choose "New" to do pretty much the same thing.

What’s that plant?

[Image: Liz plant hunting. Photo of Liz by Kim Kelly]

Pearly Everlasting. Liveforever. Manzanita. Shooting Stars. I love the names, but I’m hopeless at identifying local plants. Luckily I hang out with people a lot smarter than me. Liz has been writing Plant of the Month on the trails council site for the last couple of years, and she’s accumulated an amazing amount of knowledge of the local flowers. Probably the best way to learn more yourself is to go on an organized hike with a group like the Conejo Sierra Club, or join one of our Saturday trail maintenance days. There’s usually at least one old hand who will happily tell you the story behind any of the plants.

If you want some books to take on the trail, I highly recommend buying Milt McAuley’s Wildflowers of the Santa Monica Mountains. He’s a local legend who started building trails here in the ’40s, and who was still leading trail work a few years ago when I first arrived. The other book to turn to is Nancy Dale’s Flowering Plants: The Santa Monica Mountains. She covers a lot of detail, and it’s a lot easier to identify something with a second source to check against.

They haven’t figured out how to get internet access at the bottom of the canyons yet, but for when you’re back home Tony Valois has put together a very clear identification guide. He’s done a great job with the navigation, letting you look through his collection of photos by appearance, common names and scientific names.

With the recent sprinkling of rain, you’ll be able to see the best display for years, so don’t delay getting out there!

[Image: Liz and Thor]

Don’t repeat yourself with XML and SQL

[Image: Rascally repeating rabbits. Photo by TW Collins]

One key principle of Agile development is Don’t Repeat Yourself. If you’ve got one piece of data, make sure it’s only defined in one place in your code. That way it’s easy to change without either having to remember everywhere else you need to modify, or introducing bugs because your software’s inconsistent.

This gets really hard when you’re dealing with data flowing back and forth between XML and SQL. There’s a fundamental mismatch between a relational database that wants to store its information in columns and rows, and the tree structure of an XML document. Stylus does a great job of describing the technical details of why XML is from Venus and SQL is from Mars, but the upshot is that it’s hard to find a common language that you can use to describe the data in both. A simple example is a list of the recipients of a particular email. The natural XML idiom would be something like this:

<message>
  <snipped the other data>
  <recipients>
    <email>bob@bob.com</email>
    <email>sue@sue.com</email>
  </recipients>
</message>

But in mysql, you’re completely listless. To accommodate a variable-length collection of items you need to set up a separate table that connects back to the owner of the data. In this case you might have a separate ‘recipients’ table with rows that contained each address, together with some identifier that linked it back to the original message held in another table. It’s issues like this that make a native XML database like MarkLogic very appealing if you’re mostly dealing with XML documents.

What I’d like to do is define my email message data model once, and then derive both the XML parsing and mysql interaction code from that. That would let me rapidly change the details of what’s stored without having to trawl through pages of boilerplate code. I’m getting close, sticking to a simple subset of XML that’s very close to JSON, but defining a general way to translate lists of items back and forth is really tough.

I’m trying to avoid being an architecture astronaut, but it’s one of those problems that feels worth spending a little bit of upfront time on. It passes the "will this save me more time than it takes in the next four weeks?" code ROI test. I’d welcome any suggestions too; this feels like something that must have been solved many times before.

Do your taxes with implicit data

[Image: TurboTax screenshot]

Intuit’s TurboTax is the slickest and deepest online app I’ve used. I’ve been a fan since 2003, and it just keeps getting better. One thing that stood out this year was the unobtrusive but clear integration of their help forums into every page you’re working with. There’s a sidebar that shows the most popular questions for the current section, ranked by view count. It’s applying Web 2.0-ish techniques, using page views to rank user-generated content, but for once it’s solving a painful problem. Maybe I’m just old, but I feel sad when I see all the great work teams are doing to solve mild consumer itches like photo organization that are already over-served, while my doctor’s practice still runs on DOS.

It was fascinating to read John Doerr’s thoughts on how Intuit was built, from his introduction to Inside Intuit. I’ve never managed to computerize my household finances (Liz has an amazing Excel setup that has to be seen to be believed), but their focus on customers has shone through all my encounters with them. It’s great to see they keep looking for ways to use new techniques to improve their services; Microsoft could learn a lot from them. I know they sent someone to Defrag last year, so maybe I’ll see some more implicit web techniques when I do my ’08 taxes?

[Image: TurboTax answer screenshot]

Using Outlook to import emails to Exchange is painfully slow

[Image: Tortoise]

[Image: Outlook screenshot]

Once I’d converted the Enron emails to a PST and loaded them into Outlook, I thought I was almost done with my quest to get them onto my Exchange server. The last remaining step was to copy them to an Outlook folder in an account hosted on that server. With the PST conversion taking about a day, I assumed this would take a while too, but after running for 6 days, it’s still only up to the B’s in alphabetical order!

ExMerge is an alternative way to import a PST onto an Exchange server. It only supports non-Unicode files though, and has a 2GB limit, so it doesn’t work for the 5GB Enron data set. Another suggestion (from Experts Exchange, so scroll down past the ads to see the comments) is to turn off cached mode and do File->Import from within Outlook. I’ve cancelled my current copy, and so far this approach seems a lot faster.