Five short links

Photo by Doug88888

Stanford’s Wrangler – A promising data-wrangling tool, with a lot of the interactive workflow that I think is crucial.

Open Knowledge Conference – They’ve gathered an astonishing selection of speakers. I’m really hoping I can make it out to Berlin to join them.

The Privacy Challenge in Online Prize Contests – It’s good to see my friend Arvind getting his voice heard in the debate around privacy.

The Profile Engine – A site that indexes Facebook profiles and pages, with their permission.

Acunu – I met up with this team in London, and they’re doing some amazing work at the kernel level to speed up distributed key/value stores, thanks to some innovative data structures.

Kindle Profiles are so close to being wonderful

"Propose to an Englishman any … instrument, however admirable, and you will observe that the whole effort of the English mind is directed to find a difficulty, defect, or an impossibility in it. If you speak to him of a machine for peeling a potato, he will pronounce it impossible; if you peel a potato with it before his eyes, he will declare it useless, because it will not slice a pineapple"

I'd completely forgotten about this deliciously bitter quote from Charles Babbage in The Philosophical Breakfast Club, but thanks to Amazon's Kindle profiles site, I rediscovered it listed in my highlights. I was very excited when I stumbled across this social feature, since I've been looking for an automatic way to share my reading list with friends. I've even experimented with scripts to scrape the reading history from my account, but never got anything complete enough to use. My dream is a simple blog widget showing what I'm reading, but without the maintenance involved in updating GoodReads with my status. I'm often reading on a plane or in bed at night, so the only way I'll have something up to date is if it uses information directly from my Kindle. I looked at the highlights page, and it looked like exactly what I was after, a chronological list of notes and the books I'd been reading recently:

[Screenshot of my Kindle highlights page]

Now all I needed to do was figure out how to make that page public. First, I had to go through all 160 books and manually tick two checkboxes next to each of them: one making the book public, and another making my notes on it available. That was a bit of a grind (and something I guess I'll need to do for every book as I read it), but worth it if I could easily publish my highlights. After that, though, I realized there was nothing like a 'blog' page for my notes that was available to anyone else. The closest is this one for my public notes:

https://kindle.amazon.com/profile/Peter-C–Warden/11996/public_notes

It just has covers for the five books I most recently altered the state of, whether or not they have any notes or highlights, and you have to click through to find any actual notes. The "Your Highlights" section that only I can access is perfect: its simplicity is beautiful, and it's exactly what I would like to share with people. Short of posting my account name and password here, does anyone have any thoughts on how I could get it out there? Anybody at Amazon I can beg?

Facebook and Twitter logins aren’t enough

Photo by Karen Horton

A couple of months ago I claimed "These days it doesn't make much sense to build a consumer site with its own private account system" and released a Ruby template that showed how to rely on just Facebook and Twitter for logins. It turns out I was wrong! I always knew there would be some markets that didn't have enough adoption of those two services, but thought that the tide of history would make them less and less relevant. What I hadn't counted on was kids.

My Wordlings custom word cloud service has seen a lot of interest from teachers who want to use it with their students, but especially amongst pre-teens, there's little chance they're on either Facebook or Twitter. They may not even have an email address to use! Since that's not likely to change, I added a new "Sign in for Kids" option that just requires a name, not even a password. It has the disadvantage that once you log out, you can't edit any of your creations, but that seems a small price to pay to make the service more accessible.

Using Hadoop with external API calls

Photo by Joe Penniston

I've been helping a friend who has a startup that relies on processing large amounts of data. He's using Hadoop for the calculation portions of his pipeline, but has a home-brewed system of queues and servers for handling other parts like web crawling and calls to external API providers. I've been advising him to switch almost all of his pipeline to run as streaming jobs within Hadoop, but since there's not much out there on using it for those sorts of problems, it's worth covering why I've found it makes sense and what you have to watch out for.

If you have a traditional "Run through a list of items and transform them" job, you can write that as a streaming job with a map step that calls the API or does another high-latency operation, and then use a pass-through reduce stage.
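To make that concrete, here's a minimal sketch (in Python) of what the two streaming scripts might look like. The API endpoint is a made-up placeholder, the input is assumed to be one item per line, and a real job would want retries and smarter error handling.

#!/usr/bin/env python
# mapper.py - map step that calls an external API for each input line.
# The endpoint below is a placeholder, not a real service.
import json
import sys
import urllib.parse
import urllib.request

API_URL = "https://api.example.com/lookup?q="

for line in sys.stdin:
    item = line.strip()
    if not item:
        continue
    try:
        with urllib.request.urlopen(API_URL + urllib.parse.quote(item), timeout=30) as response:
            result = json.load(response)
        # Hadoop Streaming expects tab-separated key/value pairs on stdout.
        print("%s\t%s" % (item, json.dumps(result)))
    except Exception as error:
        # Failures go to stderr so they end up in the task logs, not the job output.
        sys.stderr.write("failed %s: %s\n" % (item, error))

#!/usr/bin/env python
# reducer.py - pass-through reduce stage that copies its input straight to the output.
import sys

for line in sys.stdin:
    sys.stdout.write(line)

You'd launch the pair with the standard hadoop-streaming jar, passing -mapper mapper.py, -reducer reducer.py and the usual -input and -output paths.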

The key advantage is management. There's a rich ecosystem of tools like ZooKeeper, MRJob, ClusterChef and Cascading that let you define, run and debug complex Hadoop jobs. It's actually comparatively easy to build your own custom system to execute data-processing operations, but in reality you'll spend most of your engineering time maintaining and debugging your pipeline. Having tools available to make that side of it more efficient lets you build new features much faster, and spend much more time on the product and business logic instead of the plumbing. It will also help as you hire new engineers, as they may well be familiar with Hadoop already.
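As an illustration of what that tooling buys you, here's a rough sketch of the same kind of job written against MRJob; the endpoint is still a placeholder. The payoff is in the running and debugging: the identical class can be launched locally with python api_lookup.py -r inline input.txt while you're tracking down problems, or pointed at a real cluster with -r hadoop or -r emr.

# api_lookup.py - the API-calling job expressed as an MRJob class (sketch only).
import json
import urllib.parse
import urllib.request

from mrjob.job import MRJob

API_URL = "https://api.example.com/lookup?q="  # placeholder endpoint


class ApiLookupJob(MRJob):
    def mapper(self, _, line):
        item = line.strip()
        if not item:
            return
        with urllib.request.urlopen(API_URL + urllib.parse.quote(item), timeout=30) as response:
            yield item, json.load(response)

    def reducer(self, key, values):
        # Pass-through reduce stage: re-emit whatever the mappers produced.
        for value in values:
            yield key, value


if __name__ == "__main__":
    ApiLookupJob.run()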

The stumbling block for many people when they think about running a web crawler or external API access as a MapReduce job is the picture of an army of servers hitting the external world far too frequently. In practice, you can mostly avoid this by using a single-machine cluster tuned to run a single job at a time, which serializes the access to the resource you're concerned about. If you need finer control, a pattern I've often seen is a gatekeeper server that all access to a particular API has to go through. The MapReduce scripts then call that server instead of going directly to the third party's endpoint, so that the gatekeeper can throttle the frequency to stay within limits, back off when there are 50x errors, and so on.
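Here's a bare-bones sketch of what such a gatekeeper might look like, written with Flask and the requests library (my choice of tools for the example, not anything in particular my friend is running). It serializes upstream calls behind a lock, enforces an assumed one-call-per-second limit, and turns upstream 50x responses into a 503 so the calling scripts know to back off and retry.

# gatekeeper.py - a minimal throttling proxy for a third-party API (sketch only).
import threading
import time

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

UPSTREAM_URL = "https://api.example.com/lookup"  # placeholder third-party endpoint
MIN_INTERVAL = 1.0  # assumed limit: at most one upstream call per second

lock = threading.Lock()
last_call_time = [0.0]


@app.route("/lookup")
def lookup():
    # Serialize access so the upstream service never sees calls closer together
    # than MIN_INTERVAL, no matter how many mappers are hitting this gatekeeper.
    with lock:
        wait = MIN_INTERVAL - (time.time() - last_call_time[0])
        if wait > 0:
            time.sleep(wait)
        upstream = requests.get(UPSTREAM_URL, params=request.args.to_dict(), timeout=30)
        last_call_time[0] = time.time()
    if upstream.status_code >= 500:
        # Signal the MapReduce scripts to back off and retry later.
        return jsonify({"error": "upstream unavailable"}), 503
    return upstream.content, upstream.status_code, {"Content-Type": "application/json"}


if __name__ == "__main__":
    app.run(port=8080)

The streaming scripts would then fetch something like http://gatekeeper:8080/lookup?q=... instead of the third-party URL.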

So, if you are building a new data pipeline or trying to refactor an existing one, take a good look at Hadoop. It almost certainly won't be as snug a fit as your custom code (it's like using Lego bricks instead of hand-carving), but I bet it will be faster and easier to build your product with. I'll be interested to hear from anyone who has other opinions or suggestions too, of course!

The Best Park in San Francisco

All photos © Heather Champ, with permission

My biggest worry when I moved to San Francisco in December was that my dog Thor would find urban life tough, without the wide-open spaces we'd got used to in Colorado. I found an apartment with a wide, sunny window-sill for him to lie on, right next to the great off-leash Duboce dog park, but what I didn't realize until I started exploring was that there was another gem just half a mile away. The first signs I saw of Buena Vista Park were the trees at its peak looming over the neighborhood, their tops wrapped in fog. Following Duboce Avenue uphill to its end, and then meandering through 37 acres of woodland, I found myself 575 feet high, looking out over the city as the clouds cleared.

It's now become our morning walk, and we can make it to the peak and back in 45 minutes if I'm in a hurry. That's not often though, because catching up with the other regulars has become part of the pleasure. In the peculiar way of the dog-walking world, I often feel like I know the canines before I've properly met the owners, especially since I can't compete with Thor's natural charms. A great case in point is Bug and Chieka's owner, or as she's better known in the tech world, Heather Champ, the pioneering former Flickr community manager who furnished these photos.

Though it's not on the same scale as Golden Gate or Presidio, it's actually the oldest official park in the city, and is full of corners and history to explore. It was only when I was chatting to one of the gardeners that I realized the marble chunks lining some of the paths were actually fragments of gravestones from the cemetery that was razed by WPA workers in the 30's. At night its nooks can make it less welcoming, with hoboes camping out and casual hookups, but in the morning it's a slice of heaven.


If you live anywhere near the Lower Haight, Upper Castro or Noe Valley areas, you really should check out this urban garden. There's nothing like wandering through groves of Eucalyptus and Redwoods, watching the fog blow through the canopy, to refresh your soul after working through painfully obtuse YouTube comments (just to pick a random example, ahem). And if you see a cute Chihuahua mix with a snaggle-tooth, be sure to make a fuss of him.


Location Tracking as Art

[Living Brushstroke trace from the Chiditarod]

I just discovered Maria Scileppi’s Living Brushstroke project, which uses iPhone location tracking to create artistic views of people’s movements. The picture above is from the Chiditarod, “Probably the world’s largest pub crawl/food drive”. Almost everyone’s first reaction to seeing the trace of their movements retrieved from the iPhone logs is the same as ours: “Cool!”. If we can give users control over their own data (asking permission to record, as Maria’s app does), there are so many amazing projects like this we can build.

Anthem from Maria Scileppi on Vimeo.

Five Short iPhone Tracking Links

Photo by Evil Science Chick

Some of you may have noticed a weekend hack I put together with Alasdair Allan for visualizing location data on your iPhone. Here are some random links related to the project:

Tell-All Telephone – View a German politician's life as a visualization, after he agreed to have detailed recording software added to his phone. [Update: I misread the story; the information was actually gathered without his knowledge, but he agreed to share it afterwards to raise awareness. As commenter Seve says, "The movement data of the German politician Malte Spitz were collected by law and not with his agreement. The movement and call data of everybody connected to a German mobile phone network were stored for six months. Finally the German supreme court stopped that law and the data were deleted."]

Geoloqi – A fascinating system that lets you volunteer to track your own movements, and share certain aspects of them with people you trust.

Location tracking on Android too – Detailed and fair technical analysis of the way that Android monitors your location, and how it differs from Apple's approach.

A Cryptographic Approach to Location Privacy – There are ways to get a lot of the benefits of location services without recording or revealing your position. Arvind's proposal shows how one application could work in a secure way.

22 Free Tools for Data Visualization and Analysis – The navigation is tough without a table of contents, but this is a good overview of a lot of the tools you can use to turn your data into a visual story.

Data Science Toolkit 0.35 released

Photo by Wonderlane

I've just completed deploying a new version of the toolkit. It contains quite a few bug fixes and improvements, along with two new features:

UK Support 

You can now enter British postal addresses into the street2coordinates geocoder, and it will return the coordinates. With post codes included, it's normally accurate to within a couple of hundred feet. You can then run those positions through coordinates2politics to get the parliamentary constituency, county, council district and ward, NHS area and post code.
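Here's a rough Python sketch of chaining the two calls together, using the requests library; treat the field names as illustrative and check them against the live responses if anything looks off.

import urllib.parse

import requests

DSTK = "http://www.datasciencetoolkit.org"
address = "10 Downing Street, London SW1A 2AA"

# Geocode the UK address to a latitude/longitude pair.
coords = requests.get(DSTK + "/street2coordinates/" + urllib.parse.quote(address)).json()
location = list(coords.values())[0]
print(location["latitude"], location["longitude"])

# Look up the political areas containing that point.
politics = requests.get(
    "%s/coordinates2politics/%s,%s" % (DSTK, location["latitude"], location["longitude"])
).json()
for area in politics[0]["politics"] or []:
    print(area["friendly_type"], "-", area["name"])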

Time and date extraction

The new text2times method will scan through the text that you pass it, and pull out any strings that it can understand as times or dates. These include formal date/time combinations like '10/28/01' as well as informal descriptions like 'next Friday'.
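Here's a minimal sketch of calling it over HTTP from Python with the requests library; I'm just dumping the raw JSON rather than picking out particular fields.

import json

import requests

text = "Let's meet next Friday, since 10/28/01 didn't work out."
response = requests.post("http://www.datasciencetoolkit.org/text2times", data=text)
print(json.dumps(response.json(), indent=2))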

You can try it out at http://www.datasciencetoolkit.org/; the command-line tools are at http://www.datasciencetoolkit.org/python_tools.zip; the new AMI is ami-f6e11d9f; and there's a new VM at http://static.openheatmap.com/dstk_v0.35.vmwarevm.tar.bz2

You should still be able to use your existing code with no changes; I've done my best to ensure everything's backwards-compatible, but let me know if anything breaks.

Five Short Links

Photo by Nicolas Suzor

Newscoop – A content-management system designed by and for journalists. It’s been used in conjunction with Ushahidi to interesting effect.

Crisismappers extends UN capacity in Libya – As the power of crowd-mapping becomes more obvious, there will be pressure to use it in more ambiguous situations than natural disasters or the toppling of tyrants. Mapping military airbases in Libya gives a hint of what crowd-sourced warfare could look like.

StarCluster – A simple way to run and manage clusters of EC2 machines for scientific computing, complete with AMIs pre-loaded with useful software and sensible defaults. If this is your thing, you should check out Infochimps’ impressive ClusterChef too.

Fathom – A design firm with a portfolio of clear, crisp and beautiful infographics.

Trunkler – Link curation for your iPhone, built on top of the powerful Trunk.ly service. Shows why having a third-party API can pay off, even in the early days.

The cock-up theory of technology news

Photo by Joshua Mellin

A few times over the last few weeks I've been talking to friends about big tech company news, and one of the hardest struggles I have is to convince them that the latest Twitter or Google happening could just be a random cock-up, rather than a sign of hidden plans by the company's management. I could well be wrong, but Google releasing a group-messaging app that doesn't run on Android reminds me of the time I was involved in launching a GPU image processing API at Apple that competed against a similar interface created by another team at almost the same time. We mostly managed to keep that sort of thing from reaching the outside world, but every team and department in a large corporation is competing against all the other teams for a slice of the budget pie, and can have very different goals. Good management will keep the worst excesses in check, but any large organization is a massively distributed system where the communication overhead of keeping everyone totally in line would be crippling.

I was sorry to learn that large dinosaurs are no longer thought to have had an extra brain in their buttocks, as I'd memorably learned as a child from the Dinosaur Club. It's still a pretty good image for the decision-making apparatus in big companies though. Upper-management's most precious resource is time, and so attention has to be rationed. Especially at a company like Google that prizes experimentation, that means lower-level people can release projects into the wild that don't fit into the grander strategy. As another example from Apple, I know several teams that still hadn't ported their code over to the Cocoa framework even by the time I left in '08. Eating your own dog food is a fine goal, but if it comes down to that or shipping, sometimes pragmatism wins.

Think of my approach as the cock-up theory of technology news. The next time one of the big firms does something that makes no sense at all, consider taking it at face value. As General Nasser said in the 50's – "The genius of you Americans is that you never make clear-cut stupid moves, only complicated stupid moves which make the rest of us wonder at the possibility that we might be missing something".