Five short links


Photo by N-ino

Word2Vec – Given a large amount of training text, this project figures out which words show up together in sentences most often, and then constructs a small vector representation for each word that captures those relationships. It turns out that simple arithmetic works on these vectors in an intuitive way, so that vector(‘Paris’) – vector(‘France’) + vector(‘Italy’) results in a vector that is very close to vector(‘Rome’). It’s not just elegant; it also looks like it will be very useful for clustering, and for any other application where you need a sensible numerical representation of a word.
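To make the vector arithmetic concrete, here is a minimal sketch in Python. The three-dimensional vectors are hand-picked toys, an assumption for illustration only. Real word2vec embeddings are learned from text and usually have hundreds of dimensions. Closeness is measured with cosine similarity, and the query words themselves are excluded from the candidates, a common convention for analogy lookups.

```python
import math

# Hypothetical toy vectors, hand-picked for illustration (NOT real word2vec output).
# Loosely: dimension 0 ~ "is a capital", 1 ~ "French-ness", 2 ~ "Italian-ness".
vectors = {
    "Paris":  [0.9, 0.8, 0.1],
    "France": [0.1, 0.8, 0.1],
    "Rome":   [0.9, 0.1, 0.8],
    "Italy":  [0.1, 0.1, 0.8],
    "Berlin": [0.9, 0.5, 0.5],
}


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded so they can't trivially win.
    """
    target = add(sub(vectors[a], vectors[b]), vectors[c])
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("Paris", "France", "Italy", vectors))  # Rome
```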

Charles Bukowski, William Burroughs, and the Computer – How two different writers handled onrushing technology, including Bukowski’s poem on the “16 bit Intel 8088 Chip”. This led me down a Wikipedia rabbit hole, since I’d always assumed the 8088 was fully 8-bit, but the truth (a 16-bit processor internally, paired with an 8-bit external data bus) proved a lot more interesting.

Sane data updates are harder than you think – Tales from the trenches of data management. Adrian Holovaty’s series on crawling and updating data is the first time I’ve seen many techniques that are common among practitioners actually laid out clearly.
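As a generic illustration, not code from Holovaty’s series, here is a minimal sketch of one common update pattern: key every record by a stable ID, then compare a fresh crawl against what is already stored, sorting records into added, changed, unchanged, and removed. The function names, the example records, and the use of SHA-256 fingerprints are all assumptions for this sketch.

```python
import hashlib
import json


def fingerprint(record):
    """Hash a canonical JSON encoding of a record, so field order doesn't matter."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_crawl(stored, crawled):
    """Classify records from a fresh crawl against previously stored ones.

    Both arguments map a stable record ID to a dict of fields.
    """
    added = sorted(k for k in crawled if k not in stored)
    removed = sorted(k for k in stored if k not in crawled)
    changed, unchanged = [], []
    for k in sorted(crawled.keys() & stored.keys()):
        if fingerprint(crawled[k]) != fingerprint(stored[k]):
            changed.append(k)
        else:
            unchanged.append(k)
    return {"added": added, "changed": changed, "unchanged": unchanged, "removed": removed}


# Hypothetical example data.
stored = {
    "r1": {"name": "Joe's Diner", "phone": "555-0100"},
    "r2": {"name": "Corner Cafe", "phone": "555-0101"},
    "r4": {"name": "Closed Bistro", "phone": "555-0104"},
}
crawled = {
    "r1": {"phone": "555-0100", "name": "Joe's Diner"},  # same data, different field order
    "r2": {"name": "Corner Cafe", "phone": "555-0199"},  # phone number changed
    "r3": {"name": "New Noodle Bar", "phone": "555-0103"},  # seen for the first time
}
print(diff_crawl(stored, crawled))
# {'added': ['r3'], 'changed': ['r2'], 'unchanged': ['r1'], 'removed': ['r4']}
```

Storing only the fingerprint alongside each ID is enough to detect changes on the next crawl without keeping a full copy of every previous version.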

Randomness != Uniqueness – Creating good identifiers for data is hard, especially once you’re in a distributed environment. There are also tradeoffs in making IDs decomposable: it makes internal debugging and management much easier, but it also reveals information to outsiders, often more than you might expect.
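Here is a minimal sketch of both sides of that tradeoff in Python. The 64-bit layout (41 bits of milliseconds since a custom epoch, 10 bits of machine ID, 12 bits of per-millisecond sequence number) is a hypothetical choice, similar in spirit to Twitter’s Snowflake scheme, and the epoch constant is arbitrary. The `collision_probability` helper uses the standard birthday-bound approximation to show why purely random IDs are only probably unique.

```python
import math

# Hypothetical layout: [41 bits timestamp][10 bits machine ID][12 bits sequence]
EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch, an assumption for this sketch
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def make_id(timestamp_ms, machine_id, sequence):
    """Pack a creation time, issuing machine, and counter into one integer ID."""
    assert 0 <= machine_id < (1 << MACHINE_BITS)
    assert 0 <= sequence < (1 << SEQUENCE_BITS)
    return (
        ((timestamp_ms - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(id_):
    """Recover (timestamp_ms, machine_id, sequence); anyone who knows the layout can do this."""
    sequence = id_ & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (id_ >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (id_ >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, machine_id, sequence


def collision_probability(n, bits):
    """Birthday-bound estimate of the chance that n random `bits`-bit IDs include a duplicate."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * 2 ** bits))


issued = make_id(1_700_000_000_123, machine_id=7, sequence=42)
print(decompose_id(issued))  # (1700000000123, 7, 42)

# A million random 32-bit IDs will almost certainly collide...
print(f"{collision_probability(1_000_000, 32):.3f}")  # 1.000
# ...while 122 random bits (as in a version 4 UUID) make a collision vanishingly unlikely.
print(collision_probability(1_000_000, 122))
```

The same decomposition that makes an ID easy to debug internally also tells anyone holding it when it was created, which machine issued it, and roughly how many IDs that machine handed out in the same millisecond.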

Lessons from a year’s worth of hiring data – A useful antidote to the folklore prevalent in recruiting, this small study throws up a lot of intriguing possibilities. The correlation between spelling and grammar mistakes and candidates failing to make it through the interview process was especially interesting, considering that retailers like Zappos use Mechanical Turk to fix stylistic errors in user reviews to boost sales.
