How Gmail collapses quoted text

A friend recently asked if there was a good way to detect just the added text in an email reply. This would let users reply directly to emails relaying things like Facebook messages, and have the reply show up in a decent form on that other service. Spotting just the new content is fairly tricky: not only is there the quoted text of the original message, but different email programs also add their own decorations to attribute the quotations, e.g.:

------ Original Message -----


On Tue, Mar 4, 2008 at 8:15 PM, Pete Warden <pete@petewarden.com> wrote:

From: Pete Warden 
Sent: Wednesday, March 04, 2008 8:17 PM
To: Pete Warden
Subject: Testing 2

The solution he is looking at for removing this boilerplate is to collect a library of examples and work out some regular expressions that match them. The decorations are fairly distinctive, so it should be possible to spot them pretty accurately. The main problem is that there are so many different mail programs out there, and they all seem to add slightly different decorations.
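As a rough illustration of that regular-expression approach, here is a minimal Python sketch. The `ATTRIBUTION_PATTERNS` list and the `strip_attribution` helper are hypothetical, built only from the three sample decorations above. A real library would need many more patterns, would miss an "On ... wrote:" line that the sender's client wrapped across two lines, and the bare "From:" rule will also fire on a reply whose own text starts a line with "From:".

```python
import re

# Hypothetical patterns derived only from the three sample decorations above;
# other mail clients (and other languages) would need their own entries.
ATTRIBUTION_PATTERNS = [
    # "------ Original Message -----" (the number of dashes varies)
    re.compile(r"^\s*-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE),
    # "On Tue, Mar 4, 2008 at 8:15 PM, Pete Warden <pete@petewarden.com> wrote:"
    re.compile(r"^\s*On\s.+\swrote:\s*$"),
    # First line of an Outlook-style header block: "From: Pete Warden"
    re.compile(r"^\s*From:\s*\S"),
]


def strip_attribution(reply):
    """Return the reply text above the first line that looks like an attribution."""
    lines = reply.splitlines()
    for i, line in enumerate(lines):
        if any(p.match(line) for p in ATTRIBUTION_PATTERNS):
            # Everything from the attribution line down is treated as quoted.
            return "\n".join(lines[:i]).rstrip()
    return reply.rstrip()


reply = (
    "Sounds good, see you then.\n"
    "\n"
    "On Tue, Mar 4, 2008 at 8:15 PM, Pete Warden <pete@petewarden.com> wrote:\n"
    "> Lunch tomorrow?\n"
)
print(strip_attribution(reply))  # Sounds good, see you then.
```

Cutting at the first match keeps the sketch simple, but it assumes the new text sits above the quotation (top-posting); replies written inline between quoted lines need the quoted-text detection described next.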

Detecting the quoted text is more of an algorithmic problem: it comes down to doing a fuzzy string search to work out whether some text roughly matches the contents of the original mail. Another approach would be to look for >’s at the start of each line, which would work reasonably well if it weren’t for Outlook, which by default doesn’t mark quoted lines that way.

For once, there’s actually a helpful patent that describes how Google does this in Gmail. I really hate software patents, but at least this one contains some non-obvious parts, is not insanely broad, and explains the implementation behind it reasonably well. They don’t talk much about handling the boilerplate decoration, apart from mentioning that they look for common headers like "From:". For the quotations, it looks like they do some magic with hash calculations to spot small sections of matching text between the two documents, and then try to merge them into larger blocks.
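The hash-and-merge idea can be sketched roughly as follows. This is a generic word-shingling approach, not the patent’s actual algorithm: the window size `k`, the `min_block` threshold and the helper names are all assumptions for illustration. Every run of `k` consecutive words in the original mail goes into a hash set; reply words covered by a matching window are marked as quoted, overlapping matches join up into runs, and runs shorter than `min_block` words are treated as coincidence rather than quotation.

```python
import re


def tokenize(text):
    # Lowercased word tokens. Punctuation and ">" quote markers are dropped, so a
    # quotation that was re-wrapped or ">"-prefixed still lines up with the original.
    return re.findall(r"\w+", text.lower())


def find_quoted_blocks(original, reply, k=4, min_block=8):
    """Return the reply's words plus (start, end) word ranges that look quoted."""
    orig_words = tokenize(original)
    reply_words = tokenize(reply)
    # Hash every k-word window ("shingle") of the original into a set.
    shingles = {tuple(orig_words[i:i + k]) for i in range(len(orig_words) - k + 1)}
    # Mark each reply word covered by a window that also occurs in the original;
    # overlapping windows make neighboring small matches join up.
    quoted = [False] * len(reply_words)
    for i in range(len(reply_words) - k + 1):
        if tuple(reply_words[i:i + k]) in shingles:
            for j in range(i, i + k):
                quoted[j] = True
    # Merge runs of marked words into blocks, discarding runs too short to trust.
    blocks, start = [], None
    for i, flag in enumerate(quoted + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_block:
                blocks.append((start, i))
            start = None
    return reply_words, blocks


def new_text(original, reply, k=4, min_block=8):
    """Return the reply's words that fall outside every quoted block."""
    words, blocks = find_quoted_blocks(original, reply, k, min_block)
    keep = [True] * len(words)
    for start, end in blocks:
        for j in range(start, end):
            keep[j] = False
    return " ".join(w for w, flag in zip(words, keep) if flag)


original = "Are we still on for lunch tomorrow at the usual place near the office?"
reply = "Yes, see you at noon.\n\n> Are we still on for lunch tomorrow at the usual place near the office?"
print(new_text(original, reply))  # yes see you at noon
```

Because the tokenizer drops punctuation and case, the output is a normalized word list rather than the reply as typed; a fuller version would keep each word’s character offsets and cut those spans out of the original reply text instead.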

Pete Warden's blog
