Why the hell are you sucking my bandwidth for your mashup?

That’s a good question.

The informal rules behind what’s acceptable use of someone else’s web server are clear if you write a new browser. Nobody complained when Firefox came along, because there are real people reading the content that the server owners are paying to send.

The rules are also well understood if you write a new robot to crawl the web. Robots should tread very lightly indeed: respect the robots.txt file, and keep some delay between fetches, so as to avoid slowing down the server for the real traffic.
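As a rough illustration of those crawler rules, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The names `build_rules` and `PoliteThrottle`, the `ExampleBot` agent string in the usage note, and the five-second default delay are all made up for this example; it is not how any particular crawler does it, and it leaves the actual fetching to the caller.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


class PoliteThrottle:
    """Hypothetical helper: remembers the last fetch per host and reports
    how long a crawler should still wait before hitting that host again."""

    def __init__(self, delay_seconds: float = 5.0, clock=time.monotonic):
        self.delay_seconds = delay_seconds  # assumed delay, tune per site
        self.clock = clock
        self.last_fetch = {}  # host -> clock reading at the latest fetch

    def seconds_to_wait(self, url: str) -> float:
        host = urlparse(url).netloc
        last = self.last_fetch.get(host)
        if last is None:
            return 0.0  # never fetched from this host, no need to wait
        return max(0.0, self.delay_seconds - (self.clock() - last))

    def record_fetch(self, url: str) -> None:
        self.last_fetch[urlparse(url).netloc] = self.clock()
```

A crawler loop would call `build_rules(...).can_fetch("ExampleBot", url)` before each request, sleep for `seconds_to_wait(url)`, fetch, and then call `record_fetch(url)`.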

SearchMash is somewhere in between these two extremes. Originally, it was a pure browser. It is still entirely user-directed, so there’s a good chance that the bandwidth is going towards your target audience. On the other hand, an entire page of search results is fetched at once, so it’s not as user-directed as it would be if they’d clicked on your link directly.

I do avoid fetching anything but the main HTML until the user requests a preview of the page, to keep the bandwidth demands as small as possible, so no images are requested.

I know not everyone will agree that it’s a net benefit, so I’ve made sure that the User-Agent header is always set to MashProxy for all requests, so servers can easily block my traffic. I considered a whitelist system too, since that would also prevent intranet access, but I could see no practical way for that to gain adoption.
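To make the blocking idea concrete, here is a small Python sketch of both sides. It is not the SearchMash code: `build_request` and `block_user_agents` are hypothetical names, and the server side assumes a WSGI application, which may not match your setup. The only detail taken from the post is the `MashProxy` User-Agent value.

```python
from urllib.request import Request

MASHUP_USER_AGENT = "MashProxy"  # the value named in the post


def build_request(url: str) -> Request:
    """Client side: tag every outgoing request with the same User-Agent."""
    return Request(url, headers={"User-Agent": MASHUP_USER_AGENT})


def block_user_agents(app, blocked=(MASHUP_USER_AGENT,)):
    """Server side: wrap a WSGI app so blocked agents get 403 Forbidden."""
    blocked_lower = tuple(name.lower() for name in blocked)

    def wrapper(environ, start_response):
        # WSGI exposes the User-Agent request header under this key.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in blocked_lower):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Blocked by site policy\n"]
        return app(environ, start_response)

    return wrapper
```

Most web servers can do the same match in their own configuration, so a site owner would not normally need application code for this; the wrapper just shows that a single, stable User-Agent string is enough to filter on.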


Pete Warden's blog

Ever tried. Ever failed. No matter. Try Again. Fail again. Fail better.
