Listing of web crawlers that do not support compression

If you are the author of any of these spiders, then please add support for content compression when you crawl the web. This saves bandwidth both on your crawling system and on the servers that you crawl.

Adding compression support can be very simple -- if your spider is coded in Perl using LWP::UserAgent, then the addition of a single line of code will enable compression support.

$ua->default_header('Accept-Encoding' => 'gzip');
You then need to make sure that you always read the body through 'decoded_content' (rather than 'content') when dealing with the response object.

For other languages, all you need to do is add

Accept-Encoding: gzip
to the HTTP request that you send, and then be prepared to handle a 'Content-Encoding: gzip' header in the response by decompressing the body before you parse it.
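
As a rough illustration of the non-Perl case, here is a minimal Python sketch using only the standard library. The function names (fetch, decode_body) and the ExampleSpider user agent string are made up for this example and are not taken from any particular crawler. It assumes the server either returns gzip or sends the body uncompressed; a real spider would probably want to handle other encodings and errors as well.

```python
import gzip
import urllib.request


def decode_body(content_encoding, body):
    """Return the body as plain bytes, undoing gzip if the server applied it."""
    # A server is free to ignore Accept-Encoding, so only gunzip when the
    # response actually says Content-Encoding: gzip.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url, user_agent="ExampleSpider/0.1"):
    """Fetch url while advertising gzip support (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        headers={"Accept-Encoding": "gzip", "User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        # urllib does not decompress for us, so check the header ourselves.
        encoding = response.headers.get("Content-Encoding")
        return decode_body(encoding, response.read())
```

The decoding step is kept separate from the network call so that it can be checked without fetching anything, for example by feeding it the output of gzip.compress.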

Happily, some of the large spiders do support compression -- Googlebot and Yahoo Slurp both do, to name but two. Since I started prodding crawler implementors, a couple have implemented compression (one within hours), and another reported that the lack of compression was a bug that would be fixed shortly.

Crawlers which do more than 5% of the total (uncompressed) crawling activity are marked in bold below.

| Crawler | Last IP used |
|---|---|
| 7Siters/1.08 (+https://7ooo.ru/siters/)" "pond.gladstonefamily.net | 109.94.211.139 |
| masscan/1.0 (https://github.com/robertdavidgraham/masscan)" "- | 178.54.86.119 |
| Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "73.253.74.102 | 180.215.168.130 |
| Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "gladstone.name | 216.244.66.228 |
| Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "pond.gladstonefamily.net | 216.244.66.194 |
| Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "pond1.gladstonefamily.net | 216.244.66.194 |
| Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "www.gladstone.name | 216.244.66.228 |
| Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36" "73.253.74.102 | 180.215.168.130 |
| Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" "pond1.gladstonefamily.net | 95.217.160.112 |
| Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" "73.253.74.102 | 113.28.8.131 |
| Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" "73.253.74.102:80 | 138.99.216.147 |
| Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" "73.253.74.102:8080 | 138.99.216.147 |
| Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PretzelBot/0.3; +http://www.example.com/bot.html) Chrome/41.0.2272.96 Safari/537.36" "www.gladstonefamily.net | 50.19.145.237 |
| Wget/1.19.4 (linux-gnu)" "gladstonefamily.net | 5.2.194.166 |

Comments, problems, etc. to Philip Gladstone

Last modified Sunday, 19 November 2006