Microsoft Bots Effectively DDoSing Perl CPAN Testers
at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'"
From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."
This is a normal occurrence for Bing (Score:5, Informative)
I had a registration page - static content basically. The only thing that was dynamic was that it was referred to by many pages on the site with a variable in the querystring. Bing decided that it needed to check on this one page *thousands* of times per day.
They ignored robots.txt.
I sent a note to an address on the Bing site that requested feedback from people having issues with the Bing bots - nothing.
The only thing they finally 'listened' to was placing "" in the header.
This kind of sucked because it took the registration page out of the search engines' index; however, it was much better than being DDOS'd. Plus, the page is easy to find on the site so not *that* big a deal.
Bing has been open for months now and if you search around there are tons of stories just like this. Maybe now that a site with some visibility has been 'attacked', the engineers will take a look at wtf is wrong.
Flooding... (Score:5, Informative)
I have noticed the microsoft crawlers (msnbot) being fairly inefficient on many of my sites...
In contrast to googlebot and spiders from other search engines, msnbot is far more aggressive, ignores robots.txt and will frequently re-request the same files, even if those files haven't changed... Looking at my monthly stats (awstats), which group traffic from bots, msnbot will frequently have consumed 10 times more bandwidth than googlebot, but is responsible for far less incoming traffic based on referrer headers (typically 1-2% of the traffic generated by google on my sites).
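For anyone without awstats who wants to make the same comparison, here is a rough Python sketch that totals response bytes per crawler from an access log. It assumes Apache's Combined Log Format and identifies bots by a substring of the User-Agent; the sample lines, addresses and agent strings are invented purely for illustration.

```python
import re
from collections import Counter

# Tail of a Combined Log Format line:
#   "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'"[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')

# Assumed User-Agent substrings; adjust to whatever your logs actually show.
BOTS = ("msnbot", "googlebot")


def bytes_by_bot(lines):
    """Sum the response sizes served to each known crawler."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # not a Combined Log Format line, skip it
        size = 0 if match.group(2) == "-" else int(match.group(2))
        agent = match.group(3).lower()
        for bot in BOTS:
            if bot in agent:
                totals[bot] += size
    return totals


# Invented sample data (documentation-range addresses, simplified agents).
SAMPLE_LINES = [
    '192.0.2.1 - - [01/Jan/2010:00:00:01 +0000] "GET /register?ref=a HTTP/1.1" 200 5000 "-" "msnbot (sample agent)"',
    '192.0.2.1 - - [01/Jan/2010:00:00:02 +0000] "GET /register?ref=b HTTP/1.1" 200 5000 "-" "msnbot (sample agent)"',
    '192.0.2.2 - - [01/Jan/2010:00:00:03 +0000] "GET /index.html HTTP/1.1" 200 1000 "-" "Googlebot (sample agent)"',
    '192.0.2.3 - - [01/Jan/2010:00:00:04 +0000] "GET /index.html HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (sample browser)"',
]

if __name__ == "__main__":
    for bot, total in bytes_by_bot(SAMPLE_LINES).most_common():
        print(bot, total)
```

Matching on User-Agent is only as trustworthy as the agent string itself, so treat the totals as a rough guide rather than proof of who sent the traffic.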
Other small search engines don't bring much traffic either, but their bots don't hammer my site as hard as msnbot does.
Re:Probably just a bug. (Score:5, Informative)
You're probably new here, but if you'd RTFA, you'd see that:
Come to think of it though, isn't this what happens to most people who try to interoperate with Microsoft?
Amusingly, if I Google for "bing robots.txt" [google.co.uk] I get a link to a bing page titled "Bing - Robots.txt Disallow vs No Follow - Neither Working!" which has already been elided from history by Microsoft [bing.com]. Classy.
Re:So how do we DDoS Microsoft? (Score:5, Informative)
Not necessary. A Bing Program Manager has already commented on the CPAN Testers blog entry [perl.org] upon which the article is based:
Hi,
I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to barbie@cpan.org as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at bwmc@microsoft.com.
As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).
Re:Probably just a bug. (Score:1, Informative)
Excuse my ignorance, but isn't robots.txt compliance easily enforceable on the server? I remember something about hiding links to trap pages in order to identify robots and then holding identified robots responsible for robots.txt infractions by blocking their IP address.
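A minimal Python sketch of that trap-page idea, under some assumptions: the trap path is hypothetical, it is listed as disallowed in robots.txt, and it is linked only where humans won't click. Any client that requests it anyway gets its IP added to a block list. The class and method names are made up for illustration.

```python
# Hypothetical trap location: disallowed in robots.txt and linked invisibly,
# so only a crawler that ignores robots.txt should ever request it.
TRAP_PREFIX = "/trap/"


class TrapFilter:
    """Block any IP address that has ever requested the trap path."""

    def __init__(self, trap_prefix=TRAP_PREFIX):
        self.trap_prefix = trap_prefix
        self.blocked = set()

    def allow(self, ip, path):
        """Return True if the request should be served, False if refused."""
        if ip in self.blocked:
            return False
        if path.startswith(self.trap_prefix):
            # The client ignored robots.txt: remember it and refuse.
            self.blocked.add(ip)
            return False
        return True


if __name__ == "__main__":
    f = TrapFilter()
    print(f.allow("192.0.2.10", "/index.html"))   # well-behaved so far
    print(f.allow("192.0.2.10", "/trap/page1"))   # walks into the trap
    print(f.allow("192.0.2.10", "/index.html"))   # now refused
```

In practice the block list would need to persist across processes and probably expire entries after a while, since crawler addresses get reassigned.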
Re:Typical M$ (Score:1, Informative)
That's not a troll. That's common knowledge.
A more appropriate mod would be +5 Redundant.
Re:So block those IP ranges? (Score:4, Informative)
> ...why not just block them?
They have.
Re:Are you sure? (Score:2, Informative)
You only see an IP in an Apache log after a successful TCP handshake. This is hard (not impossible, but really, really hard) to do with a forged IP.
Re:Are you sure? (Score:5, Informative)
Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets?
It's a TCP connection, so they need to have completed the three-way handshake for it to work. That means that they must either have received the SYN-ACK packet or be SYN flooding. If they are SYN flooding, then that would show up in the firewall logs. If they've received the SYN-ACK packet then they are either from that IP, or they are on a router between you and that IP and can intercept and block the packets from that IP.
You don't need a reply if you are running a DDOS.
You do if it's via TCP. If they're just ping flooding, then that's one thing, but they're issuing HTTP requests. This involves establishing a TCP connection (send SYN, receive SYN-ACK with random number, reply ACK with that number) and involves sending TCP window replies for each group of TCP packets that you receive.
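To make the handshake argument concrete, here is a toy Python model of the server side. It is a sketch, not a TCP implementation: the class is invented, and it only captures the one property that matters here, namely that the final ACK must acknowledge a random initial sequence number that was sent to the claimed source address.

```python
import secrets


class ToyServer:
    """Toy model of the server side of the TCP three-way handshake."""

    def __init__(self):
        self.pending = {}         # claimed source IP -> server's initial seq number
        self.established = set()  # IPs that completed the handshake

    def on_syn(self, src_ip):
        """Handle a SYN; return (destination of the SYN-ACK, server ISN)."""
        isn = secrets.randbits(32)
        self.pending[src_ip] = isn
        # The SYN-ACK is routed to the claimed source, not to whoever
        # actually sent the SYN, so a spoofer never sees the ISN.
        return src_ip, isn

    def on_ack(self, src_ip, ack_number):
        """Handle the final ACK; the connection opens only if it matches."""
        expected = self.pending.get(src_ip)
        if expected is not None and ack_number == (expected + 1) % 2**32:
            del self.pending[src_ip]
            self.established.add(src_ip)
            return True
        return False


if __name__ == "__main__":
    server = ToyServer()
    # Genuine client: receives the SYN-ACK, so it can acknowledge ISN + 1.
    _, isn = server.on_syn("192.0.2.1")
    print(server.on_ack("192.0.2.1", (isn + 1) % 2**32))
    # Spoofer: the SYN-ACK went elsewhere, so it can only guess.
    server.on_syn("192.0.2.2")
    print(server.on_ack("192.0.2.2", 12345))
```

A blind spoofer has roughly a one in 2^32 chance per guess, which is why an address appearing in an HTTP access log is strong evidence the traffic really came from (or through) that address.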
On the other hand, why would anyone, including Microsoft, want to bring down CPAN?
Who says that they want to? It's more likely that their web crawler has been written to the same standard as the rest of their code.
Re:Oh! *Literally* Microsoft bots! (Score:5, Informative)
Probably not, if you look at other incidents: http://cmeerw.org/blog/594.html [cmeerw.org] it appears they just like to push the limits.
Re:Robots.txt (Score:3, Informative)
It's the first. Whatever you specify in the robots.txt as no-follow etc... means not to spider the pages, so no scanning of them at all.
You use it for when you only want part of your site to appear in search results, such as just the front page (for example). The rest of the site should not be touched by the bot at all.
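For what a well-behaved crawler is expected to do with those rules, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs and the "examplebot" agent name are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    for url in ("http://example.com/index.html",
                "http://example.com/private/register"):
        # A polite bot asks before fetching and skips anything disallowed.
        print(url, parser.can_fetch("examplebot", url))
```

Nothing enforces this check; the file only works if the crawler chooses to run something like it before each fetch.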
DDoS? Really? (Score:3, Informative)
If it was really a DDoS, you wouldn't be able to filter the IP out with a simple regex (like the
To boot, TFA didn't even say DDoS. Maybe that's too much to expect the editors to oh... I don't know...say... RTFA or Fact-Check it?
I should drop my bar a bit, I suppose.
Re:Probably just a bug. (Score:4, Informative)
I'm sure you heard that, but it's not actually true in any way.
No problem (Score:5, Informative)
ipchains -A input -j REJECT -p all -s 65.55.207.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.107.0/24 -i eth0 -l
ipchains -A input -j REJECT -p all -s 65.55.106.0/24 -i eth0 -l
problem solved
Re:Robots.txt (Score:3, Informative)
It is a request not to scan part or all of a site. robots.txt [wikipedia.org]
Every site does not have dozens of powerful servers and terabytes of bandwidth, nor is every site an ad-supported one that wants to maximize traffic. Common courtesy requires that a bot operator minimize his impact on any given site and honor requests not to index. Of course "courtesy" and "honor" are concepts that baffle Microsoft managers.
Re:No problem (Score:5, Informative)
Linux IP Firewalling Chains, normally called ipchains, is free software to control the packet filter/firewall capabilities in the 2.2 series of Linux kernels. It superseded ipfwadm, but was replaced by iptables in the 2.4 series.
You're a few kernels behind.
Re:US Government is good. (Score:3, Informative)
Nothing you listed under the "War on Drugs" has anything to do with the war on drugs.
The war on drugs has made America a police state where the government can seize any of your property and auction it for profit before your trial. Even if you are found innocent, or the charges are thrown out for insufficient grounds, you will not be compensated for your lost money or profit. It has made an America where more people are imprisoned than any other nation on earth. It has made a nation where the cheapest and most effective drug for curing glaucoma and mitigating the pain and nausea associated with cancer treatments is a crime. It's made a nation where at least half its citizens are criminals.
Simple solution: (Score:1, Informative)
Add to your .htaccess file:
deny from 65.55.207.
deny from 65.55.106.
deny from 65.55.107.
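Each of those three-octet prefixes covers the same addresses as a /24 network. If you'd rather check addresses in application code or while combing through logs, here is a minimal Python sketch using the standard ipaddress module; the function name is made up, and the ranges are the ones quoted in the story.

```python
import ipaddress

# The three ranges named in the Heise story, written as /24 networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("65.55.207.0/24", "65.55.107.0/24", "65.55.106.0/24")
]


def is_blocked(ip):
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("65.55.207.15", "65.55.106.200", "65.55.208.1"):
        print(ip, is_blocked(ip))
```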
Re:Check the blog... (Score:3, Informative)
They admitted they were powerless to solve their own problems without help from their victims.
Heh. It's another "damned if you do; damned if you don't" scenario.
Um, no. Not unless you're a rabid MS apologist.
Usually, people criticise Microsoft for developing software without bothering to consult or test with actual customers.
True.
Now we have a manager of a MS dev group that actually does communicate (though not exactly with "customers"), and acts on what they say, so he's criticised for needing help from his "victims".
Umm, exactly how did he act on what they said? According to the quote, they explicitly didn't act, which is the problem people are complaining about.