Microsoft Bots Effectively DDoSing Perl CPAN Testers

Posted by timothy
from the stuck-in-a-rut dept.
at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'" From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."

  • I've seen it before (Score:5, Interesting)

    by LordAzuzu (1701760) on Monday January 18, 2010 @07:54AM (#30806860)
    I manage some networks in my home city in Italy, and in the past year I've often seen strange traffic coming from some of their IP addresses. Guess they were exploited by someone a long time ago, and didn't even notice it.
  • by Anonymous Coward on Monday January 18, 2010 @07:54AM (#30806870)

    Bing? ...But that would only help them to DDoS Bing.

  • by AHuxley (892839) on Monday January 18, 2010 @08:03AM (#30806952) Homepage Journal
    It's not a bug, it's a feature to index a site with a new, rapid, powerful, direct, personalised crawler :)
    http://arstechnica.com/microsoft/news/2010/01/microsoft-outlines-plan-to-improve-bings-slow-indexing.ars [arstechnica.com]
  • by Yvanhoe (564877) on Monday January 18, 2010 @08:16AM (#30807034) Journal
    There is such a thing as criminal incompetence. If a script kiddie can be arrested for having a virus "out of control" I don't see why Microsoft engineers DDoSing a website couldn't be charged.

    By the way, a philosopher once said that "evil" did not exist. That it was most of the time just a kind of hidden stupidity.
  • by hairyfeet (841228) <.bassbeast1968. .at. .gmail.com.> on Monday January 18, 2010 @09:07AM (#30807428) Journal

    But MSFT is a corporation, which thanks to our corporate butt-kissing congress and courts can just go "ooopsie", maybe cut a small check at most, and walk away scot-free.

    And as for your philosopher? I saw an interview with Joss Whedon on writing evil characters that I thought really hit the nail on the head. He said, and I paraphrase "The villain never sees himself or herself as evil. To them there is a perfectly justifiable reason for their actions. I have known some truly evil people, those that have intentionally hurt their fellow man out of pure malice, and to them their actions were justified and noble. They simply didn't see what they did as wrong."

    Which is how you get MSFT and Intel paying backroom deals to crush competition, or Jack Tramiel and his "business is war" philosophy. To the ones making the decisions, it's "the other guy would do it to us if they could, so why shouldn't we do it to them?". I'm sure that if you talked to Gates or the head of Intel you could never get them to believe that crushing your competition any way you can is wrong. To them that was/is business 101 and not evil. That is why I think Whedon was right, the villain always thinks they are noble.

  • Re:MS ineptitude? (Score:2, Interesting)

    by Anonymous Coward on Monday January 18, 2010 @09:29AM (#30807604)
    It kind of depends on the individual robots.txt. Google, for instance, added a bunch of extended rules that they respect but which aren't officially part of the robots.txt spec (which is pretty limited). If the CPAN Testers have added some of those rules, it could be that the file fails to validate when the MS bot hits it and is therefore ignored.
  • by beadfulthings (975812) on Monday January 18, 2010 @09:51AM (#30807866) Journal

    It's interesting to read this, as I've had some random and somewhat incomprehensible port scans coming from an IP address identified as one of theirs. If you're just an insignificant slob, you can't write to their abuse address, either; you'll get bounced. I simply blocked that particular IP address. Let them worry about who's gotten to them.

  • by jchawk (127686) on Monday January 18, 2010 @09:57AM (#30807954) Homepage Journal

    The CPAN folks could complain to their ISP and have them drop the traffic that's coming in to their boxes.

    Most ISPs will work with you to correct DDoS problems.

  • by mR.bRiGhTsId3 (1196765) on Monday January 18, 2010 @10:21AM (#30808206)
    That would be tremendously amusing. I can see the headline now: Bing robots DDoS every Unix-hosted site by assuming Windows line endings.
  • by jc42 (318812) on Monday January 18, 2010 @10:35AM (#30808346) Homepage Journal

    As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).

    Yeah, though this particular sort of stupidity has been going on for a long time, and not just at Microsoft (though they seem to be the worst culprit).

    I run a couple of sites that, among other things, have links to return the "content" in a list of different formats (GIF, PNG, PS, PDF, ...). Periodically, the servers get bogged down by search sites hitting them many times per second, trying to get every file in every format. The worst cases seem to come from microsoft.com and msn.com, though it happens with other search sites, too. Actually, the first attempts I saw at "deep search" like this came from googlebots around 10 years ago, though they quickly backed off and haven't been a serious problem since then. MS-origin "attacks" of this sort have been happening every few months, for nearly a decade.

    I've generally handled them with a couple of techniques. One is to check the logs for successive requests from the same address, and insert sleep() calls with progressively longer sleeps as more messages arrive. The code prefixes the "content" with a comment explaining what's happening, in case a human investigates.
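
    [Editor's sketch, not the poster's code.] The progressive-delay idea described above could look roughly like the following. The class name, the sliding window, and every threshold here are assumptions chosen for illustration; the post does not say how the delays were actually computed.

```python
import time
from collections import defaultdict, deque


class ProgressiveThrottle:
    """Suggest a growing delay for addresses that send requests in quick succession.

    All parameters are illustrative assumptions, not values from the post.
    """

    def __init__(self, window_seconds=10.0, free_requests=3,
                 step_seconds=0.5, max_delay=30.0):
        self.window_seconds = window_seconds
        self.free_requests = free_requests
        self.step_seconds = step_seconds
        self.max_delay = max_delay
        self._hits = defaultdict(deque)  # address -> timestamps of recent requests

    def delay_for(self, address, now=None):
        """Record one request and return how many seconds to sleep before serving it."""
        now = time.monotonic() if now is None else now
        hits = self._hits[address]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        hits.append(now)
        excess = len(hits) - self.free_requests
        if excess <= 0:
            return 0.0
        # Each extra request inside the window adds one more step, up to a cap.
        return min(excess * self.step_seconds, self.max_delay)


def delay_notice(delay):
    """Build a comment to prefix to the content, in case a human investigates."""
    return ("<!-- Requests from your address are arriving very quickly; "
            f"this response was delayed by {delay:.1f} seconds. -->")
```

    A request handler would call `delay_for(address)`, pass a non-zero result to `time.sleep()`, and prepend `delay_notice(delay)` to the response body.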

    Another technique is to look for series of "give me this in all your output formats" requests, verify that it's a search bot, and add the address to a "banned" list of sites that simply get a message explaining why they aren't getting what they asked for, plus an email address if they want to get in contact. So far nobody at any search site has ever used that address. I did once get a response from a guy who was studying sites with such multi-format data, for a school project, to see how the various output formats compared in size and information content. I took his address off the banned list, and suggested that he add a couple-second delay between requests, and he finished his project a few days later.
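
    [Editor's sketch, not the poster's code.] Spotting a "give me this in all your output formats" sweep might be done along these lines. The class name and the threshold are hypothetical, and the step of verifying that the client really is a search bot is left out because the post does not say how that was done.

```python
from collections import defaultdict


class FormatSweepDetector:
    """Ban addresses that request one item in many different output formats.

    The threshold is an illustrative assumption; bot verification is omitted.
    """

    def __init__(self, formats_threshold=4):
        self.formats_threshold = formats_threshold
        # address -> item -> set of formats requested so far
        self._seen = defaultdict(lambda: defaultdict(set))
        self.banned = set()

    def record(self, address, item, fmt):
        """Record one request; return True if the address is (now) banned."""
        if address in self.banned:
            return True
        formats = self._seen[address][item]
        formats.add(fmt.lower())
        if len(formats) >= self.formats_threshold:
            self.banned.add(address)
            del self._seen[address]  # history is no longer needed once banned
            return True
        return False
```

    A banned address would then receive only the explanatory message and contact address instead of the requested content.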

    I suspect that the googlebot folks may have read my explanation of the delays and added code to spread their requests out over time, since that's what their bots seem to do now. But I never heard from them. They must have gotten complaints (and bans) from lots of web sites when they started doing this, so they probably realized quickly that they should add code to prevent such flooding of sites.

  • Re:MS ineptitude? (Score:4, Interesting)

    by ShecoDu (447850) on Monday January 18, 2010 @12:14PM (#30809590) Homepage

    I remember reading that the MSNBOT reads the "Robots.txt" file, but cpantesters has a lowercase filename:

    http://static.cpantesters.org/robots.txt [cpantesters.org]

    http://static.cpantesters.org/Robots.txt [cpantesters.org] doesn't exist, so basically MSNBOT only respects robots.txt on servers with case-insensitive file systems.

  • by dissy (172727) on Monday January 18, 2010 @12:28PM (#30809784)

    Every once in a while, I still see sites that don't serve up unless you include "www." in the address - but it's like I said - a dufus.

    Looks like someone hasn't read RFC 1178 and enjoys breaking interoperability.

    Your method also breaks email by redelegating MX records one subdomain above where the control should be and where the MXs point to, thus breaking delegation of subdomains.

  • Re:MS ineptitude? (Score:4, Interesting)

    by John Hasler (414242) on Monday January 18, 2010 @01:23PM (#30810500) Homepage

    The standard clearly specifies lower case. However, if you are correct there's a simple way to send bingbots one way and all other bots another: create Robots.txt and robots.txt with different contents.
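
    [Editor's sketch.] To see what a crawler would conclude from each of the two files, the rules can be checked with Python's standard `urllib.robotparser`. Both file contents, the bot name, and the example.org URLs below are made up for illustration; whether any real bot requests `Robots.txt` is the parent's claim, not something this code verifies.

```python
from urllib.robotparser import RobotFileParser


def rules_from(text):
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical contents: the standard lower-case file allows most of the site,
# while the capitalized variant turns every crawler away.
LOWERCASE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
CAPITALIZED_ROBOTS = "User-agent: *\nDisallow: /\n"

lower_rules = rules_from(LOWERCASE_ROBOTS)
capital_rules = rules_from(CAPITALIZED_ROBOTS)

# A bot that read robots.txt may fetch reports but not the private area.
reports_allowed_lower = lower_rules.can_fetch("examplebot", "http://example.org/reports/")
private_allowed_lower = lower_rules.can_fetch("examplebot", "http://example.org/private/x")

# A bot that read Robots.txt instead is told to stay out entirely.
reports_allowed_capital = capital_rules.can_fetch("examplebot", "http://example.org/reports/")
```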

  • by Anonymous Coward on Monday January 18, 2010 @03:49PM (#30812382)

    Bing should have used Wget first to download the articles to a local hard drive, and also to add a 2 to 3 second wait. Let it run over the weekend. Then test the search indexing algorithms on the local HTML files. They were probably performing indexing tests. I know they have smart people working for them, so it probably involved a contractor who didn't think about performance issues.
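
    [Editor's sketch.] The same 2 to 3 second wait can be expressed in a few lines of Python. The function name and parameters are hypothetical, and `fetch` stands for whatever download routine is actually used; it is passed in so the pacing logic stays separate from the networking.

```python
import random
import time


def crawl_politely(urls, fetch, min_wait=2.0, max_wait=3.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random 2 to 3 seconds between requests.

    `fetch` is any callable that takes a URL and returns its content.
    `sleep` is injectable so the pacing can be checked without really waiting.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # No wait before the first request, only between requests.
            sleep(random.uniform(min_wait, max_wait))
        pages[url] = fetch(url)
    return pages
```

    At that pace a crawler makes roughly 1,200 to 1,800 requests per hour, so even a large site can be copied over a weekend without flooding the server.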

  • by drinkypoo (153816) <martin.espinoza@gmail.com> on Tuesday January 19, 2010 @06:33AM (#30817856) Homepage Journal

    Instead we have Slashtroglodytes screaming about conspiracies by MSFT.

    Just for the record, since you're commenting under a thread I started, I do not believe that there was a conspiracy to attack CPAN. I think there is a conspiracy to continue accidentally attacking CPAN. The information provided ought to be more than sufficient to figure out what is going on. Remember, any time two people work to screw a third out of something, it's a conspiracy by definition.
