
Microsoft Bots Effectively DDoSing Perl CPAN Testers

Posted by timothy
from the stuck-in-a-rut dept.
at_slashdot writes "The Perl CPAN Testers have been suffering issues accessing their sites, databases and mirrors. According to a posting on the CPAN Testers' blog, the CPAN Testers' server has been aggressively scanned by '20-30 bots every few seconds' in what they call 'a dedicated denial of service attack'; these bots 'completely ignore the rules specified in robots.txt.'" From the Heise story linked above: "The bots were identified by their IP addresses, including 65.55.207.x, 65.55.107.x and 65.55.106.x, as coming from Microsoft."

  • by Lennie (16154) on Monday January 18, 2010 @08:56AM (#30806884) Homepage
    http://blogs.msdn.com/

    I've seen it fail many times
  • MS ineptitude? (Score:2, Insightful)

    by Anonymous Coward on Monday January 18, 2010 @08:59AM (#30806906)

    From TFA:

    Hi,
    I am a Program Manager on the Bing team at Microsoft, thanks for bringing this issue to our attention. I have sent an email to nospam@example.com as we need additional information to be able to track down the problem. If you have not received the email please contact us through the Bing webmaster center at nospam@example.com.

    I mean, what additional information is needed wrt "respecting robots.txt" and "not letting loose more than one bot on a site at a time"?

    Bing. Meh.

  • by SharpFang (651121) on Monday January 18, 2010 @08:59AM (#30806908) Homepage Journal

    No, we just make mistakes writing our Perl programs for automatically downloading stuff from MSDN. Like, download() unless $success; and then forgetting to set $success = 1.

  • by tjstork (137384) <todd.bandrowskyNO@SPAMgmail.com> on Monday January 18, 2010 @08:59AM (#30806910) Homepage Journal

    I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

  • by Lloyd_Bryant (73136) on Monday January 18, 2010 @09:06AM (#30806976)

    I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

    Sufficiently advanced incompetence is indistinguishable from malice. For additional examples, see Government, US.

    The simple fact is that ignoring robots.txt is effectively evil, regardless of the intent. It's not like robots.txt is some new innovation...
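    Indeed, honoring robots.txt takes only a few lines; a minimal sketch using Python's standard urllib.robotparser (the rules and paths here are illustrative, not CPAN's actual file):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly; in practice set_url()/read() fetches it.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /tmp/",
    "Crawl-delay: 10",
])

# A compliant crawler checks every URL before fetching it...
print(rp.can_fetch("bingbot", "/tmp/secret.html"))  # False: /tmp/ is disallowed
print(rp.can_fetch("bingbot", "/index.html"))       # True: no rule forbids it
# ...and waits the requested number of seconds between requests.
print(rp.crawl_delay("bingbot"))                    # 10
```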

  • by fish waffle (179067) on Monday January 18, 2010 @09:06AM (#30806982)

    I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?

    Probably. But since incompetence is the plausible deniability of evil, it's sometimes hard to tell.

  • by alexhs (877055) on Monday January 18, 2010 @09:10AM (#30807006) Homepage Journal

    these bots 'completely ignore the rules specified in robots.txt.'

    Microsoft ignoring standards is not incompetence, it's policy (NIH syndrome).

  • by djupedal (584558) on Monday January 18, 2010 @09:12AM (#30807012)
    > "I know everyone likes to assume that Microsoft is being evil here, but wouldn't the more realistic assumption be that they were just being incompetent?"

    We assume MS is evil...

    We know they are incompetent.

    We feel this is typical.

    We pray they'd just go away.

    We think this will never end...
  • by gmuslera (3436) on Monday January 18, 2010 @09:15AM (#30807026) Homepage Journal
    They are not ignoring robots.txt; they probably just understand that file in their own slightly different, but ultimately incompatible, format. As with every other file.
  • by MrMr (219533) on Monday January 18, 2010 @09:19AM (#30807058)
    The problem is, there is no evidence that
    "Never ascribe to stupidity that which can be adequately explained by malice"
    is invoking more entities.
    In fact, claiming that the commercially most successful software company got there through stupidity rather than malice sounds extremely implausible to me.
  • Are you sure? (Score:5, Insightful)

    by Errol backfiring (1280012) on Monday January 18, 2010 @09:21AM (#30807070) Journal
    Are we sure this traffic comes from Microsoft? Could it not consist of forged network packets? You don't need a reply if you are running a DDOS. On the other hand, why would anyone, including Microsoft, want to bring down CPAN?
  • by Sarten-X (1102295) on Monday January 18, 2010 @09:38AM (#30807184) Homepage

    For ignoring robots.txt, they don't deserve any more or any less.

  • by PetoskeyGuy (648788) on Monday January 18, 2010 @10:08AM (#30807434)

    Why make things worse? Block the ip address or range and notify the admins. This isn't a chan mob.

  • by schon (31600) on Monday January 18, 2010 @10:09AM (#30807448)

    It has nothing to do with the RTFA.

    their own guidelines on their site

    As anyone who has ever read MS documentation can tell you, you need to read it, then implement a test, so you can see what it really expects, then adjust your test, then try it until it works.

    Their problem is that they expected MS documentation to actually describe the expected behaviour.

  • by Zarf (5735) on Monday January 18, 2010 @10:29AM (#30807608) Journal

    Clue: Subtle joke, deserves 'funny' moderation ;)

    Subtle + Slashdot = FAIL

  • by blueZ3 (744446) on Monday January 18, 2010 @10:36AM (#30807710) Homepage

    What's amusing about the issue in the kb is that the problem that they're "solving" by breaking the username/password in a URL standard is NOT a problem with username/password URLs, but a problem with how IE displays the URLs. In other words, rather than fixing the behavior of IE's address and status bars to display such URLs correctly, they just stopped supporting them.

    Incompetence at that level isn't just indistinguishable from malice, it IS malicious.

  • by Anonymous Coward on Monday January 18, 2010 @10:55AM (#30807924)

    "as we need additional information to be able to track down the problem."

    IP addresses aren't enough? You're MS--if you can't fix the problem when the IP addresses are given, damn, that's just sad. You're a freaking massive multi-billion-dollar tech company, and this is the best you can do?

    No wonder Chinese hackers own our asses.

    Then again, it took Comcast 9 months to fix a security hole in customer accounts (fixing it would have required adding an 's' to 'http' to make the pages SSL'd), and the only reason it was "fixed" was because they did their annual website makeover and changed their entire system to something Flash-based. Then again, I had contacted a VP and the VP's security people, been referred to web security, talked to web security three times, and talked to a manager. The last three groups verified the problem. It was referred to their web applications team by that point, who sat on it.

    Lovely world we live in.

  • by Penguinisto (415985) on Monday January 18, 2010 @11:12AM (#30808114) Journal

    As said below, never ascribe to malice that which can be adequately explained by stupidity. (Insert lame joke about MSFT being full of stupidity here).

    Given the back-story on the whole Danger data loss affair [arstechnica.com], stupidity is the FIRST thing I'd ascribe to Microsoft these days...

  • by WinterSolstice (223271) on Monday January 18, 2010 @11:20AM (#30808190)

    Actually, your statement works better with 'INSERT LANG HERE'...

    I'm always surprised by how people seem to think that any language has a monopoly of some sort on sloppy and/or lazy coders. Been doing IT a long time, and the one thing that never changes is the sloppy/lazy code issue. It even predates programming, you know - look at infrastructure around the world for examples of "just toss something out there, hope it works".

  • by Short Circuit (52384) <mikemol@gmail.com> on Monday January 18, 2010 @11:23AM (#30808224) Homepage Journal

    A quick guess? Identifying unique sites by domain name, rather than by IP address, and either the bot or server not respecting HTTP 301 redirects.

    With Rosetta Code, I once had www.rosettacode.org serving up the same content as rosettacode.org. My server got pounded by two bots from Yahoo. I could set Crawl-Delay, but it was only partially effective; one bot had been assigned to www.rosettacode.org and another to rosettacode.org, and they were each keeping track of their request delay independently. I've since corrected things such that www.rosettacode.org returns an HTTP 301 redirect to rosettacode.org, and was eventually able to remove the Crawl-Delay entirely.

    I've since worked towards only serving up content for any particular part of the site on a single domain name, and have subdomains such as "wiki.rosettacode.org" redirect to "rosettacode.org/wiki", and "blog.rosettacode.org" to "rosettacode.org/blog". Works rather nicely, though it does leave me a bit more open to cookie theft attacks.

    YMMV; As I said, that was a quick guess.
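    The 301 fix described above can be sketched in web-server config; a hedged example for nginx (server names taken from the comment, all other details illustrative):

```nginx
# Answer on one canonical host and permanently redirect the rest,
# so each crawler tracks a single host (and a single Crawl-delay).
server {
    listen 80;
    server_name www.rosettacode.org;
    return 301 http://rosettacode.org$request_uri;
}

server {
    listen 80;
    server_name rosettacode.org;
    # ... the actual site configuration lives here ...
}
```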

  • by Hurricane78 (562437) <deleted.slashdot@org> on Monday January 18, 2010 @11:25AM (#30808246)

    As said below, never ascribe to malice that which can be adequately explained by stupidity.

    Must be really easy to just beat you in the face, and say “Ooops, I’m sorry, I’m so st00pid! *drool*”
    I call bullshit on that rule.

    My rule: Don’t make judgements at all (either way), about things that you just don’t know.

  • by Alpha830RulZ (939527) on Monday January 18, 2010 @11:40AM (#30808420)

    You know, it's easy to poke fun at the Microsofty, but is it possible that he was just trying to find out what was being hit so that he could figure out who in his organization he should contact? Maybe there is some uber-technical way he could have figured this out, or maybe he should have RTFB, but his response sounded well-intentioned and responsive. What would you prefer? The Microsoft of old?

  • by Anonymous Coward on Monday January 18, 2010 @11:51AM (#30808536)

    Seems like they read everything but robots.txt.

  • by jc42 (318812) on Monday January 18, 2010 @11:55AM (#30808564) Homepage Journal

    They admitted they were powerless to solve their own problems without help from their victims.

    Heh. It's another "damned if you do; damned if you don't" scenario. Usually, people criticise Microsoft for developing software without bothering to consult or test with actual customers. Now we have a manager of a MS dev group that actually does communicate (though not exactly with "customers"), and acts on what they say, so he's criticised for needing help from his "victims".

    Ya can't win that game.

    But the fact is that if you're developing server-side web software, you need to test it against real-world sites, not just the toy sites you've set up in your lab. And we all know the "Sourcerer's Apprentice" sort of bug that produces a runaway test that tries to do something as many times as it can per second until it's killed. Good testers will be on the lookout for such events, but it's understandable that they might fail occasionally.

    Among web developers, MS does have a bit of a reputation for hitting your new site with a flood of requests, trying to extract everything that you have (even the content of your "tmp" directory which your robots.txt file says to ignore). There are lots of small sites that block MS address ranges for just this reason.

    It should be considered good news that there's at least one MS manager who understands all this, and is willing to talk to the "victims" and fix the problems. Now if only they could fix the next-level problem: this sort of thing happens repeatedly, and their corporate culture seems to have no way to prevent it from happening again.

  • Mod parent up (Score:4, Insightful)

    by Lonewolf666 (259450) on Monday January 18, 2010 @11:55AM (#30808568)

    While he could be more polite, it is indeed embarrassing for Microsoft if they cannot check their own network for
    a) the existence of computers with given IPs, and
    b) what those computers are doing.

    I think that deserves an "insightful" that cancels out the "flamebait".

  • by John Hasler (414242) on Monday January 18, 2010 @12:10PM (#30808706) Homepage

    Robots.txt is merely advisory. Ignoring it is discourteous and oafish but not illegal.

  • hello? firewall? (Score:3, Insightful)

    by v1 (525388) on Monday January 18, 2010 @12:15PM (#30808752) Homepage Journal

    If it's a scan (TCP established stream, taxing the SERVERS, not the NETWORK) that's the problem, as opposed to a SYN flood etc., and the IP addresses are in a very small range, why aren't they just using a hardware firewall at the router and blocking the IPs? There's not a whole lot of "distributed" when it's coming from a few class C's.

    Not saying they should be DOING it, but this is not a Denial of Service, it's a Denial of Stupid.
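    For what it's worth, blocking the reported ranges is one rule per /24; a dry-run sketch that only prints the iptables commands (the ranges are the ones from the summary; you'd run the output as root only if you actually wanted the block):

```shell
# Print (don't apply) DROP rules for the crawler ranges named in the summary.
for net in 65.55.207.0/24 65.55.107.0/24 65.55.106.0/24; do
    echo "iptables -A INPUT -s $net -j DROP"
done
```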

  • by MstrFool (127346) on Monday January 18, 2010 @12:50PM (#30809260)

    Same reason other folks can't: they are human. Look, I despise MS for a variety of reasons and am one of the rabid anti-MS folks. But honestly, they do enough that is legit to gripe about; no need to blow a mistake like this out of proportion. Considering all they do, it was inevitable this would happen at some point. Shit happens; anyone who codes has had a mega-woops at one point or another, and if they haven't, they are cookie-cutter coding and not risking creativity. Hate them for needlessly locking the geeks out of the systems, or for locking the owners out of the systems while permitting hackers more remote access rights than they could get at the system itself. But this? Eh, they goofed. Get over it and worry about the real evil they are doing.

  • Re:No problem (Score:1, Insightful)

    by Anonymous Coward on Monday January 18, 2010 @12:51PM (#30809278)

    He's just running Debian stable. SCNR

  • by Short Circuit (52384) <mikemol@gmail.com> on Monday January 18, 2010 @12:58PM (#30809372) Homepage Journal

    The REAL solution to your problem is for everyone to abandon the dumb-as-shite "www" prefix.

    Why bother with www.example.com and example.com? Get rid of it. Anyone who still puts "www." on their business cards is a dufus.

    REAL solutions to immediate problems don't depend on the rest of the world changing to suit my needs. Also, the fact remains that there are links out there that point to "http://www.rosettacode.org/w/index.php?something_or_other", not all of those links will (or can) change, and I would be an absolute fool to knowingly break them, if I want people to visit RCo via referral traffic.

  • by raju1kabir (251972) on Monday January 18, 2010 @01:00PM (#30809394) Homepage

    "Different systems" doesn't really apply, but what if the site's robots.txt is slightly different (different newlines or something) and that's causing an unforeseen error?

    There is a spec for robots.txt. If someone's not following it, then it's their fault. Given Microsoft's history, I know where I'd point the finger absent any more concrete information.

  • by mounthood (993037) on Monday January 18, 2010 @01:19PM (#30809652)

    As said below, never ascribe to malice that which can be adequately explained by stupidity.

    Must be really easy to just beat you in the face, and say “Ooops, I’m sorry, I’m so st00pid! *drool*” I call bullshit on that rule.

    My rule: Don’t make judgements at all (either way), about things that you just don’t know.

    How about: Don't mistake organizational stupidity for individual stupidity. This isn't the case of a single bad coder making a mistake; this is an organization that's chosen how much effort to apply. How much testing and review? What failsafes, logging and active monitoring? Will options for feedback be accessible and responsive? Stupidity and malice aren't mutually exclusive for an individual, and certainly not for an organization.

  • by Chris Burke (6130) on Monday January 18, 2010 @01:28PM (#30809778) Homepage

    I've never liked that saying because of the implication that malice and stupidity are exclusive.

    Dumb and mean are often found together.

  • by Anonymous Coward on Monday January 18, 2010 @05:43PM (#30813010)

    Never ascribe to malice and stupidity what can be explained by stupidity alone.

     

    That better?

  • by Anonymous Coward on Monday January 18, 2010 @05:57PM (#30813176)

    Hey, great, sexism.

  • by Anonymous Coward on Monday January 18, 2010 @07:51PM (#30814402)

    Even professionals.

    You're implying "professionals" work there? Ha, ha ha. Ignoring robots.txt, particularly with the extraordinary resources they have to get it right, is incompetence, not professionalism.
