Netgear Routers DoS UWisc Time Server
numatrix writes "For the last few months, hundreds of thousands of Netgear routers being sold had hardcoded values in their firmware for NTP synchronization, causing a major denial of service to the University of Wisconsin's network before the traffic was filtered and eventually tracked down. Highlights how not to code embedded devices." A really excellent write-up of the incident.
Bad form in general (Score:5, Insightful)
Or any other kind of software for that matter.
Re:So who got fired? (Score:2, Insightful)
Err why ? (Score:3, Insightful)
Especially a home router... sounds like another open port for someone to attack, for no real gain...
Re:Err why ? (Score:5, Insightful)
It's not about just embedded devices... (Score:5, Insightful)
> Highlights how not to code embedded devices
I think this highlights a "how not to code" idea, period. In 1986, when I was taking a BASIC (boo, hiss) course in high school, I learned that values should be expressed as named variables even if the coder does not expect them to change. So instead of writing 32 ft/s^2 everywhere, one should declare g once, in whatever units are appropriate, and thereafter refer to g instead of a hardcoded value. If g changes, the coder need only update one line.
Note: I am not a programmer/coder/developer in any sense of any of the words, so technical nits should remain unpicked; however, if I am completely out in left field, please feel free to point that out.
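In Python terms, the principle looks roughly like the sketch below. The names `G_FT_PER_S2`, `fall_distance_ft` and `fall_velocity_ft_per_s` are made up for illustration, and 32 ft/s^2 is the rounded value from the comment above.

```python
# Declare the value once, with its units in the name, instead of
# scattering the literal 32 through the code.
G_FT_PER_S2 = 32.0  # rounded standard gravity, feet per second squared


def fall_distance_ft(seconds: float) -> float:
    """Distance an object falls from rest, ignoring air resistance."""
    return 0.5 * G_FT_PER_S2 * seconds ** 2


def fall_velocity_ft_per_s(seconds: float) -> float:
    """Speed after falling from rest for the given time."""
    return G_FT_PER_S2 * seconds


print(fall_distance_ft(2.0))        # 64.0
print(fall_velocity_ft_per_s(2.0))  # 64.0
```

Switching to a more precise 32.17, or to metric units, means editing only the constant.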
Netgear should bear the cost... (Score:5, Insightful)
I know the USA isn't .AU, but... (Score:2, Insightful)
I think Netgear should be thankful that it wasn't sued for the bandwidth costs and the reduced level of service for the uni.
Re:So who got fired? (Score:1, Insightful)
I'm really not being a smartarse; I'd really like to know.
Since a tester can only test against a spec, and there was no spec (because if there were, somebody would have read it and this wouldn't have happened), I can't see how you could find this sort of problem using black-box testing techniques.
Sure, you can do performance testing, but you wouldn't test multiple instances of the hardware; you would test the throughput of a single instance.
So I ask again: where do you think this would have been picked up?
Re:It's not about just embedded devices... (Score:5, Insightful)
So you're not in left field; it's just that the developer who wrote the software apparently did exactly what you said. That wasn't relevant to the mistake at hand, which was really about the faulty NTP implementation and the fact that it was hardcoded to a single IP address.
SEGA's online game servers (Score:5, Insightful)
It's not a new story, but I think it bears repeating as a showcase of stupidity.
Re:So who got fired? (Score:5, Insightful)
Bah.
This is one 'simple mistake' by one company that managed to send a constant "250,000 packets-per-second (and over 150 megabits-per-second)".
Now I know Netgear is a pretty big outfit, but there are LOTS of companies like that out there, and these little mistakes can add up. How much network traffic could be avoided with proper programming?
Also, this kind of makes me think about the useless network activity my XP box (bleh) tries to send out. Multiply that by millions and millions, and you get a number a whole lot bigger than the one above.
Who pays for all that wasted bandwidth?
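The two quoted figures are consistent with each other. A back-of-the-envelope check, assuming each packet is a plain NTP packet (48-byte NTP header, no extension fields or authenticator) carried in UDP over IPv4 without options, and ignoring link-layer framing:

```python
NTP_HEADER_BYTES = 48   # NTP header with no extension fields or MAC
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # assumes no IP options


def ntp_flood_bits_per_second(packets_per_second: int) -> int:
    """IP-layer bandwidth of a stream of minimal NTP packets."""
    bytes_per_packet = NTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
    return packets_per_second * bytes_per_packet * 8


print(ntp_flood_bits_per_second(250_000) / 1_000_000)  # 152.0 (Mbit/s)
```

At 76 bytes per packet, 250,000 packets per second comes to about 152 Mbit/s before Ethernet overhead, which matches "over 150 megabits-per-second".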
Re:It's not about just embedded devices... (Score:2, Insightful)
Re:Our usage graph...You Jerks! (Score:4, Insightful)
Don't get me wrong, I love the irony, but your network admins already have enough trouble on a Friday.
Re:So who got fired? (Score:3, Insightful)
Usually, someone should say "hey, are we following the RFC for the protocol here?"
Usually, someone should say "isn't hardcoding one single IP address for a service a bad design idea?"
None of these things apparently happened. It may not show up in "testing" (hey, everything worked fine), but in quality assurance they should be checking the code for anomalies.
Re:Think Strata (Score:3, Insightful)
If you're running a large network where clock synchronization is important, you are MUCH better off running your own time server than having your clients talk to someone else's, regardless of stratum. Otherwise, the extra jitter from all your NTP clients fetching the time over longer distances will actually make times less consistent overall.
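Why distance matters shows up in the standard NTP on-wire calculation (as described in RFC 5905): the client combines four timestamps into a clock offset and a round-trip delay. Because it has to assume the outbound and return trips take equally long, any asymmetry between them can skew the offset by up to half the round-trip delay, so longer, busier paths allow larger errors. A minimal sketch, using integer milliseconds for the example values:

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def max_asymmetry_error(delay: float) -> float:
    """Worst-case offset error if the two one-way trips are unequal."""
    return delay / 2


# Client clock 500 ms behind the server, 50 ms each way, 10 ms at the server.
offset, delay = ntp_offset_and_delay(0, 550, 560, 110)
print(offset, delay)               # 500.0 100
print(max_asymmetry_error(delay))  # 50.0
```

A nearby server with a 2 ms round trip bounds that error at 1 ms; a distant one with a 100 ms round trip bounds it at 50 ms.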
Re:I wonder what NetGear's liability is. (Score:5, Insightful)
It would probably be tax-deductible, passing some of the cost on to us taxpayers, but it would sit a lot better with the public's perception of the company.
Setting up a few CS scholarships or funding a chair at the University would help.
They could turn it into a publicity coup and end up paying out less in the long run (and screw the lawyers too). Some (not all) insurance companies have finally discovered that it's usually cheaper to negotiate with the plaintiff right away, avoiding all of the sabre rattling and lopping off a third (or more) of the total probable cost.
Litigation is rarely the best answer.
Spytime (Score:4, Insightful)
Re:So who got fired? (Score:3, Insightful)
It's up to the developer to follow the required standards, and up to the architect to make sure bad design decisions aren't made.
The grandparent was implying that it was the fault of a tester that the bug went undetected. My point is that in the absence of a spec, mistakes such as this can only be discovered and repaired by the developers.
(I'm also not trying to shift blame; I'm just saying it's almost impossible for a tester who is doing the job properly to find this.)
Re:So who got fired? (Score:5, Insightful)
This was a big screwup: when an NTP query fails, you don't start retrying every second until it comes back. You don't hardcode a single server address for it. And you don't put this in 700,000 pieces of released hardware.
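The usual alternative to a fixed one-second retry is to back off exponentially after each failure, cap the delay, and add random jitter so that a large installed base doesn't retry in lockstep. A minimal sketch; the 64-second starting delay and 1024-second cap are assumptions borrowed from common ntpd poll defaults, not anything taken from Netgear's firmware or the write-up:

```python
import random
from typing import Optional


def retry_delays(attempts: int, first: float = 64.0, cap: float = 1024.0,
                 jitter: float = 0.0, rng: Optional[random.Random] = None) -> list[float]:
    """Return the wait (in seconds) before each of `attempts` retries.

    The base delay doubles after every failure, up to `cap`. With
    jitter > 0, each delay is stretched by a random factor in
    [1, 1 + jitter] so devices that failed together spread out.
    """
    rng = rng or random.Random()
    delays = []
    delay = first
    for _ in range(attempts):
        delays.append(delay * (1 + rng.uniform(0, jitter)))
        delay = min(delay * 2, cap)
    return delays


print(retry_delays(6))  # [64.0, 128.0, 256.0, 512.0, 1024.0, 1024.0]
```

Passing something like `jitter=0.1` keeps the same growth while desynchronizing devices that all lost contact with the server at the same moment.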
Re:Err why ? (Score:2, Insightful)
In addition to needing accurate timestamps for logging, routers are very convenient NTP servers.
Rather than having your NTP packets pass through the router, have them stop AT the router, and have the router poll for accurate time. This is FAR less overhead for a large subnet (think hundreds of hosts).
Of course, the router SHOULD be responsibly configured to poll a willing timesource.
Re:So who got fired? (Score:4, Insightful)
QA isn't just for spell checking.
Re:So who got fired? (Score:5, Insightful)
Netgear reported that the non-UW addresses were used for debugging by the developers.
Here's the interesting part: at least two of those are 12.* addresses, i.e. cable modems on attbi.com. So if you want to know which developer is responsible, a reasonable guess might be whoever lives at those IP addresses!
Could this happen with GPS? (Score:2, Insightful)
I'm ignorant about GPS.
When someone comes out with a GPS wristwatch, or every laptop/Palm etc. has one, could this happen?
Re:So who got fired? (Score:3, Insightful)
A code review would hopefully catch the "hey, we're only using *one single time server for all our hardware*" and the "hey, *there's no way of configuring this short of patching the firmware*" parts. Maybe the address part was overlookable, but the other bits?
>> Usually, someone should say "hey, are we following the RFC for the protocol here?"
> According to the article the packets were well-formed.
Well-formed, yes. But sending retries every second on failure? I coulda sworn the RFC recommended a poll interval of at least 6 seconds... (but I could be wrong; might'n't've been the RFC, but somebody somewhere recommends a much higher number for the retry interval, and the article even says so). It may follow the letter of the law but not the spirit, if I may borrow a cliché.
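For reference, NTPv4 (RFC 5905) expresses the poll interval as a power of two seconds, bounded by MINPOLL = 4 (16 seconds) and MAXPOLL = 17 (about 36 hours); the reference ntpd uses a default range of 6 to 10, i.e. 64 to 1024 seconds. Even the protocol's lower bound is well above one second. A minimal sketch of the mapping:

```python
MINPOLL = 4   # RFC 5905 lower bound: 2**4 = 16 seconds
MAXPOLL = 17  # RFC 5905 upper bound: 2**17 seconds, about 36.4 hours


def poll_interval_seconds(exponent: int) -> int:
    """Convert an NTP poll exponent to seconds, clamped to the RFC 5905 range."""
    clamped = max(MINPOLL, min(MAXPOLL, exponent))
    return 2 ** clamped


for tau in (0, 6, 10, 17):
    print(tau, poll_interval_seconds(tau))
# 0 16
# 6 64
# 10 1024
# 17 131072
```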
> Isn't hardcoding a default address good design rather than leaving an uninitialized variable?
Lesser of two evils? Or possibly the greater: if they'd left it uninitialized, the damn thing wouldn'ta worked, and it wouldn't have made it to market before someone checked it.
The worst part is that they coded it *hard*: they didn't just give it a default value, they coded it so you couldn't change it, and that's ludicrous for a system that depends on resources it doesn't control.
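In code, the difference between a default and a hardcoded value is small. A minimal sketch; the server names, the comma-separated setting format and the idea of a user-editable field are all hypothetical, and a real device would read the setting from its saved configuration:

```python
from typing import Optional

# Hypothetical vendor defaults: hostnames, so they can be re-pointed via DNS
# later, and more than one, so no single host carries the whole load.
DEFAULT_TIME_SERVERS = ("time1.example.net", "time2.example.net")


def time_servers(user_setting: Optional[str]) -> list[str]:
    """Return the user's configured servers if any, else the defaults.

    `user_setting` is a comma-separated string as it might be stored in
    the device's configuration (hypothetical format).
    """
    if user_setting:
        servers = [s.strip() for s in user_setting.split(",") if s.strip()]
        if servers:
            return servers
    return list(DEFAULT_TIME_SERVERS)


print(time_servers(None))                    # ['time1.example.net', 'time2.example.net']
print(time_servers(" ntp.lan , 10.0.0.1 "))  # ['ntp.lan', '10.0.0.1']
```

The owner can point the device at a local server, and the vendor can move the defaults without a firmware update.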
Really, I think "quality assurance" in business-speak means different things to different orgs. I contracted once at a company that had a multipart QA system - some folks went over design specs, some went over code, some did blackbox testing of product. Granted it didn't work so well because they had idiots running the whole thing, but the point is, this was poor design that made it to market when it shouldn't have. Maybe it wasn't a "QA-department" issue, but it was some quality that wasn't assured.
Re:So who got fired? (Score:5, Insightful)
Re:I wonder what NetGear's liability is. (Score:3, Insightful)
I disagree. Netgear is obviously liable, but just because they could be sued doesn't mean they should be. There's a fine line between exercising your rights over others and being an ass, one that I think is crossed way too often. In this case, as you say, the actual damages (bandwidth) are vague. More importantly, Netgear and UWisc got together and are fixing the problem. Considering that this is (now) a very public story, Netgear won't want to further damage its reputation, and I'm sure they'll donate any hardware and bandwidth necessary to fix the problem. If they had just ignored it, a suit would be justified, but at this point, litigation won't solve anything. It'll just make Netgear look bad, which will make them angry, and start a conflict that only lawyers will benefit from.
Thank you, UWisc and Netgear (Score:5, Insightful)
To Netgear, THANK YOU for not calling upon the DMCA, filing NDA lawsuits, etc.
It was resolved in a diplomatic and professional manner, and the write-up explaining the entire incident was educational and informative.
Now, if SCO or Microsoft had been involved...
Re:So who got fired? (Score:3, Insightful)
They didn't hardcode just one address. They hardcoded a bunch of them, but by the time UWisc figured out what was happening, theirs was the only one of the public servers left standing (at least at its original IP address). BTW: {,X}NTPD doesn't support DNS names for all parts of its config file, either.
In other words, NetGear managed to DoS a number of public NTP servers out of existence.
The problem here really isn't one of hardcoding a single IP address. It's a problem of taking shortcuts with RFCs and other protocol documentation and not seriously considering the long-term consequences. And it's not likely to be caught in a normal code review, because the problem looks like the result of a reasonably high-level design trade-off (hard-coded ping times, no DNS, and a fixed source port all smell of trying to delete "unnecessary" code from the PROM).
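On a general-purpose system, neither DNS nor an ephemeral source port costs much code. A minimal SNTP client sketch in Python, assuming an NTPv4 client-mode request and reading only the server's transmit timestamp (no offset/delay calculation and no validation of the reply beyond its length). `query_time` resolves the name on every call and leaves source-port selection to the operating system; all function names here are hypothetical:

```python
import socket
import struct

NTP_PORT = 123
NTP_TO_UNIX_EPOCH = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def build_request() -> bytes:
    """48-byte request: LI=0, VN=4, Mode=3 (client), all other fields zero."""
    first_byte = (0 << 6) | (4 << 3) | 3  # 0x23
    return bytes([first_byte]) + bytes(47)


def parse_transmit_time(reply: bytes) -> float:
    """Return the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_TO_UNIX_EPOCH + fraction / 2**32


def query_time(host: str, timeout: float = 5.0) -> float:
    # Resolve the name each time, so the operator can move the service.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, NTP_PORT, type=socket.SOCK_DGRAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)  # give up instead of waiting forever
        # No bind(): the OS picks an ephemeral source port.
        sock.sendto(build_request(), addr)
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)
```

Usage would be something like `query_time("ntp.example.net")` against a server you have permission to use; a real client would also need a sane retry policy.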
This is rather like a littering problem: "It's just one candy wrapper" seems harmless, until you multiply it by 300,000 people using the same road daily and the 2+ year lifetime of some plasticised wrappers. Similarly, "It's just one packet a second" sounds harmless until you consider the effects of a 1-million-unit product run.
(BTW: I'm guessing that UWM's most recent NTP spike was when the power came back on in New York and Ontario last week.)
That's pretty nasty (Score:4, Insightful)
When I configure my computers to use someone else's NTP server, I always send them an email to let them know (or whatever else they request that people do).
What's worse is that Netgear hardcoded the address, in a way that can't easily be changed without a firmware upgrade (something that very few of the intended Netgear firewall customers will do: these customers are looking for a plug-it-in-and-forget-it box, and are either unwilling or unable to learn how to set up a firewall box themselves). And then, on top of that, Netgear botches the implementation of the protocol, causing it to rapid-fire out requests in certain circumstances!
NTP is a very, very low-profile protocol. It uses UDP, so no connection state has to be maintained. A well-behaved client sends packets rarely: about once a minute at most while it is first synchronizing, and once clocks are in sync, far less often (the reference ntpd backs off toward one packet every 1024 seconds by default, and some simpler clients check in only daily or weekly). Netgear's botched programming caused an NTP flood of one packet per second! That is a ridiculous rate, orders of magnitude above what a functioning NTP implementation normally sends.
And Netgear sold hundreds of thousands of these things....
I'm amazed that U-Wisc put up with this effective DoS attack on their servers for so long. They showed great patience waiting several months for their request to crawl through Netgear's channels. Companies really need a quick way into their corporate structure for people who report major flaws like this! Because Netgear's traditional channels of customer feedback (tech support, etc.) weren't set up for this, U-Wisc's requests kept getting lost in Netgear's bureaucracy. Is Netgear so arrogant as to believe that all of their products are, and always will be, 100% flawless?
There really needs to be a special method of access when people report security holes and such. Microsoft, surprisingly, is starting to come around with this, maintaining a special point of contact for people who have discovered security-related issues or major flaws like this. I hope that more companies do this in the future.
If Netgear would do these three things, I would be happy:
1) Set up their own NTP master servers (stratum 1, using a GPS receiver or atomic clock), at Netgear itself. They would use Netgear's own bandwidth, not U-Wisc or anyone else's. Netgear's future products would then default to using these servers, and they would put out a patch so that hopefully some fraction of older products would also use these servers. That way, if there is a flaw in the future, Netgear will eat their own dogfood! I am pleased to see that Netgear is already taking steps in this direction.
2) Change their corporate structure to be more receptive to outsiders who report serious design flaws or major issues caused by their products (such as this NTP flood), going beyond normal tech support, so that quick action can be taken to avert damage. Tech support is really only set up to handle questions about an individual device owned by the person calling in about it, and not set up to handle serious technical or security issues about all devices in an entire product line.
3) Reimburse U-Wisc for the cost of bandwidth consumed by these buggy Netgear devices. If U-Wisc isn't blocking incoming NTP entirely by now, pay for robust NTP servers to handle the high volume of traffic. If Netgear had targeted pretty much any private company instead of U-Wisc, I'm sure that company would have sued for damages by now!
And remember: ask first before using someone else's NTP server, especially if you plan to hardcode the address into your product.
Re:Windows Time Service (Score:4, Insightful)