Monday, The Death of Websites
An anonymous reader writes "Developers implementing 'weekend inspiration' are more dangerous than hackers.
Vnunet.com has this article about how eager developers and administrators create more trouble for websites than hackers and viruses do. How about those of us who start the week with a cup of coffee and the morning online news? My inspiration and new ideas for development are definitely not the cause of the Monday-crash hour ... I think."
day traders (Score:5, Interesting)
Manic monday eh? (Score:5, Interesting)
I mean it's easy for it to happen. We had problems like this with our monitoring system (though it was manic Friday, where someone would attempt to implement something before the weekend because, of course, the weekend is when you want pages the least, so you want to get anything that causes false pages fixed on Friday to maximize enjoyment of the weekend).
Now we have development and test servers where things live BEFORE they go to production. I never had any idea that it would help so much until we finally implemented it.
-Steve
What is it about developers? (Score:5, Interesting)
Just a thought: The rest of the world lumps all of us IT people together; the distinction between, say, a "developer" and "sysadmin" means nothing to my non-geek friends.
I don't think stuff like this happens often to sysadmins or DBAs. How often do you come into work on a Monday and decide to migrate to xfs because you read on Slashdot over the weekend that SGI ported it to Linux, and SGI is cool? Likewise, how often does an Oracle DBA decide on Monday to move some production tablespaces over to rawfs from cooked, because she read a whitepaper from Oracle on Saturday that talked about performance increases from raw filesystems?
I've written a lot of code, and also sysadmin'd an awful lot of servers, and in my experience probably 90% of "production outages" are software changes--exactly like the article said--poor change control, etc etc. So, what's the point of dynamic multipathing, patching, dual power supplies, etc etc, when most problems occur because someone got excited and forgot a semicolon somewhere?
Is it fair to say that sysadmins fix things and developers break them? What is different about a software engineer's brain compared with a systems engineer's? Talk amongst yourselves :)
Changes on Monday? (Score:4, Interesting)
On the other hand, I'm not sure incremental development is that much worse than large releases. You're either releasing a bug or two a week or waiting eight weeks to release all your bugs at one time.
Mondays & Fridays Should Be Banned! (Score:5, Interesting)
My fellow contractors and I had an unwritten rule to "hold off" on all "good" ideas generated in meetings etc. on Monday & Friday. Almost inevitably they would all be canceled within a couple of days. Not subjecting ourselves to post/pre-weekend madness saved us a ton of work and helped us bring the project in on time!!
Re:Great sample:) (Score:3, Interesting)
A Google search for even the word "website" came back with: Results 1 - 10 of about 68,800,000.
Even with that number, which I would estimate to be lower than the total number of websites in existence, that puts the 70 site survey at
Re:The cause of bugs (Score:5, Interesting)
For the concrete "holiday lockdown" example, I think he's only partially right. In my development group, we explicitly lock down ALL changes to our production web apps well before, and all through, the Christmas shopping season, to prevent the inadvertent introduction of any (new) bugs. It's not a side effect of vacation time -- it's an explicit operations decision to reduce the risk of breakage.
So, yeah, while we're not touching it the stability seems to increase, but no existing (but less critical) bugs get fixed either. No large-scale app is bug-free -- the lockdown period just seems to stabilize things but it's an illusion caused by the lack of new species of bugs popping up.
In the more abstract "development introduces bugs" sense, it's a fact of life in complex systems that new code means new bugs -- and if we never introduced new code (->features) then we'd lose customers. So I take his statement to imply that we should only be introducing 100% bug-free code -- which is a PHB pipe dream.
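The explicit lockdown described above could be enforced by a small check in whatever script pushes to production. This is only a minimal sketch: the freeze dates and the `deploy_allowed` name are hypothetical and are not taken from the poster's actual process.

```python
from datetime import date

# Hypothetical freeze window covering a holiday shopping season.
# These dates are illustrative assumptions, not from the comment above.
FREEZE_START = date(2004, 11, 15)
FREEZE_END = date(2005, 1, 2)


def deploy_allowed(today, freeze_start=FREEZE_START, freeze_end=FREEZE_END):
    """Return False while the production change freeze is in effect.

    Both boundary dates are treated as part of the freeze.
    """
    return not (freeze_start <= today <= freeze_end)
```

A deploy script would call `deploy_allowed(date.today())` and refuse to continue when it returns False. That keeps the lockdown an operations decision rather than something each developer has to remember.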
I would fire the IT staff (Score:2, Interesting)
Of course I'm the one that implemented a testing domain (live on the Net) for just such purposes. NEVER, NEVER, NEVER "test" anything on a production system. I can't even think of any installations that were not tested for MONTHS "offline" before being implemented. When the day comes to install there usually aren't any shockers either. It just works.
Of course I'm the one that's NEVER allowed a Windows server to even be a consideration. "Are you NUTS?"
You UK guys need to work on this (Score:2, Interesting)
I'm going to start with pointing out that the first sentence of the article said "UK websites", not "US". Obviously that means the people across the Pond need to work on this.
And what a surprise that when people roll out changes sometimes things break. Oh My God. Have you cured cancer yet?
And I'd say more often than not the "problems" on Monday are caused by bug fixes that developers are rushing into production to fix bugs that were found over the weekend. And, as we all know, sometimes bug fixes skip QA...
Seriously, most places I work have Go Live set to be Monday, or, more often, Tuesday. You go live when you have already tested it; it's gone through QA; and you're sure the staff is there. Tuesday is the better date in order to deal with key people taking long weekends, and it gives you two or three days to fix issues before the next weekend. Besides, Mondays are already hellish without adding "release new version" to the list of torments.
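The Tuesday go-live rule can be written down as a date calculation. This is a rough sketch, assuming a single fixed release weekday; the `next_go_live` helper is a made-up name for illustration, not any shop's real tooling.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbering: Monday is 0, Sunday is 6


def next_go_live(today, release_weekday=TUESDAY):
    """Return the first date on or after `today` that falls on the release weekday."""
    days_ahead = (release_weekday - today.weekday()) % 7
    return today + timedelta(days=days_ahead)
```

If `today` is already a Tuesday, the function returns `today` itself, so a release that passed QA the week before can ship that morning.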
Nope - Tuesday or Thursday (Score:5, Interesting)
Re:Developers shouldn't be able to break stuff (Score:3, Interesting)
I enforce upon myself the requirement to run new code on a test server first, but a formal and managed development environment just isn't going to happen at small companies, or larger companies with small dev staffs.
Then there is also the issue of things that are extremely difficult to model in a test environment. Complex failures. Failures that may not show up in a unit type test, but only show up when components interact. It is possible to model some of these things, but sometimes the unit testing code would be larger than the code under test. This is also not practical for a smaller development group.
Re:Weekend Update (Score:5, Interesting)
Re:day traders (Score:2, Interesting)
Second, I don't know about bandwidth studies but you will find that most of the activity in the markets takes place on the open or on the close. Mid morning and mid afternoon markets are generally dead.
Also, to do a bandwidth study just take the intraday volume of a stock and correlate it to that of etrade or yahoo.
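One way to do that correlation is a plain Pearson coefficient over matching time buckets. This is a sketch only: it assumes you already have two equal-length series (say, stock volume and site hits per half hour), and the `pearson` helper is written out by hand here rather than taken from any particular library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would mean site traffic rises and falls with trading volume; a value near 0 would mean the open/close spikes do not show up in the bandwidth numbers.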
Change controls? (Score:2, Interesting)
We recently dropped the policy, but I'm not sure if there has been any fallout from doing so.