Developer Accidentally Deletes Production Database On Their First Day On The Job (qz.com) 418

An anonymous reader quotes Quartz: "How screwed am I?" asked a recent user on Reddit, before sharing a mortifying story. On their first day as a junior software developer, at their first salaried job out of college, a copy-and-paste error inadvertently erased all data from the company's production database. Posting under the heartbreaking handle cscareerthrowaway567, the user wrote, "The CTO told me to leave and never come back. He also informed me that apparently legal would need to get involved due to severity of the data loss. I basically offered and pleaded to let me help in someway to redeem my self and i was told that I 'completely fucked everything up.'"
The company's backups weren't working, according to the post, so the company is now in serious trouble. Quartz adds that "the court of public opinion is on the new guy's side. In a poll on the tech site the Register, less than 1% of 5,400 respondents thought the new developer should be fired. Forty-five percent thought the CTO should go."

  • How the fuck (Score:5, Insightful)

    by Gojira Shipi-Taro ( 465802 ) on Saturday June 10, 2017 @11:20PM (#54594555) Homepage

    How the fuck does a new hire have that kind of access? That's not even enough time for on-boarding. The CTO should definitely get the shitcan, as should anyone in HR involved in that debacle.

    • Re:How the fuck (Score:5, Insightful)

      by Anonymous Coward on Saturday June 10, 2017 @11:22PM (#54594567)

      The entire CXX staff level should be let go. Why a person fresh off the street had permission to even make a mistake of such magnitude is beyond me.

      • Re:How the fuck (Score:5, Insightful)

        by lucm ( 889690 ) on Sunday June 11, 2017 @01:49AM (#54594933)

        The entire CXX staff level should be let go.

        First, "CXX" level is not "staff". Second, firing the entire senior management team because of this incident would be completely reckless. I understand the outrage but that's not how companies work in real life. The last thing a company in that situation needs is more instability.

        The proper way to address this is to stabilize the situation, then make sure the problem cannot occur again. And this typically doesn't simply mean firing people, because odds are that there are cultural or organizational factors that made this situation possible (crazy deadlines, shoestring budgets, etc.), and those would probably lead the replacements to make the same kind of mistakes down the road.

        What is needed is new processes and controls. You start with a simple governance framework (like COBIT maybe) where each part of the IT ecosystem is linked to a specific business leader, then you let each of those leaders make sure that their area of responsibility is well managed from a risk perspective. That's how you make the company more resilient, not by firing people who maybe were not empowered to make the right decisions in the first place.

        • by dbIII ( 701233 )

          What is needed is new processes and controls

          Yes, but I think what people are getting at here is that the current policies appear to be so fucked up that the team who implemented them is unlikely to come up with new processes and controls that are any better than what they have already done.
          I've seen that sort of thing in a few places (notably government-owned corporations - worst of both worlds) where management has been chosen for reasons other than ability.

          As shown by this incident that sort of mismanagement

        • What is needed is new processes and controls.

          What is needed is a backup system that actually works, and is used.

          I never trusted our official backup system, having seen it not work on several occasions. So I installed one of my own for the group. Sure enough, around a year later, a group member called in a total panic - she'd written a script that was supposed to perform a find and then print the results. What it actually managed to do was delete the whole database.

          Calls me in a tearful freakout - the IT folks' backup didn't work.

          Mine did. There were a

    • How the fuck does a new hire have that kind of access?

      Having worked at some small businesses before, this seems pretty common to me. The article said the business had a couple hundred people at most and 40+ developers. Quite likely the people there had been there for a long time, and they hired a handful of people once they finally had a need for some help and some money to pay for it. This is how things were done for years, and not remembering what can happen when a newbie fucks up once in a while, they thought nothing of handing over the documentation the

    • Corporate cheapskatism fucking them in the ass: newbies handling key data and no backup system in place.

      Sue/fire the CEO, not the grad.

    • The trainer, the CTO, the Managed Services Director, and whoever let that doc with the PROD database URL out into the training materials. Those are the people I would hold responsible.
    • Actually reminds me of two fellows I worked for briefly. Both of their operations were too small to have a separate CTO. In the first case I decided to leave as soon as I figured out the legal liability their main customer had incurred due to pirated software. There were actually two packages, and one of them was a database. The second case was a total shoestring operation, and one of the first things I discovered was that their so-called daily backup processes were not actually backing up anything.

      Now that rem

    • by tlhIngan ( 30335 )

      Well, depending on the size of the company, it could be possible. Maybe not if you're a 1,000-person enterprise, but if you're a 10-100 person SME, it's definitely possible, especially at a startup.

      However, the lack of backups is more damning - it means the entire company is one mistake away from losing it all. I don't care if it's a new guy - it could be the rockstar developer making a typo and deleting the entire database. It could be anyone.

      Even though I work for a company of less than 100 people, no one oth

    • How the fuck does a new hire have that kind of access? That's not even enough time for on-boarding.

      This has little to do with noob status as an employee, or even technical experience. The real question is why the fuck a developer has access to the Production system. We call the Non-Production environment Development for a fucking reason.

      The CTO should definitely get the shitcan, as should anyone in HR involved in that debacle.

      The CTO should get the shitcan for not ensuring backups were working, as well as not implementing proper security policy that prevents developers from fucking around in the Production system without assistance and a documented approval process.

      Regarding HR, they're fuck

    • by Z00L00K ( 682162 )

      Everyone screws up now and then, usually in small ways, sometimes in big ones. Sometimes it's a small screw-up where you accidentally reboot the wrong server. Once is just an oops. It's when people screw up frequently and try to get away with it that you should bury them somewhere safe or remove them. If they admit their mistake, then it's better to work on salvaging the pieces and patching together the remains, or recovering from a backup.

      Companies that don't have backups that they test - they are toast as it doesn't even have

    • Re: (Score:3, Interesting)

      by bigtiny ( 236798 )

      I'm always amazed that when something like this happens, a lot of people's first reaction is 'who gets fired?' It's really not a productive attitude to take in a case like this.
      I find it problematic that
      - companies take an agile 'quick turnaround' approach to development without seeming to understand the risks, until they get bitten. This is an example.
      - seems that whoever's managing the dev team should have had a break-in period/mentoring system in place to make sure new hires (especially newbies straight out o

  • by Chas ( 5144 ) on Saturday June 10, 2017 @11:21PM (#54594559) Homepage Journal

    Okay, the guy fucked up ROYALLY.

    It happens. And he SHOULD get in a bit of trouble for it. That's how you learn "don't do that". I don't think they deserve to lose their job though.

    The CTO and all the people in charge of the backups need to be on the street YESTERDAY though. That the dev COULD do something like this is a major fuckup on their part. They simply didn't have their production system locked down properly.

    The fact that their backup system was non-functional is double-plus unforgivable. The dev is merely the highlight for their massive cluster-fuck of a setup.

    • Okay, the guy fucked up ROYALLY.

      I don't think he did. I actually RTFA this time, and the guy was following the onboarding directions he was given. Where it went south was that he copied-and-pasted the wrong database credentials. He was supposed to use the username and password that a command had spit out, but he instead used the ones from the onboarding docs.

      I'll pause for a moment to let that sink in.

      Some jackass had put actual prod root creds in the onboarding docs, then gave them to a new graduate fresh on his first day of his first job, then walked away while he onboarded himself without supervision.

      This poor kid did absolutely nothing wrong except misreading some instructions. The engineering team responsible for the chain of events that led to this colossal fuck-up is wholly to blame.

      • by Chas ( 5144 )

        He's a new guy. And the onus is on him for attention to detail.

        However, he's a new guy. Fuckups are to be expected.

        I've had fuckups at new jobs myself. This is how we LEARN.

        And yes, in that environment he was set up to fail.

      • by AmiMoJo ( 196126 )

        How could someone in this position recover from it?

        Suing the company for putting you in a potentially career-ending position through no fault of your own might generate a one-off lump sum, but it will be expensive and risky, and will likely make other companies not want to employ you in future.

        Doing nothing and hoping they don't sue might be the best thing. Just keep quiet, never mention that you ever worked there in your job history and try to move on. If it brings the company down or they do sue and your name becom

    • by lucm ( 889690 ) on Saturday June 10, 2017 @11:53PM (#54594703)

      The fact that their backup system was non-functional is double-plus unforgivable.

      In my experience, continuous SAN replication is often to blame for a poor backup strategy. It creates the illusion of security - yes, your DR site is synchronized with production within seconds or milliseconds, but guess what, mistakes are also replicated.

      Replication -> floods, fire and similar disasters
      Backups -> oops my bad

      Both are needed.
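
      To make the distinction concrete: a point-in-time dump is what covers the "oops my bad" case, because the copy predates the mistake instead of faithfully mirroring it the way replication does. A minimal sketch of a dated nightly dump with simple retention - assuming PostgreSQL, with made-up host, database, user and path names:

```python
#!/usr/bin/env python3
"""Nightly logical backup sketch: dated dumps with simple retention,
so a mistake that replication faithfully copies can still be rolled
back to yesterday's file. Host, database, user and paths are placeholders."""
import subprocess
import time
from datetime import date
from pathlib import Path

BACKUP_DIR = Path("/backups/pg")   # hypothetical location, off the replicated volumes
RETENTION_DAYS = 14                # keep two weeks of dumps

def nightly_dump(host: str, dbname: str, user: str) -> Path:
    """Run pg_dump in custom format; credentials come from ~/.pgpass or the environment."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    target = BACKUP_DIR / f"{dbname}-{date.today():%Y%m%d}.dump"
    subprocess.run(
        ["pg_dump", "-h", host, "-U", user, "-d", dbname,
         "--format=custom", "--file", str(target)],
        check=True,
    )
    return target

def prune_old_dumps() -> None:
    """Delete dumps older than the retention window."""
    cutoff = time.time() - RETENTION_DAYS * 86400
    for dump in BACKUP_DIR.glob("*.dump"):
        if dump.stat().st_mtime < cutoff:
            dump.unlink()

if __name__ == "__main__":
    nightly_dump("db-replica.example.internal", "appdb", "backup_ro")
    prune_old_dumps()
```

      Pointing the dump at a replica keeps the load off the primary; the part that actually matters is that the dump files live somewhere the replication stream can never overwrite.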

      • That's the modern version of "we have RAID backup!"

        • RAID (especially those with parity) can be terrifying. Just think of it: you have a group of disks probably acquired at the same time and probably coming from the same vendor (or even same production batch) serving the same workload in the same environment. That implies a fairly similar MTTF for all the disks.

          Then one of the disks fails; this causes the other disks in the array to first handle a higher load, then to be brutally impacted by the rebuild process. That's like playing Russian roulette with a gatt

            RAID (especially those with parity) can be terrifying. Just think of it: you have a group of disks probably acquired at the same time and probably coming from the same vendor (or even same production batch) serving the same workload in the same environment. That implies a fairly similar MTTF for all the disks.

            Then one of the disks fails; this causes the other disks in the array to first handle a higher load, then to be brutally impacted by the rebuild process. That's like playing Russian roulette with a Gatling gun.

            Yeah. I had a RAID-5 at home. When one disk started failing I didn't notice, because the system kept running, and being a home setup it didn't actually have any lights or warnings unless I manually opened the RAID manager and checked. I noticed when a second disk failed, this time completely, for all sectors; during the recovery two more disks started failing.

          • by ColaMan ( 37550 )

            About 15 years ago I was looking after an old Compaq ProLiant server that had a 6-disk SCSI RAID array in some configuration I can't recall.

            Oh, a drive just failed with an "Exceeded power on hours" error? Well, that's ok, there's a hot spare in the array, no problem.

            Next day, two other disks went offline, because they were all powered up at the same time when they were new, weren't they? And they were perfectly usable; it's just that the array controller noticed that their SMART attributes had exceeded a thr

      • by rossz ( 67331 )

        Yep. I have a hot standby for our production database server. Everything done on the primary is almost instantly duplicated to the standby. That would include fucking shit up, e.g. 'drop table foo;', which is why I make a weekly backup to disk and keep the entire WAL history until the next full backup. The backup and WALs are kept on a filer, which is also replicated to another filer.
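
        For anyone unfamiliar with that setup: a weekly base backup plus archived WAL works because the WAL can be replayed up to a point just before the bad statement. As a loose illustration of the archiving half only (not the poster's actual setup; the filer mount point is a placeholder), a small script like this could be wired into PostgreSQL's archive_command, e.g. archive_command = 'archive_wal.py %p %f':

```python
#!/usr/bin/env python3
"""Copy a finished WAL segment to the backup filer, refusing to overwrite.
Meant to be called by PostgreSQL via archive_command with %p (path) and
%f (file name). The mount point is a placeholder."""
import shutil
import sys
from pathlib import Path

ARCHIVE_DIR = Path("/mnt/filer/pg_wal_archive")   # hypothetical NFS mount

def archive(wal_path: str, wal_name: str) -> int:
    # If the filer isn't mounted, fail: a non-zero exit makes PostgreSQL
    # keep the segment and retry, instead of silently losing history.
    if not ARCHIVE_DIR.is_dir():
        return 1
    dest = ARCHIVE_DIR / wal_name
    if dest.exists():
        # Never overwrite an already-archived segment.
        return 1
    tmp = ARCHIVE_DIR / (wal_name + ".part")
    shutil.copy2(wal_path, tmp)   # copy first, then rename so readers never see a partial file
    tmp.rename(dest)
    return 0

if __name__ == "__main__":
    sys.exit(archive(sys.argv[1], sys.argv[2]))
```

        Restoring then means taking the most recent base backup and replaying the archived segments with a recovery target set to just before the 'drop table foo;' moment.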

    • The CTO and all the people in charge of the backups need to be on the street YESTERDAY though.

      I think the company is hopelessly lost. It's not only the CTO that screwed up but the CEO who hired him. At that point, you run out of people to fire, and the company just goes out of business.

      And he SHOULD get in a bit of trouble for it.

      He should thank his lucky stars that he found out so quickly what a poorly run company had hired him.

  • Pretty much any comment that might be posted in this thread was already posted in the original Reddit thread over a week ago. Nothing insightful or interesting will come from this being posted here now.
  • by nomad63 ( 686331 ) on Saturday June 10, 2017 @11:34PM (#54594637)
    If a company looks at non-working backups as a minor inconvenience, I think the CTO's ass should be on the firing line before this poor guy's. Yes, what he did is inexcusable, and in some cases firing might be justifiable (as in, what is a junior developer doing on a production database on his/her first day on the job?), but if someone assigned him/her to perform anything on the company-critical data, that person should be the one getting fired, not this guy. I am a 20+ year experienced sysadmin, and not too long ago, when I started at a new position, I was not able to touch any system other than a few development machines for 2+ months after my start date, and I know/knew my shit. This company's management shows its incompetence in more ways than one. Yet they are making this person the scapegoat. Good riddance to them, as their days are numbered.
    • Yeah, they were doomed to fail without backups.

      What if their server failed irreparably? What if some code went rogue and overwrote it? What if the server burned down because someone somewhere in the building left the stove on?

      The CTO should be fired for total incompetence. You can have read-write access to database servers without having access to schema changes on the database. Personally, even though one of my creds actually gives me this access to some of our production databases, I never do it myself and

      Yes, what he did is inexcusable

      Following the onboarding process on a sheet of paper he was given, which some numbnuts decided should have the address and administrator credentials for the production database on it, on his first day, unsupervised?

      What this guy did is 100% excusable. The CTO and whoever created the onboarding process should see themselves out.

      I mean, fuck, I was on a visitor's badge, fully escorted, for a whole week at my first job. That's right, they wouldn't even let me simply walk around the building unsupervised to

  • I've been in production management for 15 years... The first 5 years as a Sr. datacenter engineer. The next 7 years as a staff engineer and the past few years as a cloud architect. One thing that has NEVER changed is that developers are NOT allowed to touch production... and I mean not allowed to even log into any host at all... They don't even have a damn clue where the machines physically sit.

    Allowing a junior developer fresh out of college to log into production with privilege that makes even a minor c
    • So much this. It's a major PITA requiring permission from a director for me to get access to a machine that can access a production database. I am perfectly fine with this arrangement!

    • by rossz ( 67331 )

      One thing that has NEVER changed is that developers are NOT allowed to touch production... and I mean not allowed to even log into any host at all...They don't even have a damn clue where the machines physically sit.

      In my office, only one developer has limited production access, the senior guy. He's the only developer who can do code releases and he has RW access to the DB, but he can't mess with the system's configuration. If he's out of the office, I have to do the code releases as the senior system administrator.
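
      To make the "data access without schema access" point from up-thread concrete, here is a rough sketch in PostgreSQL terms, assuming psycopg2 and made-up role, schema and connection details: an application role that can read and write rows but owns nothing, so DROP and ALTER are simply not available to it.

```python
#!/usr/bin/env python3
"""Create an application role that can read and write data but cannot
change the schema (it owns no objects, so DROP/ALTER are off the table).
Role name, schema and DSN are placeholders; requires psycopg2."""
import psycopg2

APP_ROLE_SETUP = """
CREATE ROLE app_rw LOGIN PASSWORD 'change-me' NOSUPERUSER NOCREATEDB NOCREATEROLE;
GRANT USAGE ON SCHEMA public TO app_rw;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_rw;
-- Tables created later get the same data-only grants automatically.
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_rw;
"""

def create_app_role(admin_dsn: str) -> None:
    # Run this as the table owner / an admin, never as the application itself.
    with psycopg2.connect(admin_dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(APP_ROLE_SETUP)

if __name__ == "__main__":
    create_app_role("dbname=appdb user=db_admin host=db.example.internal")
```

      It won't stop someone deleting rows the application legitimately writes, which is why the backup discussion above still matters; it just takes DROP TABLE and schema changes away from everyday credentials.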

  • by mykepredko ( 40154 ) on Saturday June 10, 2017 @11:37PM (#54594649) Homepage

    I'm surprised that the firm has not been named - while any company that had this happen to it would want to keep things confidential, I would think that somebody would talk about it separately. I suspect that the "company" is some podunk startup in which the CTO is also the CEO, CFO, head of development and probably the HR head, and they've just hired a developer without thinking about access restrictions (or verifying that backups are actually happening).

    Some more information would help clarify these questions and maybe better explain how such a situation could happen.

  • Seriously, a day 1 dev has direct production access? Hell, any dev has direct production access? No QA, no release management, no integration or functional test suite if they're doing some sort of continuous deployment?

    It's a pain in the ass, but if they've got any sort of actual real database, they'll have had a real database admin, running it with archive logs they can use to restore their data? ... plus their backups are gone?

    What sort of fly-by-night operation is this?

    • by dbIII ( 701233 )

      Hell, any dev has direct production access

      I agree with that - different mindset. Devs want to get stuff done ASAP and usually don't seem to get the concept of multiuser systems. Even very experienced devs do shit like rebooting servers during working hours, leaving dozens of staff twiddling their thumbs and unable to work, unless months of effort have been put into changing their attitude toward production.

  • by clovis ( 4684 ) on Saturday June 10, 2017 @11:52PM (#54594701)

    I say if he succeeded in putting that company out of business, then he should get a medal for sacrificing himself to destroy the company.

    My belief is that when he saw, on his first day, the badly written docs they handed him, with a printed (!) account/password having RW access, he instinctively threw himself on that grenade by destroying their production database. Only the most cowardly IT worker would have done otherwise.

    Thank you, selfless IT worker, for saving us from the horror of whatever product they were trying to produce.

  • be happy (Score:5, Insightful)

    by ooloorie ( 4394035 ) on Sunday June 11, 2017 @12:03AM (#54594729)

    You don't want to work at a company where the backups don't work and where a new hire can accidentally delete all their data. Don't beg to stay, instead be happy that you found out quickly how incompetent that company actually is.

    • Wish I had a mod point for you, my friend. That is exactly, 100% correct. Rookies make mistakes...sometimes even stupid mistakes. It happens.

      If a rookie can wreck a company this badly, it's hardcore proof that the problem is a long, long way up the food chain.

      THAT is where heads should roll.

  • by GerryGilmore ( 663905 ) on Sunday June 11, 2017 @12:32AM (#54594777)
    ...we were a small company making the transition from a proprietary (TI/990 - DX/10 OS) system to a Prime UNIX system, and I was the guy A) learning UNIX and B) writing shell scripts to replicate basic DX/10 stuff. One of my first scripts was to delete a user. Beyond the basic password-deleting stuff, I thought it would be cool to delete the user's home directory also. You know, delete user "gerry" and also delete the user's home dir "/usr/gerry". Guess what my brilliant script writing failed to anticipate? Yep, someone entering an empty user name. So the script - running under SUID perms - proceeded to delete "/usr/*", including "/usr/data". Oops! Fortunately, my backup script was better written and the customer only lost a few hours' worth of work, but... whew! That's also how we learn!
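
    The failure mode in that story (an empty argument quietly turning "delete this user's home directory" into "delete everything under /usr") is still the classic one to guard against. A hedged sketch of the same guard, in Python rather than the original shell and with made-up paths: the fix is simply refusing to touch the filesystem until the username has been validated.

```python
#!/usr/bin/env python3
"""Remove a user's home directory, but only after validating the name,
so an empty or malformed argument can never expand to the parent directory.
The home root and the rest of the 'delete the account' steps are placeholders."""
import re
import shutil
import sys
from pathlib import Path

HOME_ROOT = Path("/usr")     # the old layout from the story; /home on modern systems
VALID_NAME = re.compile(r"^[a-z_][a-z0-9_-]{0,31}$")

def delete_user_home(username: str) -> None:
    if not VALID_NAME.match(username):
        raise ValueError(f"refusing to act on invalid username: {username!r}")
    home = (HOME_ROOT / username).resolve()
    # Belt and braces: the resolved path must still be a direct child of the
    # home root, so "" or ".." can never walk us up the tree.
    if home.parent != HOME_ROOT.resolve() or not home.is_dir():
        raise ValueError(f"not a user home directory: {home}")
    shutil.rmtree(home)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: delete_user_home.py <username>")
    delete_user_home(sys.argv[1])
```

    The other half of the lesson stands on its own: the backup script was the better-written one, and that is what actually saved the customer.
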
    • by rvw ( 755107 )

      > You know, delete user "gerry" and also delete user's home dir "/usr/gerry".

      Who puts a user's home folder in /usr/? Isn't that bad practice in the first place? The usr directory is for system stuff.

  • I mean, come on:

    - No working backup
    - Excessive access for the new person
    - CTO is incompetent and cannot admit to a mistake

    This is an accident waiting to happen. And the new person has zero responsibility for it. Might be better off to be out of that fucked up company though.

  • Hey, I had a security guy come into our data room to do a drive inventory, and while waiting for me his curiosity got the best of him and he popped open a drive on a 70 TB RAID.

  • Lots of blame to spread around.

    Code should have been reviewed by someone before being executed in production. No exceptions, especially because it's a new guy on his first day. The code should have been run in a staging environment first. How long was it known that the backup system was broken? This mistake was obviously not the newbie's fault.

    If my production DB backup was hosed, I would be dropping just about everything else to get it healthy again. A deleted database would mean so

  • If a junior developer can FUBAR the company, these are the two that fucked up royally. Very obviously the processes are crap and gross negligence is running rampant.

    Fire these two bozos. Out of a cannon.

  • At least know that your backups work.

    In the mid-90s I had just started working at some place and the guy I was replacing (he left for a better opportunity) was showing me how everything was set up and as a demonstration he deleted his own account. I guess he felt he didn't need it anymore, but then he says there might be some useful stuff in there and tells me it would be a good exercise for me to learn how to restore from backup.

    Not a big deal, it was actually documented and I had done that before at a p

  • It is scary how few companies back up their crap. I would be willing to bet that less than 30% of billion-dollar companies (that really depend on their computers for day-to-day operations) could be back in operation in under 24 hours if they lost all their servers at once.

    I wouldn't be surprised to find that a good percentage would be very screwed in the long term.

    This is not only their data but the system as a whole. When I am consulting at most companies that are retiring servers I often suggest that t
  • by riverat1 ( 1048260 ) on Sunday June 11, 2017 @03:30AM (#54595135)

    I can relate to this. I wiped out the production database for our ERP system when I was trying to create a copy of it. Fortunately I had good backups and was able to restore the DB with minimal losses, but it took all day (this was back in 1997 and the computer wasn't particularly fast, a SparcServer 1000 with 2 CPUs and 500 MB of RAM). In my case I wasn't fired, and I retired from the job last year after 31 years.

  • This kid will have no problem getting a new job. What happened was not his fault in any way, shape, or form. The fault lies squarely with the CTO (in no particular order):

    1) He allowed a new hire to have superuser, unsupervised access to the production database without an ounce of training.
    2) He allowed an unvetted new-hire script to contain actual superuser credentials that could be used to wipe out the production database with a simple copy and paste error.
    3) He allowed a new hire to run that script uns

    • By the way, I did something similar on my second day on the job (no, I didn't get fired). I'll skip the details, but yes the management fucked up and had a hard time coming to grips with their fuckup. Fortunately, circumstances in my case were slightly different; but it was close enough to this story that I don't blame the new hire one bit.

  • by nospam007 ( 722110 ) * on Sunday June 11, 2017 @08:05AM (#54595701)

    The company is named 'British Airways'?

  • by DukeLinux ( 644551 ) on Sunday June 11, 2017 @10:53AM (#54596329)
    What even marginally competent IT manager would give an inexperienced person the ability to modify or delete production objects? I feel sorry for this guy, but his management is completely at fault here. We know the story. Nothing will happen to them and one or more will likely get promoted.
