
 



Programming IT Technology

GitLab Says It Found Lost Data On a Staging Server (theregister.co.uk) 101

GitLab.com, the wannabe GitHub alternative that went down hard earlier this week and reported data loss, has said that some data is gone but that its services are now operational again. From a report on The Register: The incident did not result in Git repos disappearing. Which may be why the company's PR reps characterised the lost data as "peripheral metadata that was written during a 6-hour window". But in a prose account of the incident, GitLab says "issues, merge requests, users, comments, snippets, etc" were lost. The Register imagines many developers may not be entirely happy with those data types being considered peripheral to their efforts. GitLab's PR flaks added that the incident impacted "less than 1% of our user base." But the firm's incident log says 707 users have lost data. The startup, which has raised over $25 million, added that it lost six hours of data and asserted that the loss doesn't include users' code.


  • by Anonymous Coward

    Reputation is ruined forever. Everyone involved will never work in tech again, should kill themselves right now.

    • by AmiMoJo ( 196126 )

      On the other hand, having now made this mistake they are probably not going to make it again. Could be more reliable than companies which by chance have not needed to restore from backup yet.

      • Sounds like they restored from a backup. Backups are generally taken once per 24-hours, although PITR on databases is ... interesting, and complex as hell to pull off in the real world (I don't know why; it should be a simple operation, but no database seems to make it as easy as "look here for alternate binary logs and play forward until $TIME").
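
        The "play forward until $TIME" idea can be sketched roughly as follows. This is only a toy illustration, assuming an in-memory key-value store and a made-up log format of (timestamp, key, value) tuples; the function name, the data, and the dates are all hypothetical, and real databases replay their own binary or write-ahead logs with their own tooling.

```python
from datetime import datetime


def restore_to_point_in_time(base_backup, log_entries, target_time):
    """Rebuild state from a base backup plus logged writes up to target_time.

    base_backup: dict snapshot taken at some earlier moment (hypothetical format).
    log_entries: iterable of (timestamp, key, value); a value of None means delete.
    """
    state = dict(base_backup)  # work on a copy, never mutate the backup itself
    for timestamp, key, value in sorted(log_entries, key=lambda entry: entry[0]):
        if timestamp > target_time:
            break  # stop replaying once we pass the requested moment
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Illustrative data only: two normal writes, then an accidental mass delete.
backup = {"issue-1": "open"}
log = [
    (datetime(2017, 2, 1, 18, 0), "issue-2", "open"),
    (datetime(2017, 2, 1, 20, 0), "issue-1", "closed"),
    (datetime(2017, 2, 1, 23, 30), "issue-1", None),
    (datetime(2017, 2, 1, 23, 30), "issue-2", None),
]

# Replay up to 23:00, i.e. just before the deletes.
restored = restore_to_point_in_time(backup, log, datetime(2017, 2, 1, 23, 0))
print(restored)
```

        The hard part in practice is presumably not this loop but having an intact base backup and an unbroken chain of logs to feed it.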

        Data loss of 6 hours of issues, MRs, comments, and the like is ... data loss of 6 hours. It's a lot in aggregate for something with over 70,000 users and 238,

        • It's pretty easy in Microsoft land, for instance: https://www.sqlservercentral.c... [sqlservercentral.com] and I have done similar things in bash to restore Oracle and Sybase DBs many years ago. Of course you have to have transaction logs to replay transaction logs, and writing transaction logs is optional in MySQL; even if they are written, they are by default placed in the same data directory as the database.
          • Exactly, I do this all the time in MS SQL to troubleshoot application bugs by creating a DEV copy to a specific point in time.

            RESTORE DATABASE SomeDatabase_20170202
            FROM DISK = 'd:\Backups\SomeDatabase_20170202041500.BAK'
            WITH REPLACE, NORECOVERY,
            MOVE 'Data' TO 'D:\SQL\SomeDatabase_20170202.mdf',
            MOVE 'Log' TO 'L:\Log\SomeDatabase_20170202.ldf';
            GO

            RESTORE LOG SomeDatabase_20170202
            FROM DISK = 'd:\Backups\SomeDatabase_20170202050500.TRN'
            WITH NORECOVERY, STO

        • Restored from a manual backup an admin happened to take. All the automated systems failed... That said, nothing promotes fire safety like a good fire.
      • That matches my experience. My company offers an offsite, bootable backup solution so if anything bad happens to your server, you just boot the appropriate clone in our cloud and you're back in business. A LOT of our customers get our service when they find out the hard way why *proper* offsite backups are important. Many weren't too concerned about backup and business continuity until something bad happened to them.

        AFTER they have a major loss they get serious about making sure it won't happen again.

  • PFY ... (Score:4, Funny)

    by PPH ( 736903 ) on Thursday February 02, 2017 @12:09PM (#53788421)

    ... couldn't remember the exact database maintenance command sequence. So he called BOFH at home after hours for assistance.

  • by Anonymous Coward

    Of course it doesn't include users' code - it's Git, for god's sake. Developers have the whole repo on their own machine...

  • by JoeyRox ( 2711699 ) on Thursday February 02, 2017 @12:19PM (#53788491)
    To lose more of your data.
  • by TheDarkener ( 198348 ) on Thursday February 02, 2017 @12:25PM (#53788523) Homepage

    "GitLab.com, the wannabe GitHub alternative" ... Uhm, is that really accurate?

  • So they have found the data randomly on a server somewhere.

  • by Wuhao ( 471511 ) on Thursday February 02, 2017 @12:37PM (#53788605)

    Obviously, data loss is embarrassing. I think we all appreciate the importance of not only having multiple backups, but testing to ensure that your backups work, and are sufficient to fully restore operations. GitLab is just the latest in a long tradition of sites and services that have found themselves facing the consequences of not regularly testing their recovery plans.
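
    One way to picture "testing that your backups work" is a restore drill: restore the archive somewhere disposable and compare it with the live data. The sketch below is only an illustration under assumptions: it uses zip archives of a plain directory, and the helper names (`checksums`, `make_backup`, `backup_restores_cleanly`) are invented for this example rather than taken from any real backup tool.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def checksums(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_backup(source_dir, archive_path):
    """Write every file under source_dir into a zip archive."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w") as archive:
        for p in sorted(source_dir.rglob("*")):
            if p.is_file():
                archive.write(p, arcname=p.relative_to(source_dir).as_posix())


def backup_restores_cleanly(source_dir, archive_path):
    """Restore the archive into a scratch directory and compare it to the source."""
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(scratch)
        return checksums(scratch) == checksums(source_dir)


with tempfile.TemporaryDirectory() as work:
    data = Path(work) / "data"
    data.mkdir()
    (data / "issues.txt").write_text("issue 1: backups untested\n")

    good = Path(work) / "good.zip"
    make_backup(data, good)

    # A "backup" job that exited without error but captured nothing.
    empty = Path(work) / "empty.zip"
    with zipfile.ZipFile(empty, "w"):
        pass

    good_ok = backup_restores_cleanly(data, good)
    empty_ok = backup_restores_cleanly(data, empty)

print(good_ok, empty_ok)
```

    The empty archive is the interesting case: the job that produced it reported no error, and only the restore drill notices that nothing usable is inside.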

    But I do respect their response. They quickly recognized what had happened, and they diagnosed what went wrong with their backups. They did not try to use PR-speak to conceal their mistake -- they publicly copped to it, in plain industry-standard language that their users would understand, and even offered a livestream of their team resolving the issue. I think this has been a masterclass in how to recover from a blunder. I bet you that this is not a mistake GitLab will be repeating anytime soon.

    Also, I think it's very fortunate that they're in the git repo business, and presumably users who had data that was affected by the loss still have a copy in their own local repos. Thank god for distributed SCM.

    • Thank god for distributed SCM.

      Considering that the particular SCM software in this story is Git, you should probably be thanking Linus Torvalds.

      On second thought, he might enjoy being called god. Carry on.

    • Obviously, data loss is embarrassing. I think we all appreciate the importance of not only having multiple backups, but testing to ensure that your backups work, and are sufficient to fully restore operations. GitLab is just the latest in a long tradition of sites and services that have found themselves facing the consequences of not regularly testing their recovery plans.

      But I do respect their response. They quickly recognized what had happened, and they diagnosed what went wrong with their backups. They did not try to use PR-speak to conceal their mistake -- they publicly copped to it, in plain industry-standard language that their users would understand, and even offered a livestream of their team resolving the issue. I think this has been a masterclass in how to recover from a blunder. I bet you that this is not a mistake GitLab will be repeating anytime soon.

      Also, I think it's very fortunate that they're in the git repo business, and presumably users who had data that was affected by the loss still have a copy in their own local repos. Thank god for distributed SCM.

      They claimed they did not lose any Git data, only database records pertaining to users, issue tracking, tasks, etc. I don't know of anyone who backs up their bug tracking and other databases, so some people probably would have preferred to have lost their git data. It's easier to restore on an active project.

  • Nice thing about having all these release stages is that they are tested before promoting to the next stage.
    Process for updating any stage:
    1. copy data from next stage.
    2. deploy new code
    3. TEST TEST TEST
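    The three steps above can be sketched as below. This is a toy model only: stages are plain dicts with hypothetical "data" and "code" keys, and `refresh_stage` and `has_users` are names made up for the example.

```python
def refresh_stage(stage, next_stage, new_code, tests):
    """Apply the three steps to one stage; return True if it may be promoted.

    stage / next_stage: dicts with "data" and "code" keys (hypothetical shape).
    tests: callables that take the stage and return True on success.
    """
    stage["data"] = dict(next_stage["data"])    # 1. copy data from next stage
    stage["code"] = new_code                    # 2. deploy new code
    return all(test(stage) for test in tests)   # 3. TEST TEST TEST


production = {"data": {"users": 3}, "code": "v1"}
staging = {"data": {}, "code": "v1"}


def has_users(stage):
    """Example check: the copied data must contain at least one user."""
    return stage["data"].get("users", 0) > 0


promotable = refresh_stage(staging, production, "v2", [has_users])
print(promotable, staging)
```

    A side effect of step 1 is that the earlier stage always holds a fairly recent copy of the next stage's data.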
  • Yes, there are questions about how this happened, how an admin was seemingly under enough pressure for that to happen, the question about non-existent backups and whether they have people with enough Postgres skills, but I was impressed by the way they admitted it. They didn't butt cover, they admitted upfront and point-blank "Yer, we've deleted the production Postgres data directory, our backups don't work, we're seeing what we can salvage elsewhere."

    Yes, if you have copies of your production data
  • How do people still lose data in a time when so many options are available to limit or even prevent it... synchronous or asynchronous replication to off site storage, snapshots, raid 6... we have the technology available to make data loss nearly unheard of... it's relatively easy to plan and implement, and it works... and yet morons everywhere STILL manage to lose data...

    • You shouldn't be surprised to see where Mr. Murphy plants his foot occasionally.
    • The problem is those are hardware fault solutions, not general solutions. Going bit by bit:

      "RAID is not a backup" is a mantra. Given that the data was deleted, not merely a physical disk failure, RAID did not mitigate it. It was successfully deleted across all disks. (unless one failed, but who cares?)

      They had replication to off-site storage. However, the deletion also replicated to the off-site location.
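
      A toy model of that difference, under loud assumptions: the `Primary` class, the dict-based replica and the snapshot are all invented for illustration and do not describe how GitLab's Postgres replication actually worked.

```python
import copy


class Primary:
    """Toy primary that forwards every change to its replicas."""

    def __init__(self):
        self.rows = {}
        self.replicas = []

    def write(self, key, value):
        self.rows[key] = value
        for replica in self.replicas:
            replica[key] = value

    def delete_all(self):
        self.rows.clear()
        for replica in self.replicas:
            replica.clear()  # the mistake is replicated just as faithfully


primary = Primary()
offsite_replica = {}
primary.replicas.append(offsite_replica)

primary.write("issue-1", "open")
snapshot = copy.deepcopy(primary.rows)  # point-in-time copy, not kept in sync
primary.write("issue-2", "open")        # written after the snapshot was taken

primary.delete_all()
print(primary.rows, offsite_replica, snapshot)
```

      The replica ends up as empty as the primary, while the snapshot survives but is missing everything written after it was taken, which is the window of data loss.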

      They did have backups it seems, because they were able to roll back 6 hours to a restorable point in

  • Every X months or years someone can find some of the missing gitlabs data on a server somewhere. Just when you thought that was all they would recover, someone finds a few kilobytes of missing gitlabs data on an SD card floating in a sewer.

  • by wbr1 ( 2538558 ) on Thursday February 02, 2017 @01:33PM (#53788959)
    "wannabe"
    "pr flacks"
    number doubting: "'less than 1% of our user base.' But the firm's incident log says 707 users have lost data"

    Why the negative tone? I am not a coder. I do not use GitLab or GitHub except for an occasional download. However, generally competition is good. Sure, this company lost data... so do many. The real question is whether this is indicative of a systemic issue or just a one-time occurrence. I just don't see why this level of negativity is being pushed against this company.

  • reduce the urgency in a disaster by making the manner in which you would recover part of your daily routine - to whatever extent possible.
  • Am I the only one that read the title thinking the data was recovered?
  • It could have been far worse, and I imagine GitLab will make damned sure backups and suchlike work properly in future.

  • A company with 25 million VC bucks and customers like IBM, Redhat, and NASA doesn't have a working backup system? Let me guess, everybody at Gitlab is a developer, and the whole thing runs on node.js in Docker containers.
  • Well, I was involved in verifying that we were in compliance. Over 100k products and some percentage were software, probably under 5%. A few projects were archived in the company archives. Funny coincidence I was there when the initial procedures were established. I didn't establish them, but I used them to archive a few software projects I was involved with. Bounce back to Y2K and I am requesting source code from several projects, now defunct but quite possibly with existing users. Simple, we will read the s
  • by Anonymous Coward

    What kind of IT organisation has $25 Million at their disposal, has a core business of looking after developers' data and yet doesn't have snapshots and backups on that data? Seriously? In this day and age? Most modern filesystems have snapshot abilities and the ability to export those snapshots, wouldn't you then do an rsync or tape backup as a belt and braces thing? Also, check your backups, have backup monitoring in place, copy the data somewhere else as a DR plan, undertake test restores, copy important

