GitLab Says It Found Lost Data On a Staging Server (theregister.co.uk) 101
GitLab.com, the wannabe GitHub alternative that went down hard earlier this week and reported data loss, has said that some data is gone but that its services are now operational again. From a report by The Register: The incident did not result in Git repos disappearing, which may be why the company's PR reps characterised the lost data as "peripheral metadata that was written during a 6-hour window". But in a prose account of the incident, GitLab says "issues, merge requests, users, comments, snippets, etc" were lost. The Register imagines many developers may not be entirely happy with those data types being considered peripheral to their efforts. GitLab's PR flaks added that the incident impacted "less than 1% of our user base," but the firm's incident log says 707 users have lost data. The startup, which has raised over $25 million, added that it lost six hours of data and asserted that the lost data doesn't include users' code.
Re: (Score:3)
It's a server (or set of servers) where you stage a new release of your site/software before an actual production release - it provides an environment as similar to prod as possible, and the idea is to help you test your release before unleashing it to the world.
Re: (Score:2)
From what I gathered from the obscurely worded article, it seems that they tried to restore data from their staging server after their five backup systems failed. Staging servers require production-like data, so it is common to keep them somehow synchronized with prod data (a database copy, for example), but it is kinda sad that's the only thing they had left by then.
Re: (Score:2)
Or GitLab, even.
Re: (Score:2)
Re:Live by the cloud, (Score:5, Insightful)
Re: Live by the cloud, (Score:1)
You can self host your own gitlab server and handle backup yourself if you want.
Re: (Score:2)
Why the hell would you "self-host" a cloud service?
I switched from using DropBox in the cloud to a FreeNAS file server at home since I rarely access those files over the Internet. Now I don't have to worry about losing my data via the Internet.
Re: (Score:1)
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
Have you ever had to do a bare metal restore?
If not, I suggest you do. That's part of what got GitLab: they never verified their backups were restorable. If they had, they'd have found there was no data there.
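A backup you have never restored is a hope, not a backup. A periodic restore test can be automated; here is a minimal sketch (the function and file names are illustrative, not GitLab's actual tooling) that restores a tar archive into a scratch directory and checks the files you depend on actually came back with data in them:

```python
import os
import tarfile
import tempfile


def backup_is_restorable(backup_path, expected_files):
    """Restore a tar backup into a scratch directory and check that
    the files we depend on actually came back non-empty."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with tarfile.open(backup_path) as tar:
                tar.extractall(scratch)
        except (tarfile.TarError, OSError):
            return False  # unreadable archive == no backup
        for relpath in expected_files:
            restored = os.path.join(scratch, relpath)
            if not os.path.exists(restored) or os.path.getsize(restored) == 0:
                return False  # a "successful" backup with no data in it
    return True
```

Run it from cron and page someone when it returns False; the whole point is that the check happens before you need the backup, not after.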
Re: (Score:2)
Do you work or possibly, formerly worked, at Gitlab?
Re: Live by the cloud, (Score:4)
Why the hell would you "self-host" a cloud service?
Almost any server can be a "cloud service". There are several interesting solutions to the problem of "I need to access a Git repository over the net", in "the cloud" or otherwise. For example, I self host because my code is so amazing, I can't risk having anyone see it lest they die from heart attack due to the overwhelming splendor.
Re: (Score:2)
I self host because my code is so amazing, I can't risk having anyone see it lest they die from heart attack due to the overwhelming splendor.
Best reason ever.
Re: (Score:2)
Re: Live by the cloud, (Score:4, Informative)
Since it's a Git repository, you don't have to worry too much about your "centralized" hosting provider – each developer who has cloned a (non-shallow) repository will locally have everything needed to rebuild history were the provider to disappear. Git is a great backup strategy by itself :-)
Re: (Score:3)
Of course, forgot to add — this will *not* include comments, issues, or the whole social ecosystem built around your code. And you don't get that backed up even by replicating your project over several different Git-hosting providers.
Re: (Score:2)
"Private cloud" means you lease a VPS, such as an AWS EC2 instance, and install an application there. It's useful for keeping personal information within your own country.
Re: (Score:2)
Isn't the entire point of the "cloud" being that they take care of that crap for you?
It is a selling point, certainly. I wouldn't say it is the entire point.
Other selling points are:
1. Access to the data from anywhere
2. Collaboration with internal and external users
3. Cross platform availability (device agnostic)
4. Simplified billing / accounting
5. Broader spectrum of tools (example: you could buy just Word for, say, $100 and own that one program or you can get an O365 sub and rent SharePoint, Word, Excel, Outlook, Publisher, Access, Skype, Exchange, PowerBI, OneDrive and a raft of other so
Re: (Score:2)
There is no cloud...it's just someone else's computer. If you're not comfortable with your stuff on someone else's computer, that would be good justification for self-hosting. I have a FreeNAS box at home providing ownCloud, Plex, and some other services, as well as some Git repositories (currently without a web interface). Some of my Git repos (especially my Portage overlay) are at GitLab for public access (used to be at GitHub, but I yanked everything
Re: (Score:2)
Why the hell would you "self-host" a cloud service?
Because in today's modern world, it pays to be fully buzzword-compliant.
Re:Live by the cloud, (Score:5, Informative)
GitLab is actually quite good at it, really.
1. You can get all the wiki and code repo data by git cloning into a backup repository.
2. You can set up a remote mirror that gets automatically updated for the code. I don't think you can do that for the wiki, though.
3. Project admins can download a metadata dump to import in some other gitlab instance (e.g. a local instance of gitlab CE (floss) or EE (paid):
The following items will be exported:
Project and wiki repositories
Project uploads
Project configuration including web hooks and services
Issues with comments, merge requests with diffs and comments, labels, milestones, snippets, and other project entities
4. The data which is not exported (LFS objects, build traces and artifacts, container registry images) can be downloaded in some other way. E.g. LFS is usually cloned along with the git code repos.
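The clone-based parts of the list above (points 1 and 2) are easy to script. A minimal sketch, assuming `git` is installed and you have a list of clone URLs (the URLs and directory layout here are placeholders, not anything GitLab prescribes):

```python
import os
import subprocess


def mirror_backup(repo_urls, backup_dir):
    """Keep bare mirror clones of each repo under backup_dir.
    The first run clones; later runs just fetch updated refs."""
    os.makedirs(backup_dir, exist_ok=True)
    for url in repo_urls:
        name = url.rstrip("/").split("/")[-1]
        if not name.endswith(".git"):
            name += ".git"
        dest = os.path.join(backup_dir, name)
        if os.path.isdir(dest):
            # refresh an existing mirror: fetch all refs, prune deleted ones
            subprocess.run(["git", "--git-dir", dest, "fetch", "--prune"],
                           check=True)
        else:
            # --mirror keeps every branch, tag and note, not just HEAD
            subprocess.run(["git", "clone", "--mirror", url, dest],
                           check=True)
```

Run it nightly from cron and you hold full history for every repo, independent of any hosting provider's backups.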
Note that (3) **includes** the webhooks data that was not fully recovered.
So, yeah, anyone who lost truly important data in this gitlab.com event was actually just as guilty of not following the "Tao of Backup" properly as gitlab.com's sysadmins.
Re: (Score:2)
Re: Live by the cloud, (Score:2)
The worst offender is Apple's iCloud, IMHO. Try backing up your photos onto your own drive: all it offers you is hoops and dead-ends. I really feel cloud services should provide easy export options.
Re: (Score:2)
There is a concept called various things, but most often "vendor lock-in". It may limit your potential market to the idiots in your industry, but if you can get those idiots to accept it, you're on the road to permanent customers.
Did you never see that big cheese-eating grin on Billy Gates' face? Nerd paradise through vendor lock-in.
Too Late (Score:1)
Reputation is ruined forever. Everyone involved will never work in tech again.
Re: (Score:3)
On the other hand, having now made this mistake they are probably not going to make it again. They could end up more reliable than companies which, by chance, have not yet needed to restore from backup.
Re: (Score:3)
Sounds like they restored from a backup. Backups are generally taken once per 24 hours, although PITR on databases is ... interesting, and complex as hell to pull off in the real world (I don't know why; it should be a simple operation, but no database seems to make it as easy as "look here for alternate binary logs and play forward until $TIME").
Data loss of 6 hours of issues, MRs, comments, and the like is ... data loss of 6 hours. It's a lot in aggregate for something with over 70,000 users and 238,
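The "play the log forward until $TIME" idea the parent describes can be shown in miniature: given an ordered write-ahead log of timestamped operations, restoring to a point in time is just a replay with a cutoff. This is a toy model of the concept, not any real database's recovery code:

```python
def restore_to_point_in_time(wal, stop_time):
    """Rebuild state by replaying a write-ahead log up to (and
    including) stop_time.  Each entry is (timestamp, op, key, value)."""
    state = {}
    for ts, op, key, value in wal:
        if ts > stop_time:
            break  # everything after the cutoff is discarded
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state
```

Real engines complicate this with checkpoints, log rotation, and crash consistency, which is presumably where the "complex as hell" part comes from.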
Re: (Score:2)
Re: (Score:2)
Exactly, I do this all the time in MS SQL to troubleshoot application bugs by creating a DEV copy to a specific point in time.
Re: (Score:3)
That's my experience as a backup provider (Score:3)
That matches my experience. My company offers an offsite, bootable backup solution so if anything bad happens to your server, you just boot the appropriate clone in our cloud and you're back in business. A LOT of our customers get our service when they find out the hard way why *proper* offsite backups are important. Many weren't too concerned about backup and business continuity until something bad happened to them.
AFTER they have a major loss they get serious about making sure it won't happen again.
PFY ... (Score:4, Funny)
It's GIT for god sake (Score:1)
Of course it doesn't include users' code - it's Git, for God's sake. Developers have the whole repo on their own machines...
We lost your data but we're back up and ready (Score:3, Insightful)
"wannabe GitHub alternative" ? (Score:4, Insightful)
"GitLab.com, the wannabe GitHub alternative" ... Uhm, is that really accurate?
Re:"wannabe GitHub alternative" ? (Score:5, Informative)
A "GitHub clone" that comes in a CE edition, which is FLOSS and zero-cost, and a paid EE edition. In both cases, you can run it on-premises. GitHub would be $$$, and I don't think it does on-premises (but even if it does, it is a lot more expensive).
It is also vastly preferred over GitHub by small teams. It didn't get into Fortune 500 companies by chance, nor did it get US$25 million in funding by chance.
But yes, if you hate github's usability or flows, there is no reason to believe you wouldn't hate gitlab as well. They are *not* the same, but they're close enough.
Re:"wannabe GitHub alternative" ? (Score:5, Interesting)
Re: (Score:2)
The most significant for me is the integrated CI, and that you can host your own runners and workspaces on your own infrastructure (or some cloud provider).
Can you clarify this? What are runners, and what does it mean to host a workspace? Is that like an Eclipse workspace, or something else?
Re: (Score:2)
Re: (Score:2)
Runners are part of the GitLab CI tools. They are daemons that you can host on your own infrastructure in order to run your automated builds or deployments. Details of GitLab CI [gitlab.com]
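For a concrete picture: a runner picks up jobs defined in a project's `.gitlab-ci.yml` file. A minimal example (the job name and runner tag here are made up for illustration):

```yaml
# .gitlab-ci.yml — a job routed to a self-hosted runner via its tag
build:
  script:
    - make test
  tags:
    - my-own-hardware   # only runners registered with this tag take the job
```

So your code can live on gitlab.com while the actual build executes on your own machines.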
Not sure if this is reassuring... (Score:2)
So they have found the data randomly on a server somewhere.
Bad incident; great response (Score:5, Interesting)
Obviously, data loss is embarrassing. I think we all appreciate the importance of not only having multiple backups, but testing to ensure that your backups work, and are sufficient to fully restore operations. GitLab is just the latest in a long tradition of sites and services that have found themselves facing the consequences of not regularly testing their recovery plans.
But I do respect their response. They quickly recognized what had happened, and they diagnosed what went wrong with their backups. They did not try to use PR-speak to conceal their mistake -- they publicly copped to it, in plain industry-standard language that their users would understand, and even offered a livestream of their team resolving the issue. I think this has been a masterclass in how to recover from a blunder. I bet you that this is not a mistake GitLab will be repeating anytime soon.
Also, I think it's very fortunate that they're in the git repo business, and presumably users who had data that was affected by the loss still have a copy in their own local repos. Thank god for distributed SCM.
Re: (Score:2)
Thank god for distributed SCM.
Considering that the particular SCM software in this story is Git, you should probably be thanking Linus Torvalds.
On second thought, he might enjoy being called god. Carry on.
Re: (Score:2)
Obviously, data loss is embarrassing. I think we all appreciate the importance of not only having multiple backups, but testing to ensure that your backups work, and are sufficient to fully restore operations. GitLab is just the latest in a long tradition of sites and services that have found themselves facing the consequences of not regularly testing their recovery plans.
But I do respect their response. They quickly recognized what had happened, and they diagnosed what went wrong with their backups. They did not try to use PR-speak to conceal their mistake -- they publicly copped to it, in plain industry-standard language that their users would understand, and even offered a livestream of their team resolving the issue. I think this has been a masterclass in how to recover from a blunder. I bet you that this is not a mistake GitLab will be repeating anytime soon.
Also, I think it's very fortunate that they're in the git repo business, and presumably users who had data that was affected by the loss still have a copy in their own local repos. Thank god for distributed SCM.
They claimed they did not lose any Git data, only database records pertaining to users, issue tracking, tasks, etc. I don't know of anyone who backs up their bug tracking and other databases, so some people probably would have preferred to have lost their git data. It's easier to restore on an active project.
HAH I called it. (Score:1)
Process for updating any stage:
1. copy data from next stage.
2. deploy new code
3. TEST TEST TEST
I Was Impressed (Score:2)
Yes, if you have copies of your production data
How? (Score:1)
How do people still lose data in a time when so many options are available to limit or even prevent it... synchronous or asynchronous replication to off-site storage, snapshots, RAID 6... we have the technology to make data loss nearly unheard of... it's relatively easy to plan and implement, and it works... and yet morons everywhere STILL manage to lose data...
Re: (Score:1)
Re: (Score:2)
The problem is those are hardware fault solutions, not general solutions. Going bit by bit:
"RAID is not a backup" is a mantra. Given that the data was deleted rather than a disk physically failing, RAID could not mitigate the loss: the deletion was successfully applied across all disks. (Unless one had failed, but who cares?)
They had replication to off-site storage. However, the deletion also replicated to the off-site location.
They did have backups it seems, because they were able to roll back 6 hours to a restorable point in
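The distinction the parent draws (replication is not a backup) fits in a few lines: a mistaken delete is synchronously replicated and destroys the replica's copy too, while a snapshot frozen earlier survives. A toy model, not any real storage system:

```python
import copy


class ReplicatedStore:
    """Toy primary/replica pair with synchronous replication:
    every operation is applied to both sides immediately."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value
        self.replica[key] = value

    def delete(self, key):
        # the fatal property: a mistaken delete replicates instantly
        self.primary.pop(key, None)
        self.replica.pop(key, None)

    def snapshot(self):
        # an immutable copy frozen at this moment -- this is the backup
        return copy.deepcopy(self.primary)
```

Replication protects you from hardware failure; only the snapshot protects you from yourself.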
This could be a thing (Score:2)
Every X months or years someone can find some of the missing GitLab data on a server somewhere. Just when you thought that was all they would recover, someone finds a few kilobytes of missing GitLab data on an SD card floating in a sewer.
Why the axe to grind? (Score:5, Insightful)
"PR flaks"
Doubting the numbers: "less than 1% of our user base" when the firm's incident log says 707 users have lost data
Why the negative tone? I am not a coder. I do not use GitLab or GitHub except for an occasional download. However, generally competition is good. Sure this company lost data.. so do many. The real questions are is this indicative of a systemic issue or just a one time occurrence. I just don't see why this level of negativity is being pushed against this company.
if your backup is utilitarian, its not a big deal (Score:1)
Found Lost Data (Score:1)
I guess the good news... (Score:2)
It could have been far worse, and I imagine GitLab will make damned sure backups and suchlike work properly in future.
Get off my lawn (Score:1)
Remember Y2K goofiness? (Score:1)
$25 Million and no backups? (Score:1)
What kind of IT organisation has $25 million at its disposal, has a core business of looking after developers' data, and yet doesn't have snapshots and backups of that data? Seriously? In this day and age? Most modern filesystems have snapshot abilities and the ability to export those snapshots; wouldn't you then do an rsync or tape backup as a belt-and-braces thing? Also, check your backups, have backup monitoring in place, copy the data somewhere else as a DR plan, undertake test restores, copy important