
An Update On Microsoft's 'GitHub Arctic Vault Program' (news.com.au) 31

news.com.au reports: The GitHub Arctic Vault program is part of the now Microsoft-owned code repository GitHub...aimed at preserving the information for generations to come...

"We chose to store GitHub's public repositories in the Arctic World Archive in Svalbard [a Norwegian island] because it is one of the most remote and geopolitically stable places on Earth and is about a mile down the road from the famous Global Seed Vault," said GitHub vice president of special projects Thomas Dohmke. Mr Dohmke said open source code in particular was worth preserving... "Ultimately, it's time to create multiple durable backups of the software our world depends on..." Other treasures include the original source code for MS-DOS (the precursor to Microsoft Windows), the open source code that powers Bitcoin, Facebook's React, and the publishing platform Wordpress...

"The Arctic Code Vault was just the beginning of the GitHub Archive Program's journey to secure the world's open source code," GitHub vice president of special projects Thomas Dohmke told news.com.au. "We've partnered with multiple organisations and advisers to help us maximise the GitHub Archive Program's value and preserve all open-source software for future generations." One of those partners is Norwegian archival experts Piql, who specialise in very-long-term data storage. The company uses around 200 silver halide and polyester film reels designed to last a thousand years to store the information...


  • MS-DOS, Bitcoin, and React???? These are the important pieces of software that humanity will care about after a nuclear apocalypse? Yeah right...

    • The Eloi will surely appreciate it.
    • Wait a minute! What about cute cat videos? After the apocalypse, the most important thing would be to lift people's spirits, hence cats on Roombas would be very high on the list of priorities.

      • GitHub is mostly source code, not binaries. Preserving it requires complete digital precision, especially if we want to preserve the history. Most video or audio archival means can survive a few percent of the data being erratically lost. Digital source code is not so tolerant.

        • Digital source code storage can trivially be made error-tolerant, simply by storing multiple copies with checksums (or, if you want to be more space-efficient, by using an equivalent but more sophisticated encoding like forward error correction). Of course that increases the amount of storage space you'll need, but compared to the storage requirements for video, even the most wasteful redundancy would be small.

          That said, cockroach-DNA storage is the most robust solution. You won't be abl
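
A minimal sketch of the "multiple copies with checksums" idea from the comment above, in Python. The names `make_copies` and `recover` are made up for illustration, and SHA-256 is an assumed choice of checksum (the comment does not name one). It covers only the plain-redundancy case, not forward error correction.

```python
import hashlib
from typing import List, Tuple

# One stored copy: the payload plus the SHA-256 hex digest recorded at write time.
Copy = Tuple[bytes, str]


def make_copies(data: bytes, n: int = 3) -> List[Copy]:
    """Return n independent (payload, checksum) pairs for the same data."""
    digest = hashlib.sha256(data).hexdigest()
    return [(data, digest) for _ in range(n)]


def recover(copies: List[Copy]) -> bytes:
    """Return the first payload that still matches its recorded checksum."""
    for payload, digest in copies:
        if hashlib.sha256(payload).hexdigest() == digest:
            return payload
    # Every copy (or its checksum) was damaged; plain redundancy cannot help.
    raise ValueError("every copy failed its checksum")
```

With three copies, the data survives as long as at least one payload and its checksum are both intact. As the reply below points out, this guards against media corruption only, not against deletion or the loss of the means to read the media.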

          • > Digital source code storage can trivially be made error-tolerant, simply by storing multiple copies with checksums (or, i

            Simply put, checksums do not prevent "rm -rf /". Checksumming and snapshots are valuable against certain classes of failure, not against all. Nor do checksums prevent obsolescence or failure of the systems used to _read_ from older media, especially if the data is stored encrypted and the key is lost or the index to the archival system is lost.

            DNA storage is not digitally robust. It's hi

    • by im_thatoneguy ( 819432 ) on Sunday March 01, 2020 @08:21PM (#59785670)

      I know nobody reads the fucking article but...

      The 02/02/2020 snapshot archived in the GitHub Arctic Code Vault will sweep up every active public GitHub repository, in addition to significant dormant repos. The snapshot will include every repo with any commits between the announcement at GitHub Universe on November 13th and 02/02/2020, every repo with at least 1 star and any commits from the year before the snapshot (02/03/2019 - 02/02/2020), and every repo with at least 250 stars. The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size - depending on available space, repos with more stars may retain binaries.
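
The inclusion rules quoted above can be read as a simple predicate. What follows is a rough sketch in Python, not GitHub's actual selection code: `Repo` and `in_snapshot` are hypothetical names, the announcement year is assumed to be 2019 (the quote gives only "November 13th"), every repo is assumed to be public, and the 100KB binaries rule is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SNAPSHOT = date(2020, 2, 2)
ANNOUNCEMENT = date(2019, 11, 13)  # assumption: the year is not stated in the quote
YEAR_START = date(2019, 2, 3)      # start of "the year before the snapshot"


@dataclass
class Repo:
    stars: int
    last_commit: Optional[date]  # most recent commit as of the snapshot, if any


def in_snapshot(repo: Repo) -> bool:
    """Apply the three quoted rules; any one of them is enough."""
    if repo.stars >= 250:
        return True  # "every repo with at least 250 stars"
    last = repo.last_commit
    if last is None:
        return False
    if ANNOUNCEMENT <= last <= SNAPSHOT:
        return True  # any commits since the announcement
    # At least 1 star and any commits in the year before the snapshot.
    return repo.stars >= 1 and YEAR_START <= last <= SNAPSHOT
```

Checking only the latest commit is enough here: if any commit falls inside a window that ends at the snapshot date, the latest commit as of the snapshot falls inside it too.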

      They just listed some of the best known repositories.

      dotnet/core
      torvalds/linux
      python/cpython
      bitcoin/bitcoin
      rails/rails
      docker/machine
      openssl/openssl
      nodejs/node
      Homebrew/brew
      php/php-src
      twbs/bootstrap
      microsoft/TypeScript
      apache/hadoop
      v8/v8
      Alamofire/Alamofire
      gatsbyjs/gatsby
      fastai/fastai
      jimweirich/builder
      zeit/next.js
      WordPress/WordPress
      rust-lang/rust
      golang/go
      angular/angular
      jquery/jquery
      ruby/ruby
      facebook/react
      CocoaPods/CocoaPods
      jupyter/notebook
      zeromq/libzmq
      postgres/postgres
      microsoft/MS-DOS
      Netflix/chaosmonkey
      robbyrussell/oh-my-zsh
      xamarin/xunit
      grafana/grafana
      graphql/graphql-js
      github/gh-ost
      rspec/rspec
      libgit2/libgit2
      Many more

    • MS-DOS is a very important part of our history. PC and DOS were practically synonymous in the 1980s and the early 1990s.
  • by 93 Escort Wagon ( 326346 ) on Sunday March 01, 2020 @03:44PM (#59784944)

    This smacks of "publicity stunt". Storing physical seeds in the arctic makes some sense - but there's nothing magical about storing data in a cold, remote location versus somewhere else.

    Additionally, this source code is already mirrored in numerous locations around the world. If some global cataclysm was ubiquitous enough to somehow destroy all those copies... it is unlikely any survivors will care in the least about setting up a new Wordpress server.

    • by Brett Buck ( 811747 ) on Sunday March 01, 2020 @03:49PM (#59784956)

      What about my critical Libra holdings? I might need to buy food, water, and shelter, how will I access those without this very important archive?

    • Incoherent, unmerged, forked, and obsolete clones of git repos scattered around the world are vulnerable to profound "split brain" merge difficulties. This is especially true for the "laptop" environments which developers do local work on and never remerge from upstream. GitHub has become a critical reference site for the Linux kernel and for git software itself at https://github.com/git/git/. github.com is potentially vulnerable to flushing, to people accidentally or deliberately deleting the back-end stora

    • imo the bigger issue is the actual storage media. The seed vault could be useful thousands of years from now just as a genetic repository, but the best extant storage media (aside from literally printing onto gold records) have a shelf life of about 100 years (on the very high end.)
      • From the article:

        "One of those partners is Norwegian archival experts Piql, who specialise in very-long-term data storage. The company uses around 200 silver halide and polyester film reels designed to last a thousand years to store the information... "

        With that said, a thousand years ain't shit. (But I am looking forward to Firefox version 3284751063 and Windows Eleven Billion.)

  • As soon as they have all the Open Source code locked away, they're getting ready to wipe us out.

    • They're holding the open source world hostage by deep freezing our seed. M$ wants money, not to blow everything up. They'll just use it to extort everyone of the world supply of soy sauce and cocaine.
  • by enigma32 ( 128601 ) on Sunday March 01, 2020 @04:08PM (#59785006)

    React can't even keep their core design patterns consistent for a couple years at a time. What benefit at all [besides marketing BS] could there be to putting the source code into a vault like this for potentially thousands of years?

  • “I say we take off and nuke the entire site from orbit. It's the only way to be sure.”
  • Nobody competent is on GitHub anymore since MS bought it. And none of that code is of any use anyways without the machines to execute it. And if computers have to be re-developed, it will be easier and cheaper to write what software is needed instead of getting some ancient, probably bug-ridden stuff to work again.

    • Are you under the impression that GitLab, SourceForge, Bitbucket, or any other commercial or personal git repository will be stable as a corporation for the next 100 years?

  • ... a story needs this:

    ... MS-DOS (the precursor to Microsoft Windows) ...

  • ... with the speed increases and storage capacity we have now, we'd be slicker'n mockingbird shit on a sycamore limb.

  • When the aliens land and find the Arctic Vault, they'll be wondering what the hell "Tetris" was and why it was so fucking important to preserve it for millennia. "It must have been important, look at all the trouble they took to preserve it!"

  • if someone posts something like wikileaks to GitHub, won't that make this vault a political target?
  • You never know. Maybe 2020 is the year Giant Meteor will win the US election.
  • Most of the important code needed to reboot the economy is still written in Cobol.

    And if the Javascript code suffered from bitrot, would anyone even notice?

  • Always just in time...
