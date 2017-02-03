Microsoft Introduces GVFS (Git Virtual File System) (microsoft.com) 48
Saeed Noursalehi, principal program manager at Microsoft, writes in a blog post: We've been working hard on a solution that allows the Git client to scale to repos of any size. Today, we're introducing GVFS (Git Virtual File System), which virtualizes the file system beneath your repo and makes it appear as though all the files in your repo are present, but in reality only downloads a file the first time it is opened. GVFS also actively manages how much of the repo Git has to consider in operations like checkout and status, since any file that has not been hydrated can be safely ignored. And because we do this all at the file system level, your IDEs and build tools don't need to change at all! In a repo that is this large, no developer builds the entire source tree. Instead, they typically download the build outputs from the most recent official build, and only build a small portion of the sources related to the area they are modifying. Therefore, even though there are over 3 million files in the repo, a typical developer will only need to download and use about 50-100K of those files. With GVFS, this means that they now have a Git experience that is much more manageable: clone now takes a few minutes instead of 12+ hours, checkout takes 30 seconds instead of 2-3 hours, and status takes 4-5 seconds instead of 10 minutes. And we're working on making those numbers even better.
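The "download on first open" idea from the summary can be sketched in a few lines. This is only a toy model of the concept, not GVFS's actual design or API: the class `LazyRepoView`, its methods, and the in-memory dict standing in for the server are all hypothetical names made up for illustration.

```python
class LazyRepoView:
    """Toy model of a virtualized working tree (hypothetical, not GVFS itself)."""

    def __init__(self, remote_files):
        # remote_files: dict of path -> contents, standing in for the server.
        self._remote = remote_files
        self._hydrated = {}   # path -> current local contents
        self._baseline = {}   # path -> contents as first downloaded
        self.downloads = 0    # how many files were actually fetched

    def listdir(self):
        # Every file appears to be present, whether hydrated or not.
        return sorted(self._remote)

    def open(self, path):
        # Fetch the contents only the first time the file is opened.
        if path not in self._hydrated:
            self._hydrated[path] = self._remote[path]
            self._baseline[path] = self._remote[path]
            self.downloads += 1
        return self._hydrated[path]

    def write(self, path, contents):
        self.open(path)  # hydrate before modifying
        self._hydrated[path] = contents

    def status(self):
        # Files that were never hydrated cannot have local changes,
        # so only the hydrated ones need to be compared.
        return sorted(
            path for path, contents in self._hydrated.items()
            if contents != self._baseline[path]
        )


if __name__ == "__main__":
    remote = {f"src/file{i}.c": f"// file {i}\n" for i in range(1000)}
    view = LazyRepoView(remote)
    view.open("src/file7.c")
    view.open("src/file7.c")                # second open does not download again
    view.write("src/file42.c", "// edited\n")
    print(len(view.listdir()), view.downloads, view.status())
```

In this sketch the cost of `status` grows with the number of files touched rather than the number of files in the repo, which is the effect the summary describes.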
There aren't THAT many repos with over 3 million files in them.
The great majority of projects I've been on have been around the 100k-300k range and doing a build (to properly test the product) required ALL of them.
And even then, once you've got all of them the first time, GIT does the diffing automatically so it "scales" already.
Maybe MS could put some of their vast R&D efforts into something more useful... like having their free Visual Studio Code editor handle files bigger than 1 GB?
If your repo has 3 million files in it, you have bigger problems. Solving those seems better than trying to mitigate them.
And if you have a million [acm.org]?
I meant a billion in my other post.
Why all the hate? They had a repo with 3M files in it and they wanted to use GIT. They could have done this and not released it to the open source community. You don't have to use it, but isn't it better that they put it out there?
Microsoft's repos *are* that large. That's why they implemented this.
Microsoft Office's repository is over 1 TB in size. Yes, terabyte. For *office*. They absolutely cannot (could not, I suppose now) use Git on it.
Why are they that large in the first place?
Do they also store all design files and compiler-generated files in the repo?
Did they just turn git into svn? (Score:4, Insightful)
The whole point of git is that you have an identical copy on your machine. Why take away git's biggest advantage?
Because its biggest advantage is also one of its greatest inefficiencies, and frankly on a large project chances are you may not need it all. The whole point is that you have an identical copy on your machine of what you're working on.
When I use svn I have a copy of my branch on my local machine. I may not have every other branch or every part of the repo, but I have what I'm working on. I'm not sure what this is for other than companies that can't find a way to partition their version control between products.
Ah nostalgia (Score:3)
While a vfs sounds like a great idea, I think in theory it's only of use for very, very large repos. Even then I wonder if the exact same issues that made ClearCase suck would make it suck even with Git.
Then you had a piss-poor release engineer who didn't understand how to construct config specs based on a stable baseline, label & promote stable builds regularly, and use clearmake properly, or manage dependencies and allow you to do a clean, fast local build.
I love git, and I work with it daily, and the mo
The fact that you needed a release team and release engineers to manage a ClearCase implementation is why it's considered one of the worst systems out there, remembered with hatred by almost everyone who used it. A version control system should be easily set up by one admin in an hour or two, and then usable without reams of documentation by any of the engineers. ClearCase failed that.
I had to use ClearCase as my source control system for one company I worked for. The idea was you set up a view spec (a bit like a branch), mapped a drive letter to it and you never had to pull again because it would always reflect that branch. Your local changes went over the top and when it was time to commit you could merge up and commit. In practice what it meant was the source code was constantly changing under your feet, and binaries were constantly stale or in a mystery state because you didn't know what they were compiled against. And because this was IBM software it was unusably slow across WANs, memory hungry and enjoyed triggering random blue screens.
While a vfs sounds like a great idea, I think in theory it's only of use for very, very large repos. Even then I wonder if the exact same issues that made ClearCase suck would make it suck even with Git.
To be fair to IBM, ClearCase had this behavior before the three mergers that made it part of IBM. (Pure + Atria -> PureAtria, PureAtria + Rational -> Rational, IBM + Rational -> IBM)
I actually liked the concept of "wink-in" where derived objects that came from the same source objects and build environment could just be pulled from someone else's build instead of rebuilt. But the system as a whole required a zippy network.
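For anyone who never saw wink-in, the general idea (reuse a derived object when the same sources and build environment have already been built by someone else) looks roughly like the sketch below. This is a simplified stand-in, not how ClearCase or clearmake implement it: the `DerivedObjectCache` class, the hash-based key, and the `compile_fn` callback are all assumptions made for illustration.

```python
import hashlib


class DerivedObjectCache:
    """Hypothetical shared cache of build outputs, keyed by inputs + environment."""

    def __init__(self):
        self._store = {}   # key -> derived object
        self.builds = 0    # how many times we really had to compile

    @staticmethod
    def _key(sources, env):
        # Hash source paths, source contents, and environment settings together,
        # so that any change to either produces a different key.
        h = hashlib.sha256()
        for path in sorted(sources):
            h.update(path.encode())
            h.update(b"\0")
            h.update(sources[path].encode())
            h.update(b"\0")
        for name in sorted(env):
            h.update(f"{name}={env[name]}".encode())
            h.update(b"\0")
        return h.hexdigest()

    def build(self, sources, env, compile_fn):
        key = self._key(sources, env)
        if key not in self._store:
            # Nobody has built this exact configuration yet: do the real work.
            self._store[key] = compile_fn(sources, env)
            self.builds += 1
        # Otherwise reuse the existing derived object.
        return self._store[key]


def fake_compile(sources, env):
    # Stand-in for a compiler: just records what it was given.
    return f"obj({len(sources)} files, {env.get('CFLAGS', '')})"


if __name__ == "__main__":
    cache = DerivedObjectCache()
    src = {"a.c": "int a;", "b.c": "int b;"}
    cache.build(src, {"CFLAGS": "-O2"}, fake_compile)   # real build
    cache.build(src, {"CFLAGS": "-O2"}, fake_compile)   # reused
    cache.build(src, {"CFLAGS": "-O0"}, fake_compile)   # different env, rebuilt
    print(cache.builds)
```

The lookup only pays off if checking the shared store is cheaper than rebuilding, which is where the zippy network comes in.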
I don't hold out hope that a vfs on top of another scm solution would be eve
