Linus on GIT and SCM (392 comments)

Posted by kdawson
from the strong-opinions dept.
An anonymous reader sends us to a blog posting (with the YouTube video embedded) about Linus Torvalds' talk at Google a few weeks back. Linus talked about developing GIT, the source control system used by the Linux kernel developers, and exhibited his characteristic strong opinions on subjects around SCM, by which he means "Source Code Management." SCM is a subject that coders are either passionate about or bored by. Linus appears to be in the former camp. Here is his take on Subversion: "Subversion has been the most pointless project ever started... Subversion used to say, 'CVS done right.' With that slogan there is nowhere you can go. There is no way to do CVS right."

  • Re:git (Score:5, Interesting)

    by Anonymous Coward on Saturday June 02, 2007 @10:36PM (#19367765)
    He is only human. Just because he is the head of a huge software project doesn't make him infallible.

    Just look at the whole 'RMS vs Linus' thing.

    His opinions should carry some weight, especially since he should know more than anyone what the limitations of SCM software are when it comes to larger projects like the Linux kernel. But a lot of SCM comes down to the way a project is managed, the preferences of the people involved, and how they deal with their project. I doubt there is a blanket solution... a 'one SCM package to rule them all' so to speak.

    Especially in the software industry you can always find someone just as good as yourself that strongly holds opinions that are the polar opposite of yours.
  • by RootsLINUX (854452) <rootslinux AT gmail DOT com> on Saturday June 02, 2007 @10:39PM (#19367781) Homepage
    I've used CVS, SVN, and GIT in serious projects and I can say I far prefer SVN to GIT, and GIT to CVS. GIT was incredibly confusing to use, and it may just have been that the repository was poorly administered, but I never knew if I was synced with everyone else's checkouts and the command names made no sense. It's been over a year so I don't remember the details of GIT, but I remember having to do a lot of things "twice". Need to do a checkout? Two commands. Need to commit? Two commands. It was a bitch to use and I am glad I'm done with it. SVN, on the other hand, I felt very comfortable with from the start and most important of all, I trusted SVN to do what I wanted it to and to keep me from screwing up. In a year of using it, it has failed to lose my trust.

    I'm not trying to say SVN is better than GIT. The best repository depends on the type of project and type of development. But defaming SVN in favor of GIT is not, I believe, a valid statement. Especially when (I'm pretty certain) many, many more projects use SVN rather than choosing to use GIT.
  • My favorite, of course, is Mercurial. My main draw is that I had been interested in distributed SCMs for years, but had never found one that made any sense to me whatsoever. I was on the hunt again and stumbled on Mercurial, and I've been hooked ever since.

    Of the various distributed SCMs, Mercurial is the easiest to use that I've found. And it's pretty fast, though not quite as fast as git (though I have some ideas on how to fix that). And since it's written in Python with only a very small C component it runs on many platforms.

  • by Anonymous Coward on Saturday June 02, 2007 @11:01PM (#19367891)
    I took a look at git a while ago and was completely underwhelmed. The UI was so bad it was useless, and it didn't "seem" to do anything that Darcs didn't do. (I used to love Darcs because of the automatic patch dependency computations).

    Now that all the "next generation" SCM tools have matured somewhat, I took a look at all of them again. I had to stop using Darcs because of the "patch of death" problem, which basically is this: after using Darcs on a project with long-lived parallel branches, the repository may eventually enter a wedged state you can't get out of, due to exponentially complex patch dependencies. Oops.

    At this point I had an idea of what an SCM should do, how it should work, what the "mental model" should be. I want to create changesets, add them to branches, combine multiple branches (and keep track of renames and so forth between branches), re-order changesets, collapse multiple changesets into one, discard old branches, etc.

    Of course, CVS and its close cousin Subversion are SO UTTERLY USELESS I didn't even consider them. Seriously, Subversion is like gold-plated shit. Looks nice but it's still shit. Reading people say stuff like "Subversion is awesome" makes me wince. How can something that doesn't have "real" branches, and doesn't have tags OF ANY KIND, be useful for anything? How do you keep track of multiple merges between branches? Answer: you don't. Or you keep track of revision numbers using svnmerge and pray it all works. Even the Subversion docs sort of hand-wave this away. I.e., they hand-wave away one of the FUNDAMENTAL ASPECTS of source code management: branching and merging. It's like hearing people talk about OO databases. They mean well but they just don't comprehend the generality of the underlying problem.

    That's why I was so excited about Darcs: the author "gets it". Unfortunately the implementation is flawed.

    I checked out a few more (Mercurial, bzr) but finally settled on git because it let me do all the things I needed to do, and it did them FAST. Once I figured out the underlying model I was pretty impressed. Git can be viewed at many levels: very low-level plumbing, or UI-level, or in between. The UI and documentation are still pretty shitty, but thankfully they are working on improving them and are moving away from the idea of having interchangeable UIs. Just focus on improving "core git".

    One great thing about git is that so much of it is just files in the .git dir and shell scripts that combine very simple low-level functions. For instance, you can create a branch just by saving the SHA1 ID of the tip into a file in .git. You can branch off any point in the history this way, including branches you've deleted in the past (git keeps all the old commit objects by default, even ones that aren't pointed to by any branch or tag; this is a very simple and understandable model, like reference-counting in a way).
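
    A minimal sketch of that "a branch is just a file" idea, in Python. It assumes the loose-ref layout where a branch lives at .git/refs/heads/<name>; the helper names (create_branch, read_branch) and the commit ID are made up for illustration, and in a real repository you would normally let `git branch` or `git update-ref` write the ref for you.

```python
import re
import tempfile
from pathlib import Path

# A commit ID is assumed here to be a 40-character lowercase hex SHA-1.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def create_branch(repo_dir, name, sha1):
    """Create branch `name` by writing a commit ID into .git/refs/heads/<name>.

    Hypothetical helper: it only writes the ref file and does not check
    that the commit object actually exists in the repository.
    """
    if not SHA1_RE.fullmatch(sha1):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    ref = Path(repo_dir) / ".git" / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(sha1 + "\n")
    return ref


def read_branch(repo_dir, name):
    """Return the commit ID that branch `name` currently points at."""
    ref = Path(repo_dir) / ".git" / "refs" / "heads" / name
    return ref.read_text().strip()


# Demo against a throwaway directory, using a placeholder commit ID.
with tempfile.TemporaryDirectory() as repo:
    tip = "0123456789abcdef0123456789abcdef01234567"  # placeholder, not a real commit
    create_branch(repo, "experiment", tip)
    print(read_branch(repo, "experiment"))
```

    The point of the sketch is only that the branch itself carries no history: it is a name plus one commit ID, which is why branching from any old commit is so cheap.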

    The other great thing about git is how easy it is to sling changes around and reorder them and combine them. For instance let's say you add a file to your project as commit "A". Then you add some code that uses this file as commit "B". Then you fix a bug in the file as commit "C". So you have A-B-C. Now you'd like to combine A and C into a single patch A', and put B on top of it, like this: A'-B. In git, this is super-easy. I can think of two ways to do it off the top of my head.
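
    To make the A-B-C to A'-B rewrite concrete, here is a toy Python model. It is not how git stores anything: each "commit" is just a dict of file contents it sets, and the names (apply, squash, checkout, util.py, main.py) are invented for the example. In git itself, one common way to do this kind of rewrite today is interactive rebase (`git rebase -i`), though that is not necessarily what the poster had in mind.

```python
def apply(tree, commit):
    """Return a new tree (path -> contents) with the commit's changes applied."""
    new_tree = dict(tree)
    new_tree.update(commit["changes"])
    return new_tree


def squash(first, second, message):
    """Combine two commits into one; the later commit's changes win on conflict."""
    changes = dict(first["changes"])
    changes.update(second["changes"])
    return {"message": message, "changes": changes}


def checkout(history):
    """Replay a list of commits from an empty tree and return the final tree."""
    tree = {}
    for commit in history:
        tree = apply(tree, commit)
    return tree


# A adds a file, B adds code that uses it, C fixes a bug in the file from A.
A = {"message": "A: add util.py", "changes": {"util.py": "def f(): return 1  # bug\n"}}
B = {"message": "B: use util.py", "changes": {"main.py": "import util\nprint(util.f())\n"}}
C = {"message": "C: fix util.py", "changes": {"util.py": "def f(): return 2\n"}}

original = [A, B, C]

# Fold the fix C into A to get A', then put B on top: A'-B.
A_prime = squash(A, C, "A': add util.py (with fix)")
rewritten = [A_prime, B]

# Both histories end in the same tree; only the shape of the history differs.
print(checkout(original) == checkout(rewritten))
print([c["message"] for c in rewritten])
```

    The reorder is only this painless because B and C touch different files. If B had also edited util.py, moving C underneath it could conflict and would need resolving by hand.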

    I was checking into a CVS project the other day (for a client) and wanted to do this. Then I realized, you can't move things around in CVS like this *twitch*. So nowadays I do everything in git and only after the changes are beautiful and self-contained and well-commented do I check them into CVS one at a time.

    Okay so the point is, check out git (or honestly? Check out ANYTHING that isn't CVS or svn). Even if you think Linus is an asshole (which he is) or you don't like the git UI (it's not that bad now), check it out anyway.

    And if you don't use SCM at all? You suck. Start learning. It's a best practice that you can't live without, once you start.
  • by ClosedSource (238333) on Saturday June 02, 2007 @11:29PM (#19368029)
    If you have a project that has thousands of developers all over the world like Linux does, an SCM system that is focused on merging makes a lot of sense. Unfortunately, there is a tendency for some people to overdo merging on small projects when they don't really need to. If the application is designed in a modular fashion and developers are assigned specific modules, then merging is rarely needed. Of course, many control freaks don't like this approach because it makes it harder for them to "correct" other developers' code.
  • by ClosedSource (238333) on Saturday June 02, 2007 @11:49PM (#19368133)
    You may be saying this in the context of large FOSS projects, but for most projects, not allowing all the team members to commit changes seems like a really bad idea. If you don't trust them, why are they on your team?

    Complaining about the occasional inefficiencies of file locking while forcing some developers to waste time waiting for permission to commit seems really ironic to me.
  • by starseeker (141897) on Saturday June 02, 2007 @11:56PM (#19368161) Homepage
    I tend to agree - what becomes the "official" code (i.e. what would go into a release tarball) is a social problem without technical solution. A coordinated release requires AGREEMENT, however that agreement is arrived at.

    What GIT does differently, as I understand it, is it makes flipping around branches much easier than before. CVS and SVN have the concept of a central server, so if two developers are trying to resolve differences in their branches before either can get their changes into the main tree they have to work outside svn/cvs to communicate those changes to each other. With GIT, both developers can set up their individual archives and pull from each other, without ever involving the main tree. In other words, the benefits of version control and branch control are available between any two individuals with repositories, without relying on the main branch.
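
    A toy Python model of that peer-to-peer pull, just to show the shape of it. The Repo class, its methods, and the commit messages are all invented for illustration; it handles only the fast-forward case and is not a description of git's actual storage or network protocol.

```python
import hashlib


class Repo:
    """A toy repository: a bag of commits plus a single head pointer."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # commit id -> (parent id or None, message)
        self.head = None

    def commit(self, message):
        """Record a new commit on top of the current head and return its id."""
        parent = self.head
        commit_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
        self.objects[commit_id] = (parent, message)
        self.head = commit_id
        return commit_id

    def ancestors(self, commit_id):
        """Return the chain of commit ids from commit_id back to the root."""
        chain = []
        while commit_id is not None:
            chain.append(commit_id)
            commit_id = self.objects[commit_id][0]
        return chain

    def pull(self, other):
        """Copy the other repo's history and fast-forward to its head."""
        if other.head is None:
            return
        chain = other.ancestors(other.head)
        if self.head is not None and self.head not in chain:
            raise RuntimeError("histories diverged; a real tool would merge here")
        for commit_id in chain:
            self.objects.setdefault(commit_id, other.objects[commit_id])
        self.head = other.head


# The "main tree" and two developers who each start from it.
main = Repo("main")
base = main.commit("initial import")

alice = Repo("alice")
alice.pull(main)
bob = Repo("bob")
bob.pull(main)

# Alice commits locally; Bob pulls straight from Alice.
fix = alice.commit("fix parser bug")
bob.pull(alice)

print(bob.head == fix)    # Bob now has Alice's change
print(main.head == base)  # the main tree was never involved
```

    Nothing in pull() cares whether the other side is "the server" or a colleague's laptop, which is the property the paragraph above is describing.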

    GIT also makes it trivially easy for everyone to switch away from the "official" branch to someone else's as the standard, but that begs the question of resolving differences WITHIN the project.

    GIT is a neat tool, and I think it has a lot of potential. But like every other technological solution, it does not and cannot resolve fundamentally social issues.
  • It is moving the problems to a neighboring problem-space, but that allows for a good benefit: getting everybody to check trivial changes in.

    (at least, that's what I got from his talk)

    I know I've seen it before - the problem where commits are restricted by management (for good reason), and people cannot commit their current work. I've seen this destroy some work before, as it means everybody is basically always running with a 1-2 week window of changes that are not checked in to the "safe, backed up" repository. Ouch.

    If you defer the problem a bit, so everybody can commit changes all the time, that helps keep everybody on "good practices". Also, he mentioned a workgroup situation: if I commit changes to a local repository, I can give the other people in the repository easy, safe access to it without having to mess with the main branch.

    One important change, though, is the direction of the conflict. CVS/etc uses a "push" model, where developers have to push their changes to the server. GIT (in theory) works as a "pull" model, where a manager could pull the changes the developers have made. I kind of like that idea, as it means conflicts are in the arena of the manager, in theory freeing the developers from some of the mess.

    at least, that's as I understand it...
  • by lorcha (464930) on Sunday June 03, 2007 @02:36AM (#19368785)
    subversion lacks multiple merge points.

    Making a single branch and then a single merge is trivial in subversion. Doing anything more complicated is a nightmare.
  • Re:git (Score:4, Interesting)

    by Anonymous Coward on Sunday June 03, 2007 @03:38AM (#19369071)

    I was at the talk and I have to say he lost a HUGE amount of respect from me (and other people in the room whose job has to do with source control).

    The way git works as a decentralized solution with a chain of trust is simply not useable for really large, multiple projects with interdependencies. And it's even worse when you need to control access to certain portions of the code.

    I see Git as a pyramid scheme with Linus sitting on top. I can't begin to imagine the job of the poor release engineer in a big corp who would need to merge the changes of sub-engineers and the chain of trust involved to reach the top! What I see is that everyone would code and test on out of sync code, a bit like Vista's development was.

    Git is a solution that is fine tuned to Linus's specific needs, but it's ages away from a solution that's flexible for most of the industry's needs.

    I'm a big fan of subversion, and while I'll admit it's far from perfect it's way better than cvs could ever be. It does the job well most of the time, and SVK is filling some of the holes.

  • by SnowZero (92219) on Sunday June 03, 2007 @04:14AM (#19369223)

    > I use SVN in a medium-sized team and see SVN used extensively in all kinds of projects around the globe with great success. I personally love the workflow of SVN.
    If all you've ever known is centralized version control, you don't know what you are missing. Having used *both* centralized and decentralized version control on the same projects, I can say that decentralized wins hands down, but you have to work with it for a while to truly appreciate it.

    > The only thing that they need to work on is merging of branches, and incidentally I've talked to the developers, they're quite aware of this flaw of SVN and working on it. We'll see new versions that can track changes in each branch and even attempt automated merges with good success.
    If you take branching and merging to its logical limit, you end up with distributed version control. It's far easier to design with that in mind from the beginning. SVN has probably sunk more developer hours than darcs, mercurial, and monotone combined, which is a shame when you consider those three latter projects have superior branching, merging, and disconnected operation compared to SVN. Git, while powerful, is highly adapted to the Linux kernel workflow, so many might not find it easy to use. However, there are several good distributed VCS options to choose from now, and there is a good option available for just about any project.
  • Re:Why winge? (Score:3, Interesting)

    by DrXym (126579) on Sunday June 03, 2007 @04:51AM (#19369353)
    I don't think there is anything especially wrong with Subversion. Sure, it doesn't support changesets, which is a very handy feature if you're juggling lots of checkins or your role is release engineer, but it is possible to work without them, e.g. using patches and atomic fileset commits. I used CMVC with changesets and it is useful for release managed projects but it's a bit of a pain for casual or self-managed code.

    Other features such as replication would also be useful if svn were a slug but it isn't. Some source control systems such as Clearcase are so badly designed that replication is essential because of the bursty traffic but svn seems to run superbly even across the internet. Subversion is also cross-platform and runs anywhere which in itself is a massive bonus.

    I would like to see git be used more but it needs to be properly cross-platform with a front-end akin to TortoiseSVN and plugins for all major development environments before it is likely to get the attention it deserves.

  • by suv4x4 (956391) on Sunday June 03, 2007 @06:46AM (#19369817)
    > Centralization of the VCS itself has little or no effect on this. How would a distributed VCS inhibit your team from acting this way?

    I said we work with plenty of binary files that can't be merged, hence they have to be locked. You can't lock a file if there's no central place where you lock it.

    Again, SVN and GIT are just two different approaches that work for different types of projects. The projects I work on are 30% design, and as such I want to give my designers webdav access or at least a visual GUI that's very easy to understand and use. I'm not making kernels, and I don't work alone.

    So a proposition like "you need to spend lots of time working with it to appreciate it" is a deal breaker for me.

    Even if GIT has superior features, features aren't the only thing I'm interested in, but the overall package.
  • Re:git (Score:3, Interesting)

    by Anonymous Coward on Sunday June 03, 2007 @07:03AM (#19369889)

    > The thing is, Linux is actually a pretty small project. Much larger projects would include FreeBSD...
    That is nonsense. linux- is 5.3 million lines of code. FreeBSD HEAD for /src checked out today is 6.2 million. So Linux is not a small project by any measure, and FreeBSD is not that much larger. Note that the FreeBSD number includes contrib which has copies of gcc, gdb, sendmail, bind, etc. For comparison, all of Debian Etch is 283 million lines of code.
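
    For a sense of scale, the ratios implied by those figures can be worked out in a few lines of Python. The line counts are simply the ones quoted in the comment above, taken at face value; the variable names are made up for the example.

```python
# Line counts as quoted in the comment above, in millions of lines.
linux = 5.3
freebsd = 6.2
debian_etch = 283.0

# How much larger FreeBSD /src is than the Linux tree, as a ratio.
freebsd_vs_linux = freebsd / linux

# How many Linux-kernel-sized trees would add up to all of Debian Etch.
debian_vs_linux = debian_etch / linux

print(f"FreeBSD /src is about {100 * (freebsd_vs_linux - 1):.0f}% larger than Linux")
print(f"Debian Etch is about {debian_vs_linux:.0f}x the size of Linux")
```

    On those numbers FreeBSD comes out roughly 17% larger, which is the "not that much larger" in the reply, while all of Debian Etch is on the order of fifty Linux kernels.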
  • Re:Why winge? (Score:2, Interesting)

    by DaveHowe (51510) on Sunday June 03, 2007 @07:42AM (#19370065)
    Isn't that pretty much the same reason he wrote linux in the first place? :)
  • by slipsuss (36760) <sussman AT red-bean DOT com> on Sunday June 03, 2007 @08:06AM (#19370183) Homepage
    Ignoring Linus' heinous unprofessional attitude, massive ego, and completely insulting comments, there's a lesson to be learned here: you and your team need to decide whether you want centralized or decentralized version control. There are advantages and disadvantages to both methodologies. Anybody who gets up on a stage and tells you that "all centralized systems are garbage, decentralized is the one true way" isn't giving you the full picture. (And likewise, anyone who says the opposite is equally off their rocker!) 80% of software development takes place within corporations, and there's a reason centralized SCM has worked so well in that environment. Decentralized systems might be great for certain open source communities, but it's not what most organizations want or need. If you'd like another viewpoint on why centralized might sometimes be better than decentralized (even in open source projects), take a look at this essay I wrote a while back.

    I'm one of the original designers/developers of Subversion, and even we (in the svn developer community) are well aware of both sides of the coin. We're seriously considering adding decentralized features to svn 2.0. We've also added true merge-tracking magic to the imminent svn 1.5 release (so svn is no longer "hand waving" merges, they'll be just as simple as in decentralized systems.)

    If you truly believe that distributed SCM is the Only Way of working in all situations, then I suggest you try to push these systems on corporate teams, and see how they fare. Distributed systems have a model that's much more complex for the average joe-user to understand, and as a result most existing distributed systems have extremely complicated UIs. If they're complex enough to confuse open source nerds, think about the rest of the world's programmers...

    Keep an open mind about this stuff. No matter what Linus says, there's no magic SCM bullet.
  • SVN etc. (Score:3, Interesting)

    by Tom (822) on Sunday June 03, 2007 @03:22PM (#19373281) Homepage Journal
    He's right about CVS, and more or less about SVN. Except for one thing: Subversion works. Not only in the technical sense, but in the sense that you can work with it, you can easily explain it to new developers, there is integration into lots of IDEs, code editors and other tools and the list goes on and on. (last, but not least: Trac!).

    I used to be passionate about arch, for example. I'm fairly sure I would've been about GIT had it existed back then. But then I learned that to get real work done in the real world, the theoretical basis of your version control system matters little. If the system doesn't work for my developers - who, as on many projects, are doing this for fun and in their spare time - then it doesn't work, period. If I can't explain it to the boss at work, it won't get installed.

    And that's why Subversion is everywhere and arch is, where exactly?

    Now Linus is a man with his feet on the earth, so GIT may have a different fate. Wake me when Eclipse and Textmate have built-in GIT support and at least half of my potential developers know it.
  • by grumbel (592662) on Sunday June 03, 2007 @03:29PM (#19373357) Homepage
    ### However, in smaller projects, which really *need* a very specific direction

    Yes, but that isn't an argument why you should cripple your SCM. I absolutely agree that for a lot of projects there is little to no use for distributed repositories. However just because you don't need distributed repositories doesn't mean you can't take advantage of them, i.e. you get proper offline support and you get also a proper way to distribute changesets, which SVN still doesn't support (i.e. no "svn diff/patch" that actually track all your changes like file moves and such, not just a small subset).

    That said, I don't really like the idea of forcing everybody to download the whole repository like you do with git. A lot of users just want to check out the code, compile it, and update it every now and then, but never work on it. Git seems to be a little too much of a developer-only tool. That the documentation actively discourages git's central repository support also raises some doubts about how well it would work. However, I do think that in the end distributed repositories are the way to go, not because you need them, but because it's simply the better design.
