Linus on GIT and SCM
An anonymous reader sends us to a blog posting (with the YouTube video embedded) about Linus Torvalds' talk at Google a few weeks back. Linus talked about developing GIT, the source control system used by the Linux kernel developers, and exhibited his characteristic strong opinions on subjects around SCM, by which he means "Source Code Management." SCM is a subject that coders are either passionate about or bored by. Linus appears to be in the former camp. Here is his take on Subversion: "Subversion has been the most pointless project ever started... Subversion used to say, 'CVS done right.' With that slogan there is nowhere you can go. There is no way to do CVS right."
how to learn git? (Score:5, Informative)
anybody have a good tutorial? (not the crappy one which comes with it)
I'm not an SCM rube either. I've competently used tla (arch), darcs, and of course CVS. but git just seems too hard to use. damn fast though.
There's a difference between GIT and SVN (Score:5, Informative)
There are appropriate uses for both of these, and in kernel development I think it makes sense to have distributed development. However, smaller projects often really *need* a very specific direction. Wesnoth, for example, I would think would not have gotten where it is today if there were many branches where people were all making their own art.
Linus is enough of a famed leader that he's going to be listened to, and thus kind of pulls the community around him as a central source of development. That's not necessarily going to happen everywhere.
Re:Why winge? (Score:5, Informative)
Re:Linus knows it. (Score:2, Informative)
Most definitely better than SVN, right?
--jeffk++
Re:git (Score:1, Informative)
Re:Linus knows it. (Score:5, Informative)
Distributed version control gaining ground in FOSS (Score:5, Informative)
A common pattern in development is to try one approach, test it, tweak it, and possibly try another approach if the first did not work out, perhaps reverting to a prior approach. With decentralized version control, you can commit your changes to a local repository and work from there. All the local changes you make are versioned, and can be committed, checked out, and examined without contacting a central repository. This is ideal, because you often want to try various options to find the one that works best, before pushing your changes to the rest of the world. In centralized version control, you can use a branch for this purpose, but often branches in these systems are difficult to create, merge, or maintain, so they are rarely used. The end result is that with centralized version control, developers version their workspace in their head. DVCS systems remove the mental burden.
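To make the "try, commit locally, revert" loop concrete, here is a minimal sketch in Python. It is a toy model, not how git, Mercurial, or Monotone store data; the `LocalRepo` class and its method names are made up for illustration. The point is only that every operation touches local state and nothing else.

```python
import hashlib


class LocalRepo:
    """Toy local repository: every commit is stored on this machine only."""

    def __init__(self):
        self.commits = {}   # commit id -> (parent id, message, snapshot dict)
        self.head = None    # id of the commit the workspace is based on

    def commit(self, snapshot, message):
        """Record a snapshot locally and return its id; no network involved."""
        payload = repr((self.head, message, sorted(snapshot.items())))
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        self.commits[commit_id] = (self.head, message, dict(snapshot))
        self.head = commit_id
        return commit_id

    def checkout(self, commit_id):
        """Return the snapshot of any earlier commit and move head to it."""
        self.head = commit_id
        return dict(self.commits[commit_id][2])

    def log(self):
        """Yield commit messages from head back to the first commit."""
        current = self.head
        while current is not None:
            parent, message, _ = self.commits[current]
            yield message
            current = parent


repo = LocalRepo()
first = repo.commit({"parser.c": "approach A"}, "try approach A")
second = repo.commit({"parser.c": "approach B"}, "try approach B")

# Approach B did not work out, so go back to A without contacting anyone.
restored = repo.checkout(first)
history = list(repo.log())
```

Both attempts stay in `repo.commits`, so approach B can still be looked at later even though the workspace went back to A.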
Fortunately, FOSS developers are realizing the usefulness of DVCS and major projects are converting to some form of DVCS. Mozilla is switching to Mercurial [mozillazine.org]. The Pidgin [pidgin.im] project, which just released 2.0.1, is using Monotone [pidgin.im]. (Linus favorably mentioned both of these version control systems in his Git talk, as both are distributed).
Once you accept that DVCS is better than the centralized model (which may not be true for some situations), only a few version control systems (though a growing number) are viable. This is currently a hot area in open source development, with software such as GNU Arch, Monotone, Mercurial, Git, Darcs, Bazaar, and more paving the way. Many open source DVCS's are still in development and not ready for general usage. I can't speak for Mercurial, but Monotone doesn't have the greatest performance, instead preferring integrity over speed. This led Linus to write git, since speed is crucial for a large project like the Linux kernel.
Whatever the actual program (git, Mercurial, or Monotone), more and more open source developers are realizing the advantages that distributed version control can offer. I encourage all developers that haven't used any DVCS to try it -- once you do, you won't go back.
Re:how to learn git? (Score:5, Informative)
When you're starting out, just remember "git commit -a" and you'll be fine. Also check out "git reflog" to see the linear history of your repo. The pulling/pushing stuff can get a lot more complex but it's damn powerful. If you can figure out Arch (yeesh) you can figure out git!
SLASHDOT SEZ: you have too few characters per line. Okay, slashdot, here's part of the man page for git-rebase:
If <branch> is specified, git-rebase will perform an automatic git checkout <branch> before doing anything else. Otherwise it remains on the current branch. All changes made by commits in the current branch but that are not in <upstream> are saved to a temporary area. This is the same set of commits that would be shown by git log <upstream>..HEAD
Re:Linus knows it. (Score:5, Informative)
In addition, git works well for simple projects but not so well for projects that have many different related subprojects which share code.
For instance, our SVN repository holds everything needed for an entire product, including embedded linux with busybox, initrd and custom software and libraries - as well as DSP source code for two different add on cards, the GUI for mac, windows, and linux, the docutils xml file for the various manuals, and manufacturing and test code.
I'd love to use git once it attains the required maturity level so that I can do what I need with it.
--jeffk++
Re:There's a difference between GIT and SVN (Score:3, Informative)
Re:Lemme check my last home appraisal... (Score:5, Informative)
In my experience, nearly all merges occur automatically and cleanly. Only if two developers modified code in conflicting areas of the source code do you have to merge manually--and even then, only one person has to do it. It is much better to have merging operate automatically and transparently when possible, than to have to have two people manually coordinate each and every one of their changes beforehand.
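The reason most merges go through cleanly can be sketched as a three-way merge against the common ancestor. The Python below is a simplified illustration that works on whole files; real tools apply the same rule to regions within a file. The function name and the sample file names are made up for the example.

```python
def three_way_merge(base, ours, theirs):
    """Merge two sets of changes made against a common base.

    Each argument maps a file name to its content. Returns (merged, conflicts),
    where conflicts lists the files that both sides changed differently.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (or neither touched the file)
            result = o
        elif o == b:      # only their side changed the file
            result = t
        elif t == b:      # only our side changed the file
            result = o
        else:             # both changed it differently: needs a human
            conflicts.append(name)
            result = o    # keep our version as a placeholder
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"a.c": "1", "b.c": "1", "c.c": "1"}
ours = {"a.c": "2", "b.c": "1", "c.c": "2"}
theirs = {"a.c": "1", "b.c": "2", "c.c": "3"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

Here `a.c` and `b.c` merge automatically because only one developer touched each; only `c.c` is reported as a conflict, and only one person has to resolve it.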
Re:Well, speaking from my own experience... (Score:5, Informative)
Most distributed version control systems exhibit this phenomenon, because by "checking out" you are actually doing two operations: pulling the latest changes from someone else, and updating your workspace. For example, in Monotone you would type (I imagine git operates similarly):
The first command retrieves revisions from the server, and the second updates your workspace with those new changes. To "commit" a change, in a distributed version control system you first 1) commit the change to your local repository and then 2) push it to someone else:
It is often useful to keep these operations separate. For example, you can commit without pushing. Make a bunch of changes, commit each one separately, and only push once you're satisfied with the result. Other developers can still see each change you made individually, but only after you've pushed, so they won't be stuck with an incomplete in-progress version of the tree.
Similarly, by being able to update without pulling, you can revert to any revision you would like without contacting the network. Likewise, since commit does not require network access, it is no extra effort to work offline. Once an Internet connection is available, you can synchronize your repositories, but in the meantime you can make any change you want - even with no network connection.
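The four separate operations described above (commit, push, pull, update) can be modeled in a few lines of Python. This is a rough sketch under made-up names (`Repo`, revision labels like "r1"); it does not reproduce Monotone's or git's actual commands or storage, only the split between "moving revisions between repositories" and "changing your workspace".

```python
class Repo:
    """Toy repository: an ordered list of revisions plus a workspace pointer."""

    def __init__(self):
        self.revisions = []    # every revision this repository knows about
        self.workspace = None  # the revision currently checked out

    def commit(self, revision):
        """Record a revision locally; nothing is sent anywhere."""
        self.revisions.append(revision)
        self.workspace = revision

    def push(self, other):
        """Send revisions the other repository does not have yet."""
        for revision in self.revisions:
            if revision not in other.revisions:
                other.revisions.append(revision)

    def pull(self, other):
        """Fetch new revisions from the other repository; workspace unchanged."""
        other.push(self)

    def update(self, revision=None):
        """Point the workspace at a known revision (the latest by default)."""
        target = self.revisions[-1] if revision is None else revision
        if target not in self.revisions:
            raise ValueError("unknown revision: " + target)
        self.workspace = target


server, alice, bob = Repo(), Repo(), Repo()
alice.commit("r1")
alice.commit("r2")                         # two local commits so far
seen_before_push = list(server.revisions)  # the server has seen neither
alice.push(server)

bob.pull(server)                           # network step: fetch r1 and r2
workspace_after_pull = bob.workspace       # pulling did not touch the workspace
bob.update()                               # local step: move to the latest
bob.update("r1")                           # local step: revert, no network needed
```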
The main disadvantage of a decentralized version control system is that it requires workflow changes [pidgin.im] to get the most out of it. If you are only familiar with centralized version control systems, it will take some getting used to. But I'm glad to say that an increasing number of projects are making the change to distributed version control [slashdot.org], among them Mozilla and Pidgin. They are not using Git (they use Mercurial and Monotone, respectively), but both systems are distributed. Git is being used by the Beryl [beryl-project.org] project, among others. Subversion has momentum in FOSS because it is familiar to those used to centralized version control (everyone knows CVS), and SourceForge [sourceforge.net] provides free SVN hosting. Once a free open source hosting site provides hosting for a distributed version control system, I expect more low-resource open source projects to use it.
Re:Merging *does* suck (Score:5, Informative)
So don't do it
Wow! I bet you have never worked on anything other than hobby
projects.
Most projects I have worked on cannot do without branching &
branching big & I am not talking about branches created for
individual devs.
What do you do if you have to make patches on earlier release(s)?
What do you do if your project team has 50 devs working on
5 different modules inside? If one guy makes a buggy submit,
will it break everyone else? Typically each team does weekly
sanity tests & then propagates the changes to the main.
Yeah - and I agree with Linus - CVS is rubbish.
Have used CVS, Clearcase & Source Depot. Source Depot
is a Microsoft internal Source Control system. Microsoft
licensed Perforce & developed on it. I used to work with
MS long back & Source Depot was the best Source Control
System I have ever used.
CVS lacks too many features.
1) Atomic checkins/submits
I am trying to submit changes in 5 files as a single bugfix.
A submit/checkin should either succeed for all 5 or fail for all 5.
CVS doesn't do this. The end result is that I may end up submitting
a change in the header without submitting a corresponding change in the
implementation file.
2) Changelists
After checking in multiple files together, at any point in time, I should
be able to find out all the changes that were checked in at the same time.
CVS has no way of doing this - Submitting 5 files together is the same as
submitting 5 files separately as far as CVS is concerned.
3) More Changelist features for non-submitted changes
Let us say I am working on 3 different bugfixes. Source Depot allows me to
group together my changes in different changelists even before I
submit the changes. That is I can create changelist A B & C.
In changelist A - I have files a.c & a1.c changed, in changelist
B, I have b.c & b1.c changed & so on. So if I decide I am done with
all the changes required in the subset A, I can submit it very easily
or undo all changes in changelist B.
4) Merges
Merges between branches are a breeze with Source Depot. With CVS it's
a pain. Source Depot stores a lot of information about merges which have
already happened which is invaluable. In CVS, merges between branches
are very little more than changes manually copied from one branch to
another.
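Points 1) and 2) above, atomic submits and changelists, can be sketched together in Python. This is a toy illustration only; it says nothing about how Source Depot or Perforce are implemented, and `AtomicStore` and its fields are names invented for the example. The idea is to validate every file before touching any of them, then record the group as one changelist.

```python
class AtomicStore:
    """Toy file store whose submit applies a whole changelist or nothing."""

    def __init__(self, files):
        self.files = dict(files)   # file name -> (revision number, content)
        self.changelists = []      # one entry per successful submit

    def submit(self, description, changes):
        """changes maps file name -> (revision the edit was based on, new content)."""
        # Validate every file first, so a failure leaves the store untouched.
        for name, (based_on, _) in changes.items():
            current_revision = self.files.get(name, (0, ""))[0]
            if based_on != current_revision:
                raise ValueError(name + " is out of date; nothing was submitted")
        # All checks passed: apply every change, then record the changelist.
        for name, (based_on, content) in changes.items():
            self.files[name] = (based_on + 1, content)
        self.changelists.append((description, sorted(changes)))
        return len(self.changelists)   # changelist number


store = AtomicStore({"api.h": (1, "int f(void);"),
                     "api.c": (3, "int f(void) { return 0; }")})

# The edit to api.c was based on a stale revision, so the header must not go in either.
try:
    store.submit("bugfix", {"api.h": (1, "int f(int);"),
                            "api.c": (2, "int f(int x) { return x; }")})
    rejected = False
except ValueError:
    rejected = True
header_after_failure = store.files["api.h"]

# With both files up to date, the two changes land together as one changelist.
number = store.submit("bugfix", {"api.h": (1, "int f(int);"),
                                 "api.c": (3, "int f(int x) { return x; }")})
```

Because the changelist is stored as a unit, asking "which files went in with this fix?" is a lookup in `store.changelists` rather than guesswork from timestamps.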
I can do a lot of stuff which I can't do with CVS
- I can very trivially merge Bugfix 1111 (comprising 5 files
checked into changelist XXXX) from a branch to another branch or
the main trunk.
- Because Source Depot stores information about merges, I can do periodic
single command merges very easily between a branch & the trunk - Source Depot
will not try to merge in changes which have already been merged the last
time I did a merge.
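The merge bookkeeping described in the two points above can be sketched as follows. This is a hedged toy model in Python, not Source Depot's actual mechanism; `Branch`, `merged_from`, and the changelist ids are hypothetical. Each branch remembers which changelists it has already taken from each source, so a repeated merge only applies what is new.

```python
class Branch:
    """Toy branch: an ordered list of changelist ids plus a merge record."""

    def __init__(self, name, changes=()):
        self.name = name
        self.changes = list(changes)
        self.merged_from = {}   # source branch name -> set of ids already merged

    def merge_from(self, source, only=None):
        """Apply changelists from source that were not merged before.

        Pass `only` to pick specific changelist ids.
        Returns the ids applied by this call.
        """
        already = self.merged_from.setdefault(source.name, set())
        if only is None:
            wanted = list(source.changes)
        else:
            wanted = [c for c in source.changes if c in only]
        applied = [c for c in wanted if c not in already]
        self.changes.extend(applied)
        already.update(applied)
        return applied


trunk = Branch("main", ["c1"])
feature = Branch("feature", ["c2", "c3"])

picked = trunk.merge_from(feature, only={"c3"})   # merge a single bugfix
feature.changes.append("c4")
periodic = trunk.merge_from(feature)              # skips c3, already merged
repeat = trunk.merge_from(feature)                # nothing left to do
```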
I could go on & on, but the point is that something like Source Depot makes
a developer's life so much easier. I could work around all these
things in CVS (i.e. do it in multiple steps) but the ease is something
worth paying for I think. If Microsoft ever released Source Depot
as a commercial product, it would be great, but I don't suppose their
license with Perforce would allow it.
Re:~$ mv CommitAccess MergePrivileges (Score:3, Informative)
> access, you'll just have a fight about who has the ability to merge into mainstream.
I really wish he would have addressed that question a little more directly, too.
I think the problem is that you're thinking about it from a classic centralized development model. I have some trouble getting my head around it, too.
Basically, from a truly distributed SCM perspective, there is no "mainstream". All branches are equal. Obviously this isn't quite the case with Linux, but bear with me here.
If you've got good code, what happens is that your changes get merged into more branches than bad code. The more popular your code, the more stuff gets built on top of it. If your code is good enough, eventually it gets into the "mainstream" simply by being an unavoidable dependency for other code.
Quality (or quantity) of code rules.
Politics are only an issue, then, if someone tries to bypass this process by skipping their changes right into the most "popular" branches. But this means they have to convince the owners of those branches to merge it. And while it's really hard to ignore changes from someone with commit access in a centralized SCM, ignoring someone in a distributed SCM is just a killfile away.
c.
Re:Distributed version control gaining ground in F (Score:3, Informative)
You can't because subversion has no client side version control.
Re:git is pretty cool, take a closer look (Score:5, Informative)
Re:Distributed version control gaining ground in F (Score:4, Informative)
For what it's worth, I use Monotone daily and find the performance acceptable. For the record, Linus used Monotone at a particularly bad time in its development cycle [mail-archive.com], when it was very slow and the main designer was on vacation. Nonetheless, the Monotone developers emphasize correctness and integrity over speed, and Mercurial and Git were direct responses to the performance of Monotone. Still, the performance of Monotone is always improving.
Re:~$ mv CommitAccess MergePrivileges (Score:3, Informative)
Everyone has a complete tree so everyone can push patches between themselves. Linus doesn't have to accept it into his own tree. That cuts down on the politics. Before, everyone had their own tarball and pushed patches around, but you lose history and it takes more work.
The other thing is that it's easier to delegate political questions. Let's say Linus pulls networking patches without even looking at them. The networking maintainer gets to deal with all the political issues. This is how it worked before but it was all manual and you lose all the commit comments etc.
Re:~$ mv CommitAccess MergePrivileges (Score:5, Informative)
It's not that the politics go away. It's that the policy is no longer a binary "yes or no" decision, so the technical arrangement mirrors the social arrangement. This doesn't work with CommitAccess because people wouldn't commit the same change everywhere they should, and they couldn't be restricted to only making changes they're trusted to make (there are people who are trusted to correct spelling in comments in any file in the tree, and Linus can look through the total changes they send and verify that they only change spelling in comments).
So use SVK (Score:3, Informative)
So use SVK, which takes the base libraries of Subversion (the atomic, versioning filesystem ones which are heavily tested and work very well) and uses them to build a distributed SCM.
http://en.wikipedia.org/wiki/SVK [wikipedia.org]
Comment removed (Score:3, Informative)
Re:Source Safe (Score:2, Informative)
That's because instead of using a 'real' database, it's borrowed an ancient unix tradition: using the file system as data store. There really isn't that much difference: other source management systems put them in tables, with rows that just might be labeled 1, 2,
And it can't be the worst product ever, not even in its own category. CVS already has that honor.
And Linus may be right, but he's wrong. There's no way to make CVS better, but subversion started off as an attempt to make a versioning system that explicitly avoids CVS's known drawbacks and pitfalls, and they're doing a damn good job at it.
Actual Youtube link (Score:5, Informative)
This is the video from the article. You can either watch it in the tiny embedded window, or you can go to youtube and click the button to watch it full-screen.
Look, posters: if you're going to point to a video that's hosted on YouTube (or another video hosting site), just link to that site. Don't link to some random web page that has the video embedded in it.
Re:Why winge? (Score:3, Informative)
SVN and Perforce are centralized SCMs and GIT, darcs, bzr are decentralized.
So SVN, CVS, and Perforce are uninteresting to Linus because he and the linux kernel developers work in a decentralized manner. Working decentralized means that there needs to be a person to receive patches from others and apply them to his tree. Anyone with write permissions can commit to an SVN server without submitting patches to a manager. Both have their own pluses and minuses.
KDE uses svn so there doesn't need to be one person to integrate all of the patches because it's a HUGE project. I guess linux is a lot smaller so Linus can handle all of the incoming changes. However, I think he mostly rubber-stamps the patches coming from his trusted lieutenants.
Personally, I use Perforce at work. Perforce has the best GUI tools I've ever had the pleasure to use. It's also got amazing merge tools for resolving merge problems.
At home, I use svn+bzr while working on krita. svn has ok tools but bzr is amazing for transparently working as a temporary backup until I commit to the main kde svn. I find the svn cli to be good enough.
Cheers
Ben
Comment removed (Score:3, Informative)
Re:git is pretty cool, take a closer look (Score:3, Informative)
The concept of tags is a crutch for systems where a versioned copy operation is too expensive. Since subversion keeps track of history across copies, and also doesn't copy the file content unless there are changes, there's simply no reason to have a separate concept of tags in subversion. The reason systems like CVS have tags is exactly because branching is expensive in CVS.
And a copy IS a "real" branch. To quote you further:
"One great thing about git is that so much of it is just files in the .git dir and shell scripts that combine very simple low-level functions. For instance, you can create a branch just by saving the SHA1 ID of the tip into a file in .git. You can branch off any point in the history this way, including branches you've deleted in the past (git keeps all the old commit objects by default, even ones that aren't pointed to by any branch or tag.. this is very simple and understandable model, like reference-counting in a way)."
Gee... Wonder how Subversion does it? Oh, that's right, it keeps a single revision number that uniquely defines each commit, and a branch is just a copy of a subset of data at a specific revision, and you can branch off any point in the history by saving the revision number, including branches you've deleted in the past.
As for your "A-B-C" example, I just don't see the point - it's just a type of situation that's never come up for me. However, if you absolutely want to do it in subversion, you certainly can. It's not particularly hard either - all you have to do is selectively copy or merge the changes you want into your working directory.
I'm not saying Subversion is perfect, but don't blame Subversion for things that reflect your poor understanding of it rather than actual limitations.
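For readers following the argument, the "a branch is just a saved commit id" idea from the quoted git description can be sketched in Python. This is a toy model, not git's or Subversion's actual on-disk format; the `commits` and `refs` dictionaries and the helper names are made up for illustration. Whether the id is a SHA-1 or a revision number, the branch itself is only a name pointing at a point in history.

```python
import hashlib

commits = {}   # commit id -> (parent id or None, message)
refs = {}      # branch name -> id of the commit at its tip


def make_commit(parent, message):
    """Store a commit under the SHA-1 of its contents and return that id."""
    commit_id = hashlib.sha1(repr((parent, message)).encode("utf-8")).hexdigest()
    commits[commit_id] = (parent, message)
    return commit_id


def history(branch):
    """Walk parent links from a branch tip back to the root."""
    messages, current = [], refs[branch]
    while current is not None:
        parent, message = commits[current]
        messages.append(message)
        current = parent
    return messages


a = make_commit(None, "A")
b = make_commit(a, "B")
c = make_commit(b, "C")
refs["master"] = c

# Creating a branch at an older point is just recording one more id.
refs["experiment"] = a

# Deleting a branch removes the name only; the commit objects stay in the store,
# so the same history can be reached again by saving the id under a new name.
del refs["master"]
refs["recovered"] = c
```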
Re:git is pretty cool, take a closer look (Score:3, Informative)
Re:Why winge? (Score:1, Informative)
So there you go, Mr. A.C. You're not quite the linguist either.
Re:Merging *does* suck (Score:3, Informative)
Re:Why winge? (Score:3, Informative)
While it is centralized, one cannot deny that its branch merging tools are about the most powerful out there.
Warner Losh
FreeBSD kernel hacker
P.S. This is an abbreviated version of a much longer post to the blog listed in this article.
Re:Source Safe (Score:3, Informative)
Re:Why winge? (Score:3, Informative)
iirc the story goes something like
linus didn't want to use version control
linus justified his not using version control by saying all the options were crap.
The bitkeeper guys were carefully watching his arguments and moulding their tool based on them.
linus was finally backed into a corner by the bitkeeper guys and ended up using bitkeeper
bitkeeper was reverse engineered as a result of its use for the linux kernel.
there was a big fallout between the linux kernel team and the bitkeeper guys rendering its continued use for the linux kernel impractical.
linus wrote git because he could no longer live without version control but didn't think any of the available solutions were acceptable.