
Perl Migrates To the Git Version Control System

On Elpeleg writes "The Perl Foundation has announced they are switching their version control system to git. According to the announcement, Perl 5's migration to git will allow the language development team to take advantage of git's extensive offline and distributed version control support. Git is open source and readily available to all Perl developers. Among other advantages, the announcement notes that git simplifies commits, reducing the administrative overhead of integrating contributions. Git's change analysis tools are also singled out for praise. The migration from Perforce to git apparently took over a year. Sam Vilain of Catalyst IT 'spent more than a year building custom tools to transform 21 years of Perl history into the first ever unified repository of every single change to Perl.' The git repository incorporates historic snapshot releases and patch sets, which is frankly both cool and historically pleasing. Some of the patch sets were apparently recovered from old hard drives, notching up the geek satisfaction factor even more. Developers can download a copy of the current Perl 5 repository directly from the perl.org site, where the source is hosted."
  • by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Sunday January 04, 2009 @01:23PM (#26320991) Journal

    There are significant advantages of Git over Subversion. RTFS for some.

    Just to add insult to injury -- often, a Git checkout, which includes all history, takes up less space than a Subversion checkout for the same project, which doesn't even include recent commit log messages.

    But think about this -- you're saying they should use a big, slow, central server that acts as a single point of failure, cripples offline development, complicates branches (especially merges), and is several orders of magnitude slower for just about every operation, just so you don't have to learn a "weird" tool?

  • by abdulla ( 523920 ) on Sunday January 04, 2009 @02:32PM (#26321511)

    There are also advantages to Subversion that Linus states himself [1]. Really the only one of note is that Git isn't so great at having multiple projects in the one repository; the recommendation is to have one project per repository and a super-project that contains pointers to the others - which isn't so great a solution.

    [1] It was stated in relation to the layout of the KDE repository: http://git.or.cz/gitwiki/LinusTalk200705Transcript [git.or.cz]

  • by Trepidity ( 597 ) <[gro.hsikcah] [ta] [todhsals-muiriled]> on Sunday January 04, 2009 @02:37PM (#26321555)

    These distributed models work best if it's a large team, which potentially has more than one level of hierarchical structure.

    You do typically have a canonical central repository managed by the project lead (in the Linux kernel's case, Linus's tree). But then sub-section leads might have their own canonical repository for that sub-section, and merge their team members' changes into a stable state that they approve of before asking for those changes to get merged into the central branch. Or they might bundle up some particularly important set of changes for early merging "upstream", making sure they cleanly apply against the current central repo. That's all a nightmare to manage in SVN, which conceives of branches as something you do occasionally and keep around for a while, not as a hierarchical project-management tool.

    On the other hand, if you have a relatively small or flat team, or one where the sub-sections break down really cleanly so each one can have its own central repo, it might not buy you much. I'm working on a small project with 4 people at the moment, and SVN is perfectly fine, and I can't really imagine what I'd do with a distributed version control system (I'd just use it like a centralized one, pushing everything to the one repo everyone pulls from).

  • by fafne ( 840092 ) on Sunday January 04, 2009 @02:52PM (#26321635) Homepage Journal
    DVCS does not mean anti-centralized. DVCS does not introduce arguments between developers; rather, it ameliorates them, since it's easier to try things out and become more knowledgeable before discussing issues. It's about how to define the build and release systems. Obviously, you need a 'head revision' or 'release branch' or whatever you want to name the code that's defined as the one version that makes up the product. Having input from different places makes no difference to the release part of the process. Developers move their changes to the release/central build version just like they would with the old model. Almost all resistance I've seen so far is something similar to 'I don't like this because I have to learn something new' obfuscated behind a bunch of misconceptions.
  • by jbolden ( 176878 ) on Sunday January 04, 2009 @03:00PM (#26321707) Homepage

    If you just want Perl6 you can use it today with Pugs. That is the "release often" part, and it is finished.

    If you want a real release candidate, the problem is Parrot. Perl6 attempted some very complex stuff with the runtime that has so far been challenging to implement. There is no guarantee these problems will get solved.

  • by thanasakis ( 225405 ) on Sunday January 04, 2009 @03:01PM (#26321717)

    You can use Perl 6 right now if you want. It is available, just not rated for production yet. I can understand that probably not many folks will want to use it for real world purposes, but this is pretty far from at least my definition of vapor.

  • by Anonymous Coward on Sunday January 04, 2009 @03:07PM (#26321763)

    If the original perl git packing was done aggressively (which it definitely was), then doing an untuned repack can only make it worse.

    IOW, the original perl git pack (that you got by cloning a well-packed repository) was probably done with a much larger search window for good delta candidates, and quite likely with a deeper delta chain than the default too. When you did your own aggressive repack, you undid all of that.

    End result: normal developers should likely never use "git gc --aggressive". It's for the maintainer, who can also tune the various knobs, and who probably has a beefy machine that can afford to go the extra mile of using lots of CPU time to find an "optimal" pack.
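
    For what it's worth, the kind of tuned repack a maintainer might run instead looks roughly like this (the window/depth values are illustrative, not a recommendation):

    git repack -a -d -f --window=250 --depth=250   # recompute all deltas with a much
                                                   # larger search window and deeper chains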

  • by grumbel ( 592662 ) <grumbel+slashdot@gmail.com> on Sunday January 04, 2009 @03:08PM (#26321771) Homepage

    takes up less space than a Subversion checkout for the same project

    Only if you actually *want* the whole project. If, on the other hand, you just want a single file or subdirectory, say in a large, gigabyte-sized repo with graphics, textures and stuff, you have kind of a problem with git. A git clone always downloads the whole thing; svn, on the other hand, allows you to download just what you want, so the download with SVN can easily be a few orders of magnitude smaller than with git.

    git could really use a way to do shallow clones so that only the pieces that are actually needed get downloaded; git-clone --depth is a start, but not quite enough.
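
    For comparison, the difference in a nutshell (URLs are illustrative):

    # git: full tree, but history can be truncated to the last commit
    git clone --depth 1 git://example.org/big-repo.git

    # svn: can fetch just the one subdirectory you care about
    svn checkout http://svn.example.org/big-repo/trunk/textures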

  • by snaz555 ( 903274 ) on Sunday January 04, 2009 @03:11PM (#26321789)

    isn't centralization the heart of source code management

    Not necessarily. Consider a common case:
    - Project A works on a significant feature (say a new file system)
    - As part of their work, they significantly restructure some related part (say how a fs ties into the kernel) and update other parts of the source to match
    - Project B works on a different feature (say overhauling the interrupt thread implementation on SMP systems)
    - Project B wants to update the same related parts (say how kernel modules, including file systems, tie into the kernel)
    - Project A is already done, and is scheduled to get on the train before project B, so it's natural for B to integrate portions of project A and then track project A's fixes and updates to these portions up to when A integrates into the trunk

    This situation is handled very cleanly by distributed systems like git and teamworks. As B selectively merges parts of A it picks up the change log. With svn, when you diff-and-patch across from the A branch to the B branch you lose changelogs; there will be no record that these changes came from A, they'll just appear independently in B. When A integrates to trunk, if B simply tracks those changes it will appear as if B edited trunk. That is an inaccurate history. With p4 I get a headache just thinking about it.
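
    A rough sketch of the git side of that (branch names are illustrative; <commit> is a placeholder):

    git checkout projectB
    git merge projectA-restructure     # B picks up A's shared rework; A's commits keep
                                       # their authors, messages and ancestry
    git cherry-pick <commit>           # or take just one of A's commits, with its log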

  • by snaz555 ( 903274 ) on Sunday January 04, 2009 @03:13PM (#26321803)

    s/is already done/is already done with the shared part/g

  • by eggnet ( 75425 ) on Sunday January 04, 2009 @03:20PM (#26321843)

    The joke is that git depends on perl.

  • by n dot l ( 1099033 ) on Sunday January 04, 2009 @03:25PM (#26321869)

    We use it at work and it works much better than SVN did.

    Apart from everybody's local copies, we keep a repository sitting on a central server. That repo's "master" branch is our release code and, since I'm responsible for the final product, I'm responsible for this branch. Our workflow is fairly simple:

    1. Developer pulls down a copy of the master branch (this either creates a local copy or brings an existing copy up to date).
    2. Developer hacks away, creating, deleting, and merging local branches as is convenient for them.
    3. Developer finishes task.
    4. Developer pulls down an update, bringing their local master in sync with the central master.
    5. Developer git-rebases their code on the new master. What this does is it takes all of the changes they made since their code diverged from the master and applies them to the new master. Git will apply commits one at a time, pausing if it runs into non-trivial merges or anything else that needs to be dealt with by hand. This has proven to be a massive improvement over the old SVN approach of having the updates in trunk blindly dumped on top of your work as the conflicts tend to be smaller, clearer, and much more manageable. Not to mention that the developer who wrote (and understands) the code is doing the merge.
    6. Developer tests their code.
    7. If the code is bad, goto step 2. Otherwise the dev will collapse their many little "work in progress" commits into a single "feature implemented/bug fixed" commit.
    8. Developer pushes their cleaned up commit as a new branch on the central server and alerts me to its presence.
    9. I review the diff (practically a nop for trusted senior coders, for the rest, well, I'd be reviewing their stuff anyway).
    10. If I don't like it I send it back, else I merge it onto the central master (guaranteed to be a trivial merge since they did the work of rebasing onto the latest code - Git calls these a "fast-forward" and I automatically reject anything that hasn't been properly rebased) and delete their branch from the central server.
    11. Developer pulls down new master, deletes temporary local branches, rebases any other work in progress (or puts this step off, up to them, I don't give a damn as long as I get high quality patches in the end).
    12. goto 1

    Note that pushing to master doesn't break anybody else, ever, until they decide they're ready to deal with integrating their patch. Nobody ever does the, "Are you gonna commit first or should I?" thing anymore. Developers that are collaborating on a patch sync via a branch on the central server, or directly to each other's machines, or via emailed patches, whatever they want to do. Git doesn't care and neither do I.

    It sounds like a lot of tedious work, but Git is just stupid-fast. In the common case the whole update master, rebase, cleanup commits, push cycle takes about as long as SVN used to take to update and then scan for changes and actually commit anything. In the uncommon case where there's a non-trivial merge, the merges tend to come out a lot cleaner since Git is trying to make your changes to the new master one commit at a time, rather than dumping all of the changes in master on top of your stuff (though it can also do that, if you happen to enjoy pain).
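
    Roughly, the developer side of that cycle (steps 4-8 above) boils down to something like this -- a sketch, with remote and branch names being illustrative:

    git checkout master
    git pull origin master             # 4: sync local master with the central one
    git checkout feature
    git rebase master                  # 5: replay my work on top of the new master
    # 6-7: test, then squash the work-in-progress commits into one clean commit
    git rebase -i master
    # 8: publish the cleaned-up work as a branch on the central server for review
    git push origin feature:review/feature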

    And while I prefer the manual approval approach (which scales by appointing trusted lieutenants to take over some of the work) since it keeps me in the loop and keeps everyone else honest, there's no reason you couldn't automate it. Some projects give everyone push access, but disallow anything but fast-forward (trivial nothing-to-merge) pushes to the central server; others I've heard of have people push to a staging branch and a bot on the server grabs the code, runs the test suite, and merges it if it's good. Access is ssh-based, and there are hooks all over the place, so you can set up all sorts of schemes when it comes to control of the canonical central repo.

    The thing we've found is that because we've all

  • by n dot l ( 1099033 ) on Sunday January 04, 2009 @03:31PM (#26321935)

    I can't really imagine what I'd do with a distributed version control system

    Like I kept repeating in my other post [slashdot.org], you can actually use branches and commits as tools to aid your own work without affecting others or polluting the commit logs with junk "Work in progress" commits. That and you get sane merges. Both of those are huge, even if your team is just you and your imaginary friend.

  • by Lord Bitman ( 95493 ) on Sunday January 04, 2009 @03:59PM (#26322147)

    git makes branching and merging easy enough that the question of "where is the central line?" isn't really an issue- developers can easily work on their own branches without worrying about other branches, and you can still push your developer branch to the central repository so that the question of "Where is this change? Is it in Steve's branch? Do I need to connect to his repository?" is also not an issue- Steve's branch can easily be in the central repository, Steve just needs to push changes in, just like he'd normally need to commit changes. Git's primary difference there is that "Steve's repository" is pretty much just a robust staging area for changes.

    However, if you're used to centralized version control, you may miss things switching to git:
      - Pick whether you want all or nothing in advance. You can either have "shallow" checkouts, which leave you with a crippled, broken, and useless copy that has no access to history functions, or you can have every change ever made. Once you've made this choice, you can only change your mind by cloning again. There is no way to gracefully get history as it is required.
      - This means: no partial checkouts. This is a problem if you're used to versioning large binary files, or have large files which you won't care about for anything other than auditing reasons after a certain time.
      - Which also implies: no "modules". This is a problem if you have lots of small related projects, which together make up one massive pool of code. You can have one massive project which everyone uses all of, or you can choose not to track the origin of files which you copy from one project to another. Having a "common" project shared by several others is not possible.
      - Unless you try the "submodule" support, which is a broken hack that can devour changes far too easily to trust it to end-users. And submodule support does NOT allow copies from one "submodule" to another, or to your main project. Not while retaining history, anyway.

    This is really all one flaw, re-stated five times. Fix this and git will be able to replace any centralized system. Without the change, I can't recommend it to anyone who is involved in a centralized project- at least not when there is a reason for being centralized.

    Git is, despite proponents' claims, great for small projects which don't actually need to talk to anyone else and don't need to interface with any other projects. If your project involves other "projects" where the line between one and another is the least bit blurry, avoid git.

  • Re:Darcs vs. Git (Score:4, Informative)

    by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Sunday January 04, 2009 @04:14PM (#26322263) Journal

    The main things I like about git are its raw speed, its ubiquity (everyone and their dog has a git tutorial), and how simple its primitives are.

    That is: I actually started with bzr, and I found that while there were some things that were much easier (bzr uncommit comes to mind), it's a lot easier to actually understand Git under the hood, in case I need to do some deep surgery on some history, say.

    Then, too, there just seem to be more tools available -- gitk, git-cherry-pick, even git-stash, are things I don't remember from bzr or hg, but it's been a while.

    I see the point about Python, and I'm absolutely with you there. The reason it's not an issue is, I can't ever remember having to dig into Git source -- the most I might have to do is write a shell-script frontend to some of the tools already there. It's actually somewhat UNIX-y that way -- when was the last time you had to dig into the source of fileutils?

    What's more, there are language bindings -- personally, I've used grit in Ruby. Easier than trying to talk to SVN with its XML output.

    The main advantage of bzr, by the way, is its handling of directories -- it actually records renames. Git tries to detect them, but it works at the level of file contents, not directory structure -- so, for example, it'll detect if you move a function from one file to another, but it might have trouble if you rename a directory. For example:

    mv a b
    mkdir a

    And, in another branch:

    touch a/c

    When they merge, where should c go? Git would probably put it in a, since the file's name is 'a/c'. Bzr would probably put it in b, since a was renamed to b, unless the same rename somehow made it into that other branch.

    There is another reason I didn't mention -- I use Ruby. I believe Ruby itself is done on SVN, but it seems that every Ruby project ever has moved from SVN to Git, and most of them to Github. And it's just awesome to work with Github projects.

  • by lordSaurontheGreat ( 898628 ) on Sunday January 04, 2009 @04:16PM (#26322277) Homepage

    You forget that Q also appeared in ST:V a few times.

    2009: The Year Of The Truly Helpful Slashdot Grammar Nazi Watchmen

  • by Anonymous Coward on Sunday January 04, 2009 @04:29PM (#26322381)

    Merges were simply broken in many SCMs before git came along.

    Here's the problem. You start out with the code in state A, then develop and commit to get state B, and then some more to get state C: i.e. in graph form A->B->C

    Now let's say that when the code was in state B, you decided to branch and work on something else, leaving the code in another state D. This means the ancestry diagram would look like:

    A->B->C
       \->D
    (Sorry - need to use a fixed-size font)

    Now, say that you want to merge the features represented by states C and D, and form a new state, E.

    A->B->C->E
       \->D--^

    The earlier SCMs worked fine up until this stage. What they get wrong is the next step. Say you develop some more on each of D and E to form the diagram:

    A->B->C->E->F
       \->D--^->G

    Where G's ancestor is D, and F's is E. Now, let's merge again. What we want to do is combine the changes introduced by G into F to make a new state H.

    A->B->C->E->F->H
       \->D--^->G--^

    Unfortunately, the problem with earlier SCMs is that in creating state H they try to merge both D and G into F. They forget that D was already merged into E, and try to merge it in again. This obviously causes problems: either conflicts or, even worse, mysterious code duplication or deletion.

    To work around this, many coders made scripts that recorded the previous merge point through automated tags or other magic. Unfortunately, the diagram I showed above is a relatively simple case. The history graph can be much more complicated, and simple scripts will break.

    Git solves all of this by knowing the topology of the ancestry diagram. It knows exactly what has been previously merged, and what hasn't. This means when multiple branches are merging between themselves it "just works".

    All of this is completely obvious in hindsight. The reason the bad-merge situation carried on for so long is that people thought that complex project history was "bad". Now it is realized that project history complexity comes naturally from collaboration, and is perfectly normal for large projects.
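
    You can see that bookkeeping directly. Assuming a branch 'master' sitting at F and a branch 'topic' sitting at G, shaped like the diagram above (names illustrative):

    git merge-base master topic    # prints D: the first merge gave E two parents,
                                   # so git knows D is already part of master
    git checkout master
    git merge topic                # only the D..G changes are applied on top of F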

  • Re:Tortoise ? (Score:5, Informative)

    by Tanaka ( 37812 ) on Sunday January 04, 2009 @04:33PM (#26322409) Homepage

    http://code.google.com/p/tortoisegit/

  • by n dot l ( 1099033 ) on Sunday January 04, 2009 @04:43PM (#26322501)

    Imagine doing this (half-asleep, bear with incorrect/silly git commands):


    git pull origin master
    git checkout -b mystuff

    Two weeks go by, many, many changes have been made in master, and you do a ton of work in mystuff:


    git checkout master
    git pull origin master
    git checkout mystuff
    git merge master
    git branch -f master mystuff   # the command to forcefully set master to point at mystuff

    Or alternately:


    git checkout master
    git pull origin master
    git merge mystuff

    Oh, and in both cases, the merge turns into a single commit without so much as a pointer to the history. You can put that in the commit message yourself, if you like, that's your only option.

    That's essentially all you can do with SVN. You don't have the ability to rebase or cherry-pick or otherwise fiddle with your commit history so as to get a clean, straightforward merge in SVN, and because of that merges are slow and usually painful. So while you could regularly merge to keep things sync'd and simple, nobody actually does that in practice. It leads to what many SVN-based teams call "merge day", where it literally takes a day to merge in a feature branch and work out all of the conflicts.

    The other issue is that branches have to be made on the server and then checked out separately, which makes them expensive. First you're polluting the global history, and second you have to redo whatever build environment setup you do once you check out your branch (you can "switch" your branch over, which works in place, but that gets flaky as the things you're switching between diverge - maybe this has been fixed). So for any small-to-mid sized task you're (in practice) going to avoid creating a branch. You'll work right in trunk (master in Git parlance). You won't make commits as you go, as those go straight to the server where they can never be erased, so the only "oops, undo that" feature you have is the undo buffer in your text editor, or picking and choosing lines from a diff, or sheer memory. And of course you *can't* make that one commit until you've pulled down all the changes that happened since your last update (yes, while your changes are sitting uncommitted in your working files), and those changes get dumped into your working copy as changes, indistinguishable in diffs from the code you just typed (except for conflicts, which get marked in the usual manner)...

    All of those problems go away when you can easily merge, because then branches cease to be painful - but then I've found that the best merges Git makes are the ones you get from rebasing or cherry-picking, which SVN cannot do.
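
    For reference, the rebase-then-merge flow that replaces the merges above looks roughly like this (same illustrative branch names as before):

    git checkout mystuff
    git rebase master          # replay mystuff's commits on top of the new master
    git checkout master
    git merge mystuff          # now just a fast-forward: linear history, no merge
                               # commit, and the individual commits are preserved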

  • Re:Git momentum (Score:5, Informative)

    by petermgreen ( 876956 ) <plugwash@nOSpam.p10link.net> on Sunday January 04, 2009 @04:51PM (#26322569) Homepage

    Roughly

    Linus was very resistant to version control at all and could always find a reason (or excuse) not to use each version control system that came along.

    Eventually someone decided to listen to every demand from Linus and create a VCS that met all of them. The catch was that it was not FOSS and the gratis version had some pretty obnoxious terms. Things reached a head after someone at OSDL reverse engineered the protocol, and Linus was basically forced to either scrap BitKeeper or quit his job at OSDL.

    However, the period with BitKeeper had convinced Linus that version control was a good idea. But all the alternatives he could find were either too centralised or too slow. So he hacked together git.

  • by n dot l ( 1099033 ) on Sunday January 04, 2009 @05:02PM (#26322665)

    - This means: no partial checkouts. This is a problem if you're used to versioning large binary files, or have large files which you won't care about for anything other than auditing reasons after a certain time.

    Yeah, I agree with this. Git blazes along with the largest of code trees and keeps its history data nice and compact, but it's not so great for binaries. We use git for code and SVN for all our binary data, which is optimal for us as the binaries are unmergeable anyway and we've got workflows in place to deal with that issue.

    Having a "common" project shared by several others is not possible.

    I'm not sure what scenario you're thinking of; submodules have solved this for us.

    - Unless you try the "submodule" support, which is a broken hack that can devour changes far too easily to trust it to end-users. And submodule support does NOT allow copies from one "submodule" to another, or to your main project. Not while retaining history, anyway.

    Can you elaborate on this? We use submodules at work and we've never had any problem with them (short of the odd time someone forgets to git submodule update, but the lines in the git status output usually clue people in). What do you mean by copying from one submodule to another? Do you mean copying between repositories in general, which SVN externals couldn't do any better, or something more specific I've simply not run into?
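
    For reference, a minimal version of that kind of common-project-as-submodule setup might look like this (URLs and paths are illustrative):

    git submodule add git://example.org/common.git lib/common
    git commit -m "Track the common library as a submodule"

    # and in a fresh clone of the containing project:
    git clone git://example.org/app.git && cd app
    git submodule update --init       # fetch the pinned revision of lib/common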

  • Re:Git momentum (Score:3, Informative)

    by Karellen ( 104380 ) on Sunday January 04, 2009 @05:53PM (#26323087) Homepage

    Where "reverse engineered" should be translated as "telnetted to the bitkeeper server/port and typed 'help'" [lwn.net]

  • by Lord Bitman ( 95493 ) on Sunday January 04, 2009 @07:32PM (#26323963)

    You can only split a subdirectory out into its own project if it's not related. If changes ever cross the boundary, you want history. If you just want to point someone to a single place for a checkout, you want the versioning system to have some notion of that.

    Meanwhile: Burning a DVD is possibly slower or faster depending on the situation. It's a pretty stupid option either way, though. In this day, any time you're in a situation where using physical media is a better option, you've got a broken protocol. If I don't /want/ 12GB of history for images I'm not using (even if someone else might), I should not have that data at all.

    I really think it would be much better and make a lot of people happier if git would just allow a shallow checkout which asked the server for more if needed (and that server can do the same from its origin if it doesn't have it either). Give people the option of setting up servers to do history processing for people who only want a shallow copy for 99% of the work they're doing. And if I want to check out "all history up to version 2.0, but nothing earlier than that", that would likely satisfy absolutely everyone for every day-to-day work anyone ever actually does.

    Pulling entire histories is _always_ asking for trouble down the line. Subversion has the fatal flaw of keeping track of everything forever, even if no one wants it anymore (google "svn obliterate" for discussions). git solves this problem by simply not caring about the ramifications, but if the primary repository deems something necessary, EVERYONE gets it. At least with svn the problem is server-side-only.

  • by jepaton ( 662235 ) on Sunday January 04, 2009 @08:05PM (#26324269)

    You shouldn't put in the things you can't diff.

    So all the binary data etc. that is required to build an application should be managed separately? Our GUI code is generated by a third party tool which stores its information (e.g. fonts) as part of a binary database. This belongs with the source code because the code needs to be modified in step with it. Having the log message is very helpful, as it would take hours to work out the changes made between two of these files, because diff isn't useful. The files are 15MB in size.

    SVN may not be the best choice for binary data, but at least it is possible to put binary data into it. I would rather endure SVN's slowness than have to manually manage binary files. I believe that revision control could be better supported by operating systems. Two copies of every file are managed by an SVN checkout - the base file and the working file. If the filesystem could store these together then the cost would be halved (if the working file referenced the base file until the working file needed to be changed). The SVN tools would then be able to work much faster because the need for file comparison would be less common.

    Unless the revision control system's performance is dreadful I think that all files should be in revision control.

  • by chromatic ( 9471 ) on Sunday January 04, 2009 @08:06PM (#26324283) Homepage

    At least with Perforce it was equally easy to get Perforce up and running on Linux or Win32 boxes. Typically you'd just have to copy one executable "p4" or if you prefer the GUI then additionally "p4v".

    To my knowledge, almost no one besides committers used Perforce to check out the Perl 5 source code. The documentation suggested using an rsync mirror. That's what I did.

    I can't help feeling that switching to "currently-Win32-neglected" git could possibly harm one of Perl's most attractive qualities: that many modules work cross-platform.

    I don't see how. This has very little to do with Perl modules, nothing to do with the CPAN, and everything to do with how people who hack on Perl 5 itself get its source code. (Very few of those use Windows, and they seem confident that Git on Windows is sufficiently usable.)

  • by n dot l ( 1099033 ) on Sunday January 04, 2009 @08:31PM (#26324457)

    When was the last time you used SVN? Everything you just said is very confusing; 1.5 came out over a year ago and seems to have most of the features you say it does not.

    Chances are we never upgraded the server at work, or if we did it was after 2+ years of fearing SVN merges so I never noticed the new features. I've heard others that work with SVN say unflattering things about the automatic merge tracking but I have nothing but their word on that. Maybe they're wrong. I'd given up on SVN's merges long before 1.5 came out, and Git's other features are too compelling to warrant looking back.

    As for rebasing, this is the first I've heard of it. It sounds interesting, but I don't really understand the problem it was meant to solve.
    [...]
    Just seems a little superficial to me :\ A single patch is too much, the actual VCS history is too much, so the ideal is offering a doctored up change history?
    Wouldn't the end result be more interesting? I think good source comments with other documentation would be easier to understand and more proper than reviewing the change history of someone's patch.
    Since when was how someone's patch was developed more important than what was developed? Isn't diving into the VCS to find the reasons for something just a sign that the code wasn't documented properly? So, yes, we dive into VCS history to solve preexisting documentation problems, but when it comes to accepting someone else's patch, isn't that the time to simply demand good comments and/or documentation?

    The second bit I bolded answers the question you raised in the first. You would clean up your commit history precisely because the contents are more important than the method. That meaningless history is only going to get in people's way if they have to go through the history, so why not replace it with something nice before sending your changes into the shared repository?

    While you develop you want to use branches and commits however you see fit, making throw-away instances of each when you experiment with things, reverting old changes, etc (and if you don't see why that's a big deal, you're missing out). But when you submit you don't want people to see a bunch of "work in progress" commits, you want them to see something that reflects a logical breakdown of what you did so that each component can be considered/used on its own. Most of the history rewriting commands are for this. Rebase is a special case of cherry-picking that grabs your work and reapplies it to the current development head (usually, it can be whatever branch you like) so you get a nice clean patch relative to the current code, rather than to revision whatever-it-was-when-I-branched.
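
    A minimal sketch of that rebase-and-tidy step (remote and branch names are illustrative):

    git fetch origin
    git rebase origin/master       # replay my commits onto the current head
    git rebase -i origin/master    # then squash/reword the work-in-progress
                                   # commits into a few logical changesets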

    Say I write a complex patch that changes a bunch of existing code and then adds a bunch of new stuff on top of that. If the person who wrote the original code that I changed wants to review my work, it would be nice if he could just look up Changeset X: Refactored system X to add hooks for Y rather than digging through 25 tiny commits, mostly labeled "work, work" (or WIP or "undid crap" or whatever), which correspond more closely to my coffee breaks or the times I had to switch to something else for a second than to the code - or worse yet one giant commit that includes my changes to his stuff plus a ton of stuff he doesn't care about since it's separate.

    Cleaning up the history allows me to collapse many dozens of little WIP commits into "Refactoring system X", "Added module Y", and "Optimized module Z to use new hooks exposed in X". Once I do that the maintainers of X and Z immediately know what they need to look at to find the changes that impact them, without having to dig through Y's changeset for any buried changes of interest. That also makes it much easier to cherry-pick individual components of my work into other branches, or perhaps nuke module Y while keeping the new X and Z code. Rewriting history is a powerful ope

  • Of course (Score:1, Informative)

    by sam_vilain ( 33533 ) <sam AT vilain DOT net> on Sunday January 04, 2009 @09:07PM (#26324683) Homepage

    You guys know that, in the UK at least, git is a moderate insult with the same meaning as bastard?

    Linus likes to name his projects after himself. Hence, "Linux" and "git" - the British slang meaning of the term being the one implied. I like the Oxford definition: "an arrogant or contemptible person". "bastard" has another meaning, though its slang use approximates the same :-).

  • by sam_vilain ( 33533 ) <sam AT vilain DOT net> on Sunday January 04, 2009 @09:17PM (#26324745) Homepage
    Heh, you obviously didn't find any of the actual code used for the pre-historic import [catalyst.net.nz], the hostile import [catalyst.net.nz], the Raw Perforce Importer [utsl.gen.nz] or the scarier SQL queries [utsl.gen.nz] used to manipulate the data. Your program is far easier to understand :-).
  • by toad3k ( 882007 ) on Sunday January 04, 2009 @09:36PM (#26324887)

    Trying to explain why git is better than svn is like trying to explain why svn is better than cvs. Someone who has never used it simply can't imagine anything better. I've actually been in the position of advocating svn over cvs and been shot down with arguments much the same as you are making now (that cvs has almost everything svn has).

    There are a myriad of git commands that do all sorts of useful stuff. mergetool, cherry, rebase -i (interactive), add -i (interactive hunk based add), gitk (for visualization), colored diffs, a history aware grep, bisect (for narrowing down a patch that broke a feature), and dozens of others.

    To directly address your comment, changing history may sound like a mere convenience, but it makes things a lot easier. Some person's patch breaks the website? You may find that the entire change is self-contained in one or a small series of contiguous patches, making it super simple to revert. In subversion I've had to track down and revert 15 separate, disparate commits spread over a series of weeks to bring a piece of software back into line. Another benefit is that as you are working, you can add your current work without committing and diff that version against any new changes you make. This makes it extremely easy to develop because you can keep track of just what you've done in, say, the last 15 minutes, as opposed to the entire day. It is great to be able to go back and review what you've done and tidy it up a bit.
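
    That last trick is just the index at work -- something like this (a sketch):

    git add -p         # stage the work done so far, hunk by hunk
    # ...keep hacking...
    git diff           # shows only what changed since the staged snapshot
    git diff --cached  # shows the staged snapshot itself, relative to HEAD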

    Everything on the client side that svn does, git does better. I want to list more actual use cases, but this post keeps getting too long. So instead I'm going to encourage you to experiment with git svn at work, or on one of your personal projects. You'll mess up a couple times at first, but the productivity you gain after a few days will be well worth it. I've converted a couple people at work, and they seem happy. Personally my productivity has doubled or even tripled, although I cannot guarantee that will be the case for everyone.
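
    A low-risk way to try that experiment against an existing SVN server might be (the URL is illustrative):

    git svn clone -s http://svn.example.com/project
    cd project
    # work and commit locally as usual, then:
    git svn rebase     # fetch new SVN revisions, replaying local commits on top
    git svn dcommit    # push local commits back to SVN, one revision per commit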

  • by BuGless ( 31232 ) on Monday January 05, 2009 @06:42AM (#26327987) Homepage

    There are several reasons why, even in smaller groups, using git is advantageous (even if you have only yourself, no other contributors). I'm not going to name them all, but in my experience (I've used RCS, CVS, SVN and now git), some of the more compelling advantages are that you can:

    - Actually permanently erase/fix bad commits from the repository without a painful full dump/tricky edit/restore cycle on the repository. I suppose everyone has one of those moments occasionally: "Aaargh, I meant to commit only this one file, not this tar.gz file that happened to be in the wrong place at the right time." Git allows you to correct the mistake without bloat in the repository.
    - Patch management (instead of keeping around a bunch of patch files, simply create branches for every patch file you'd normally keep) made easy and trackable.
    - And related to patch management: commit early, commit often, then clean up/merge commits before actually committing them "for real" to the bleeding edge version.

    For small groups it means that you simply set up a central git repository everyone pushes to. You get all the benefits of DVCS and classic central management, i.e. it allows you to have your cake and eat it too.
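
    Setting that up really is minimal -- something along these lines (host and path are illustrative):

    ssh example.org 'git init --bare /srv/git/project.git'
    git clone ssh://example.org/srv/git/project.git
    cd project
    # everyone commits locally, then publishes with:
    git push origin master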
