Ease Into Subversion From CVS

comforteagle writes "While you have a nice leisurely Sunday afternoon/evening you might want to read this fine article on easing into Subversion from CVS. Written by versioning admin Mike Mason, it talks about the philosophy and design behind Subversion (now 1.0), how it improves upon CVS, and how to get started using it."
  • Tip (Score:3, Insightful)

    by aePrime ( 469226 ) on Monday March 08, 2004 @12:14AM (#8495112)
    Remember, when changing software components, it's a good idea to back up first!
    • Re:Tip (Score:1, Funny)

      by Anonymous Coward
      Ahh, it was a joke. Evidently not a very funny one.
    • Remember, when driving your car out of the garage, it's a good idea to open the door first!

      Remember, when you are going to take a leak, it's a good idea to pull your wang out first! (Assuming you're male, of course. Disclaimers are the second order of business around here, right after frist posts.)

      Remember, when you are going to spread peanut butter on bread, open the peanut butter jar first!

  • Is there demand? (Score:5, Interesting)

    by cookiepus ( 154655 ) on Monday March 08, 2004 @12:29AM (#8495219) Homepage
    I've read the linked article (really!) and I think Subversion sounds like a good idea. Primarily, I like the fact that everything you can do with CVS you can do with Subversion, in much the same way.

    I am really curious how much demand there is for Subversion's new features, however.

    Do developers out there voice the need to store binaries? I can imagine this being needed for web developers and such, but I think programmers can just build their binaries from CVS.

    Also, have there been many problems that required atomic commits? Can someone explain why this is important? I mean, the idea is you'll need to merge one way or another. I can see the point being that what you commit at any given time will compile (presuming you're committing completed code), but realistically, does anyone not fix their up-to-date checks as soon as they happen?

    Also, Subversion says that it is much faster at things like tagging, but tagging is not a very frequent operation...

    To me it sounds like a great product but I am not able to see a compelling reason why most development shops out there who are currently in CVS would rush to switch.

    Not a flame btw, just an opinion.

    • Re:Is there demand? (Score:5, Informative)

      by nosferatu-man ( 13652 ) <spamdot@homonculus.net> on Monday March 08, 2004 @12:35AM (#8495265) Homepage
      We're switching. CVS is crufty, buggy, and slow. That alone is reason enough to switch, but atomic commits and faster and more transparent branching will be, in the long run, a more fundamental win.

      'jfb
    • Re:Is there demand? (Score:5, Informative)

      by aurum42 ( 712010 ) on Monday March 08, 2004 @12:40AM (#8495286)
      I don't know what your development model is, but branching and tagging are often some of the most frequent (and slowest, in CVS) operations.

      Many projects follow the "make branch, fix bug in branch, test branch and then merge" cycle, which makes a lot of sense.

      • Consider GCC (Score:5, Informative)

        by devphil ( 51341 ) on Monday March 08, 2004 @02:09AM (#8495755) Homepage


        Once a week, a snapshot release is made. That means a tag is added. This operation takes, on average, 40 minutes, because the GCC source tree is large.

        Every time someone makes a branch, they create a tag just before branching (for use later on, with diffs and merging). 40 minutes to tag, another 40 minutes to branch.

        All because these are, stupidly, O(n) operations instead of O(1). We'd like to move to Subversion, but can't, until they get annotate ('svn blame') fully working, because GCC developers spend a lot of time doing "revision-control archaeology".

        • Re:Consider GCC (Score:5, Informative)

          by nthomas ( 10354 ) on Monday March 08, 2004 @03:16AM (#8496052)
          We'd like to move to Subversion, but can't, until they get annotate ('svn blame') fully working, because GCC developers spend a lot of time doing "revision-control archaeology".

          Just curious, 'svn blame' was added 2003-10. What about it is not working for you?

          Thomas

          • Re:Consider GCC (Score:3, Interesting)

            by devphil ( 51341 )


            The person who tried it reported it wasn't working for certain branches off the main trunk. *shrug* Haven't tried it personally since the 1.0 release.

    • Re:Is there demand? (Score:5, Interesting)

      by dietz ( 553239 ) on Monday March 08, 2004 @12:45AM (#8495314)
      Before reading this, let the record show that I am a subversion fanboy. But I am only a Subversion fanboy because it solved almost all of my complaints about CVS. I am not involved with the project at all.

      Do developers out there voice the need to store binaries?

      Uh, most projects of any size will have at least a few binary files in their repository... icons, etc. But you could store those in CVS without too many problems.

      Also, have there been many problems that required atomic commits? Can someone explain why this is important?

      Rolling back changes without atomic commits is a pain in the fucking ass. Have you ever had to do it? You have to track down every file that you changed (somehow... hopefully you can remember), check which version was the version prior to your commit, and get all those versions of files. For example: "Okay, I need version 1.7 of foo.c and version 1.8 of barf.c and version 1.13 of foo.h." It's totally annoying.

      Plus, atomic commits just make it much, much easier to keep track of what changes have gone in. This is my biggest, biggest complaint about CVS. File-level commits just make no sense. There is no time, ever, that I can think of when the ability to commit an entire changeset at once isn't better than committing a single file at a time.
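
      For comparison, a rough sketch of the same rollback in Subversion, where the whole changeset is a single revision (revision numbers invented for the example):

          # undo everything that went in as revision 1234, then commit the undo
          svn merge -r 1234:1233 .
          svn commit -m "Back out r1234"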

      Also, Subversions says that it is much faster at things like tagging, but tagging is not a very frequent operation...

      Depends on your development process. During beta periods, it's common to make a tag or two per day, and if each tag takes ten minutes, well... it's not a big thing, but it's certainly annoying.

      To me it sounds like a great product but I am not able to see a compelling reason why most development shops out there who are currently in CVS would rush to switch.

      Certainly not every shop is going to "rush to switch". But, regardless, I imagine that every shop will switch eventually. It may take years, but subversion's advantages are significant enough that in my opinion it will become the new version control standard.

      Also note that CVS was crufty and adding new features was almost impossible. Subversion targeted CVS feature parity as its 1.0 milestone, but more importantly, the Subversion code base is a much better baseline to work from when adding new features. So you can expect that it will only get better in the future.
      • Re:Is there demand? (Score:3, Interesting)

        by spongman ( 182339 )
        Yeah, I love the fact that there's a revision number that's global to the whole repository.

        We embed that number into each build of our product and our testers file bugs against a particular revision. If I can't repro a bug against my current code, I can just create a new branch at the given revision, compile, and I know I'm using exactly the same code that the tester was running.
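
        A sketch of that workflow, with invented URLs and numbers: svnversion reports the working copy's revision so the build can embed it, and a branch at the tester's revision is a single copy:

            svnversion .        # prints e.g. 4168; bake this into the build
            svn copy -r 4168 http://svn.example.com/repo/trunk \
                http://svn.example.com/repo/branches/repro-r4168 \
                -m "Branch at the exact revision the tester ran"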

      • Re:Is there demand? (Score:3, Informative)

        by 0x0d0a ( 568518 )
        Rolling back changes without atomic commits is a pain in the fucking ass. Have you ever had to do it? You have to track down every file that you changed (somehow... hopefully you can remember), check which version was the version prior to your commit, and get all those versions of files. For example: "Okay, I need version 1.7 of foo.c and version 1.8 of barf.c and version 1.13 of foo.h." It's totally annoying.

        Take a look at the -D flag. You'll be pleased.
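
        For example (date invented), to see the tree as it stood just before a botched commit:

            cvs update -D "2004-03-07 18:00"
            # note: -D leaves sticky dates on the working copy; clear them with
            cvs update -A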

        I agree that CVS was almost mind-bogglingly crufty.
    • Re:Is there demand? (Score:5, Informative)

      by Endive4Ever ( 742304 ) on Monday March 08, 2004 @01:02AM (#8495416)
      Do developers out there voice the need to store binaries? I can imagine this being needed for web developers and such, but I think programmers can just build their binaries from CVS.

      Yes, developers definitely need to store binaries. I worked on a project awhile back where the boot block code was a finished binary. Because CVS was used to house the project, a horrible kludge involving uuencode had to be used to store the binary commits. Sometimes the binary was created by a totally different tool that the main build machine didn't have. In the case I speak of, the binary was built with an expensive licensed assembler for an Analog Devices DSP chip, and was included as part of the 'build' because it was dynamically 'injected' into the DSP from the native processor, which happened to be an 80196.

      There are always cases where a binary needs to be committed. Think about bitmaps and other resources. It doesn't make sense to 'generate them from source' every time a build is done.

      Given all this, it's my understanding that with newer versions of CVS binaries can be committed safely. Is this even an instance where 'Subversion' is needed?
    • Re:Is there demand? (Score:2, Informative)

      by Anonymous Coward
      Do developers out there voice the need to store binaries?

      It's a useful feature. Many companies like to store versions of binaries alongside sources. That way, if some customer has a bug with version 2.1.2.4 of Foofware, the company can just check that out, instead of figuring out (and hoping to get it right) how to build it.

      And atomic commits are very useful. I wonder how CVS got so popular without them; I think it's that people didn't have them and didn't know what they were missing.


      • Re:Is there demand? (Score:4, Interesting)

        by Textbook Error ( 590676 ) on Monday March 08, 2004 @12:09PM (#8498782)
        if some customer has a bug with version 2.1.2.4 of Foofware, the company can just check that out, instead of figuring out (and hoping to get it right) how to build it

        Your build system is seriously broken if this is the case. The whole point of revision control is that you can get back to a previous build just by fetching a specific tag or branch. If that means that you need to keep your entire dev environment (IDE+tools straight off the CD, headers, runtime libraries, etc) under revision control then that's what you should do.

        Builds have to be deterministic if you want to have reliable QA, and making the build process reproducible is at least as important as using source control. The alternative is you end up checking out a build from 6 months ago that crashes, yet when you try and build the equivalent source the crash goes away. Having to say "um, this should be the same build but this one works and that one doesn't and I can't tell you why" is a sign that something pretty serious has gone wrong in your process.

        There are plenty of other good reasons to keep binary data in a revision control system (images, sound, models, data for regression tests, materials for installers, etc) but trying to avoid having to have a deterministic build process shouldn't be one of them.

        Third party libraries that you never build yourself can obviously be checked in as-is, but anything that you build from source should always be buildable from source on a brand new workspace. No ifs, no buts - if you can't produce a reliable build on demand, how do you know what's going into any of your builds?
    • Atomic commits are essential if you plan to automatically build upon every submission.

      Otherwise, when a developer changes a data structure, and submits the .C before the .H, the build will break if it decides to build after the .C was submitted.

      • Good point, I didn't know people do that. Why build upon every submit? To see if what you've got in the repository is compilable at all times?

        • Yep. It's something of a corollary to the "many eyes == shallow bugs" theory; the more builds you do as a function of submissions, the more easily you can pinpoint where the break was submitted.

          Especially if it's not a compile error, but something found during regression testing. The build engine should email (or otherwise notify) the regression test suite machines upon a build's completion, which should then kick off the automated regression tests, emailing results to the team and managers when done.
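
          In Subversion, that sort of pipeline typically hangs off the post-commit hook, which is invoked with the repository path and the new revision number. A minimal sketch (paths and addresses invented):

              #!/bin/sh
              # $REPOS/hooks/post-commit -- invoked as: post-commit REPOS REV
              REPOS="$1"
              REV="$2"
              # ask the build box to build this revision; it then runs the
              # regression suite and mails the results to the team
              echo "$REPOS $REV" | mail -s "build request: r$REV" build@example.com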

    • Binary files (Score:5, Informative)

      by ggeens ( 53767 ) <ggeens AT iggyland DOT com> on Monday March 08, 2004 @05:26AM (#8496403) Homepage Journal

      Do developers out there voice the need to store binaries?

      There are definitely reasons for storing binary (non-text) files in a version control system:

      • Images: quite obvious. You want to version all your artwork. For web-based projects, this can be a large part of your system.
      • External libraries: if you use third-party libraries, it makes sense to store them in the version control system. If you need a particular build, you check out the correct revision. This allows you to build the exact same binary as it was delivered before. (Of course, if you have the sources to the library, you might want to import them into your project. But if you don't change the sources, that might be overkill.)
      • Compiled files: some people like to store all object files into version control. Again, this allows you to retrieve a specific version faster (no need to recompile). Personally, I would do this only if the compilation takes too much time.
      • Documentation: whether you use MS Office or OpenOffice.org, documentation will be in a binary format. (OOo uses compressed XML.)
      • Test data: you might want to version your test cases, and those will consist of binary data.
    • Re:Is there demand? (Score:4, Interesting)

      by Ninja Programmer ( 145252 ) on Monday March 08, 2004 @06:54AM (#8496680) Homepage
      Do developers out there voice the need to store binaries? I can imagine this being needed for web developers and such, but I think programmers can just build their binaries from CVS.
      CVS lets you check in binaries. But it doesn't use any diff algorithm -- it just stores each instance, so it's inefficient. Any application that uses media will commonly have binary data.

      The other thing is that Unicode source data is typically not stored in a purely ASCII-compatible form. Moving forward, people are going to be using Unicode source data, which at a low level can be considered essentially binary.

      Also, have there been many problems that required atomic commits? Can someone explain why this is important?
      Once you get above about two dozen developers working on the same code base, you will end up with erroneous check-in collisions. Detecting and reversing out of these is a lot of fun.

      I mean, the idea is you'll need to merge one way or another.
      If you check in multiple files, then everything will be checked in except where there are conflicts. When you fix the "conflicts" you end up with an image that nobody actually tested. If you test it before checking in the fixes for the conflicts, then you leave the source tree exposed in a state where only part of your check-in is there (and with enough developers there is an arbitrary number of partial check-ins that the tree might contain at any one time).

      These are all standard "race condition" problems. Commits have to be atomic for the same reason that transactions are atomic in databases, and mutexes/semaphores exist in operating systems.

      IMHO, this issue alone is more important than all the others combined.

      Also, Subversions says that it is much faster at things like tagging, but tagging is not a very frequent operation...
      Chicken and egg? If tagging were fast, wouldn't people be more likely to use it? Tagging is a way for test people, release people, and even marketing people to interact with the development results in a way that makes sense to them. Tagging is a very useful thing. Having numbered check-ins like Perforce makes this slightly less important, but why map your milestone ordinals to some homebrew scheme when your source control can do it for you?
      • CVS lets you check in binaries. But it doesn't use any diff algorithm -- it just stores each instance, so it's inefficient. Any application that uses media will commonly have binary data.

        CVS stores binaries, but it is not so trivial. When we put some binary data into our CVS tree, we realized Windows users couldn't access it (it needs a setting in the repository). CVS behaves differently on Linux and on Windows in this case.
        • Re:Is there demand? (Score:3, Interesting)

          by 0x0d0a ( 568518 )
          That's because you checked in the binary in text format instead of binary, and the linefeed translation chewed up your binaries when switching between platforms.

          This is particularly annoying with text-like formats, like Visual Studio 6's .dsw files -- they look like text files, they smell like text files, and CVS autodetects them as text files, but Visual Studio 6 throws a tantrum if you try to hand it a .dsw file with LF line endings.
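
          The fix on both systems, for reference, is to mark the file as binary so no linefeed or keyword munging happens (filename invented):

              cvs add -kb project.dsw       # new file
              cvs admin -kb project.dsw     # file already in the repository

              # the Subversion equivalent is a mime-type property:
              svn propset svn:mime-type application/octet-stream project.dsw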
    • Do developers out there voice the need to store binaries?

      Hell yes. There's often binary data in my projects, like graphics. cvs add -kb does work, but it doesn't work very well.

      I've never felt a need for the other features listed. The main draw for me is that it can actually rename files and remove directories.

      I'll use it as soon as sourceforge starts supporting it...
    • > Do developers out there voice the need to store binaries?

      Hell yes. Worldforge has a media developers group that is using CVS, and they just hate it. The admin has to periodically go through and sweep out old media file versions because they're simply too big to keep all of them.

      > Also, have there been many problems that required atomic commits? Can someone explain why this is important?

      Very simple. If I change several files, and the changes depend on each other (which happens every time one changes an interface), the changes need to go in together or not at all.
    • how about versioning directories?
    • "Do developers out there voice the need to store binaries? I can imagine this being needed for web developers and such, but I think programmers can just build their binaries from CVS."

      Personally, I am working on a game, and there are tons of binary format files (textures, models). While many of these files can be generated from source, so to speak, it would not always be practical. In addition, they take up a non-trivial amount of space, and the binary diff feature could help with this.

    • >>Do developers out there voice the need to store binaries?

      Yup.

      My project is versioning all our documentation. So that means Word and Excel files. Right now, we're doing it in Visual Source Safe (which I hate), but at the project start there was no alternative.

      Also, on the development side, there are binary files: images, report definition files, vendor libraries... lots of stuff.

      wbs.

  • Windows server? (Score:2, Interesting)

    by Xtifr ( 1323 )
    It's nice that you can run a subversion server on MSWin server systems, I suppose, if that's the sort of thing that floats your boat. But how on earth is the option to spend hundreds of extra dollars on proprietary operating system software and the more-expensive hardware it requires "significantly lower[ing] the barrier to entry?"

    There may be a minor barrier in Win-only shops (although I would say that it's the Win-only policy that is the barrier, not the other way around).
    • Cost isn't an issue.

      In a Microsoft Shop developers will use Microsoft SourceSafe. period. Subversion doesn't have a chance to compete because there is absolutely no way that it can integrate fully into the .Net development tools the way Microsoft's Own Source Storage Software is designed to do.

      And to be honest...it's not that expensive for one copy per shop.
      • Re:Windows server? (Score:1, Informative)

        by Anonymous Coward
        You can't be serious. Most serious shops with large development teams, like over 50 programmers, use other source control software. Very few mid-size firms use SourceSafe, which sucks big time. Even hardcore MS people I know say it sucks big time.
      • Re:Windows server? (Score:2, Informative)

        by ogre57 ( 632144 )

        In a Microsoft Shop developers will use Microsoft SourceSafe. period.

        No, they won't. I can think of several shops/teams using PVCS, plus a handful on other products, but none using MSS. Up-front (purchase) cost isn't much of an issue. Time cost (TCO) very much is. MSS is simply much too slow to be competitive.

        • Re:Windows server? (Score:3, Informative)

          by cyborch ( 524661 )
          Also, and much more importantly: MSS only does file locking - not merging file content. It can hardly be called a "real" versioning system.
          • yup.

            I worked with VSS and Visual Age. A check in went like this:

            1) lock entire VSS source tree
            2) export VSS onto your HD.
            3) export your Visual Age source on top of the VSS source tree
            4) use another versioning tool to tell you which files you've touched.
            5) check only these files back into VSS
            6) unlock.

            A complete PITA, basically because we were using two repositories in parallel. Every so often, they would fall out of sync, or someone would get the order wrong and go backwards in time.
      • Re:Windows server? (Score:3, Informative)

        by Adrian ( 4029 )

        In a Microsoft Shop developers will use Microsoft SourceSafe. period

        Not in my experience. Some do and some don't. The absence of the pain VSS can supply compensates for the lack of tool integration. Even MS doesn't use VSS internally ;-)

        Subversion doesn't have a chance to compete because there is absolutely no way that it can integrate fully into the .Net development tools the way Microsoft's Own Source Storage Software is designed to do.

        I think the people writing the Subway [homedns.org] and SourceCross [sourcecross.org] integrations would disagree.

        • Jalindi Igloo [jalindi.com], CVS SCC plugin for Microsoft Visual Studio and other compliant IDEs
          CVSIn [kryptonians.net], CVS Integration Add-in for Microsoft Visual Studio
          CVS SCC proxy [pushok.com] is the SCC API plug-in which provides access from practically all Microsoft SCC enabled software to the general CVS repositories.

          I don't know how well any of them work; one page [asp.net] asserts that Igloo is crap and the SCC proxy is the holy grail. I have no idea if any of this is true, just googling [google.com] a little.

      • On the contrary, for a long time replacing Sourcesafe with CVS was a major way that I managed to sneak Linux into Windows shops (before it became easy to set up CVS servers on Windows). Windows developers who made the switch loved it. Nowadays, they use Linux for all sorts of other reasons, and there's even less reason for them to stick to inferior Microsoft tools.
      • If cost isn't the issue, then what is? For MS-only shops, the issue is the MS-only policy itself, and that's just a policy. It's purely voluntary; I don't see how a voluntary policy counts as a significant barrier. A barrier, yes, but significant?

        As for using MS-branded tools only, that's irrelevant, as the ability to run subversion servers on windows doesn't do anything to that barrier. Perhaps you didn't read the article?
    • Re:Windows server? (Score:5, Insightful)

      by Eneff ( 96967 ) on Monday March 08, 2004 @01:48AM (#8495664)
      How about individuals wanting source control on their at-home projects? I'm sure not going to spend the money on the MS control, but I don't have a *ix box up 24/7 either. (I use my laptop nearly exclusively, and my laptop hardware supports Windows better.)

      • Re:Windows server? (Score:5, Informative)

        by spongman ( 182339 ) on Monday March 08, 2004 @07:11AM (#8496728)
        I'm running svnserve on a Windows box in a production environment and it works great.

        If you want to start svnserve as a Windows service, google for srvany.exe; it allows you to run a regular win32 exe as a service.
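
        Roughly (service name, paths, and registry details from memory -- check the srvany documentation): register srvany as the service using instsrv from the NT Resource Kit,

            instsrv svnserve C:\tools\srvany.exe

        then set the Application value under HKLM\SYSTEM\CurrentControlSet\Services\svnserve\Parameters to your svnserve.exe and its arguments (e.g. -d -r C:\repos).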

        • Re:Windows server? (Score:3, Informative)

          by spongman ( 182339 )
          I should add: I'd definitely recommend installing TortoiseSVN. Having the SVN operations available as a shell extension is a godsend; for example, you can use SVN from within any File Open dialog. The only thing it's missing is a directory diff, but on XP you can show the SVN status of files in Explorer by configuring the attribute columns in the details view.

          Also, I'd recommend downloading Perforce's p4win 3-way merge tool. It's a little better than the one built into TortoiseSVN.

        • ... and if you want to use it with ssh on top of svnserve you can read this step by step guide [tigris.org]
      • Why would you want to use CVS in server mode if you're not using multiple boxes? AFAIK, CVS runs just fine in standalone mode on Win, and for "at-home" projects that should be sufficient. There are other alternatives too, like external hosting: SourceForge and whatnot.
    • But how on earth is the option to spend hundreds of extra dollars on proprietary operating system software and the more-expensive hardware it requires "significantly lower[ing] the barrier to entry?"

      Sunk costs: people have already paid for Windows, and they have already invested in training for it. But if you can get these people to move to OSS for some things, maybe you can get them to switch for others as well.
  • by wayne606 ( 211893 ) on Monday March 08, 2004 @12:50AM (#8495358)
    It bothers me a bit that all the files are now in a big database. A good thing about CVS is that you can see what files and modules are available using regular Unix tools, and if things get messed up in some way you can always fall back to the rcs commands or, in the worst case, edit the ,v file by hand and extract the latest version. With a database, if things were to get corrupted enough (I have no evidence that this happens often, but still...) you are stuck. Just like with the Windows registry: if it gets messed up, you lose big.

    Any opinions on this?
    • well... the data in the database has to exist on disk somewhere, probably as a raw file. I imagine a backup of this file (or files) would allow it to be used in another installation of the software. I can't really tell without more research.
    • Live backups, baby (Score:5, Insightful)

      by dFaust ( 546790 ) on Monday March 08, 2004 @01:33AM (#8495593)
      This is a valid point, one that has crossed my mind in the past. But consider how many databases are out there in the world, many with incomprehensible amounts of data. Given this, stability is obviously a number-one priority to users and developers of databases, and certainly something the Subversion folks considered before they a) chose to use a database backend and b) chose Berkeley DB. Subversion has been self-hosted (they use Subversion for their own source control) for over a year, and they have yet to lose any data. While a year isn't that long, it's a start.

      But using a database DOES provide advantages, as stated in the article -- mostly speed advantages, but also the ability to do live backups. If you try backing up an online (as in live) CVS server's files, there's nothing stopping people from doing commits mid-backup, thus possibly botching your backup (you're no longer backing up the files you thought you were).

      And when it comes down to it, backups are really where your safety lies. In the last CVS project I worked on, the repository was hosed twice. Once due to a careless admin, and once due to the hard drive dying. While we had some down time, virtually no work was lost, largely due to our nightly backups. The fact that CVS stored its data as plain text files certainly didn't protect us.
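
      For what it's worth, Subversion 1.0 ships with a live-backup command, so a nightly job can be as simple as (paths invented):

          svnadmin hotcopy /var/svn/repo /backup/repo-$(date +%Y%m%d) --clean-logs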

      • The problem is that putting the stuff into a database creates another dependency on a non-trivial piece of software. That creates all sorts of risks.

        On UNIX/Linux systems, the file system is more than sufficient for handling this kind of storage and transactioning, so this dependency and risk is unnecessary.

        I suspect Subversion uses a database because it may be intended to run on operating systems with less powerful file systems.
        • You really need to explain the statement:

          I suspect Subversion uses a database because it may be intended to run on operating systems with less powerful file systems.

          A filesystem should not be used to hold multiple versions of a file as well as the meta-data associated with it. Let's not forget the associations of multiple files that become a project. This is the work of a database, hence Berkeley DB. If you are concerned about "repairing" a file (aka the db), there are command-line tools for just such an emergency.

            A filesystem should not be used to hold multiple versions of a file as well as the meta-data associated with it. Let's not forget the associations of multiple files that become a project. This is the work of a database, hence Berkeley DB.

            The UNIX file system is a database. That's what it is designed to be, that's what it is used as, and that's what it is good at. It has an extensive set of tools for manipulating it and lots of excellent GUIs for dealing with it.
              But the filesystem (at least if you use POSIX functions to communicate with it) still lacks atomic writes, transactions, and rollback. Of course you can implement all this on top of the filesystem, but then you just end up with something similar to the DB shared library.

                But the filesystem (at least if you use POSIX functions to communicate with it) still lacks atomic writes, transactions, and rollback.

                UNIX file systems, of course, have atomic writes, transactions, and rollback; they just aren't called that. Read some books on UNIX and look at how this sort of thing is handled in systems that do, in fact, use the UNIX file system as a database. For its particular application areas, the UNIX file system is so good that many relational databases use it to store blobs.
                • UNIX file systems, of course, have atomic write, transactions, and rollback, they just aren't called that.

                  Tell me, sensei, how to do a multiple file rollback on a raw Unix filesystem? Or how I can ensure transactional integrity without a transaction manager? Oh wait -- you can't. The facility doesn't exist.

                  'jfb
                  • Tell me, sensei, how to do a multiple file rollback on a raw Unix filesystem? Or how I can ensure transactional integrity without a transaction manager?

                    With "link", "fsync", lock files, and directories. Read some UNIX source code to see how to build more complex transactional guarantees on top of that.

                    And, no, I'm not your "sensei". If you want an education, pay someone or at least buy yourself a good UNIX book.

                    Oh wait -- you can't. The facility doesn't exist.

                    What's lacking is not some UNIX system call, it's familiarity with how to combine them.
                    • Bullshit. Your "simple" solution requires AT LEAST as much conceptual overhead and abstraction as using a proper database, and, as an added nonbonus, you're stuck with the '70s-era Unix filesystem semantics and security model -- and, even better, you're at the mercy of the implementors of a system that's only tangentially related to a proper database. Anyone who'd ever built even a moderately complex application on top of a Unix filesystem would blanch at your suggestion that a combination of "lock files" and links amounts to real transactions.
                    • I have to interject here: although using lock files, symlinks, and other hacks on POSIX-based filesystems is probably a bad idea, ReiserFS is planning on adding meta-data and some other features that begin to edge into DB territory.

                      Although the other poster seems to have the notion of a point, it's not really valid with EXTn, UFS, or others.
                    • Bullshit. Your "simple" solution requires AT LEAST as much conceptual overhead and abstraction as using a proper database,

                      It may well require the same amount of "conceptual overhead", but it has much less coupling, fewer software dependencies, much less total code, and fewer system calls for each transaction.

                      as an added nonbonus, you're stuck with the '70s era Unix filesystem semantics and security model

                      Yes: you are "stuck with" a set of time-tested semantics and a proven security model.
                    • For example, I know that the UNIX file system can easily and efficiently handle files that are gigabytes in size (because lots of people are using UNIX file systems that way), but I have less confidence that Berkeley DB can handle many records that big well.

                      From Sleepycat [sleepycat.com]:

                      Databases up to 256 terabytes
                      Berkeley DB uses 48 bits to address individual bytes in a database. This means that the largest theoretical Berkeley DB database is 2^48 bytes, or 256 terabytes, in size.

                    • 4G/256T is big enough for most applications...

                      Yes, and how many applications actually use it that way? File systems are being used that way by numerous applications every day. In fact, most databases store blobs in the file system.

                      FWIW, Subversion allows different backends, however no others have been written yet.

                      Someone should write a file system backend...
            • "the Berkeley DB decision may make sense if the Subversion server is supposed to run on Windows or on MacOS."

              Subversion is meant to be portable; I've used it on Windows, and AFAIK there's nothing to prevent it working on Mac OS X.
              I may agree that a relational database may not be the ideal database for a versioning system, but a raw filesystem seems worse to me.

      • by halfnerd ( 553515 )
        A year?

        Taken from http://subversion.tigris.org/release-history.html:

        Milestone 3 (30 August 2001): Subversion is now self-hosting.

        There are over two years between that and their 1.0.0 release, without *any* data loss.
      • and have yet to lose any data
        How do they know? Have they checked out old data and compared it to backups?
      • And when it comes down to it, backups are really where your safety lies. In the last CVS project I worked on, the repository was hosed twice. Once due to a careless admin, and once due to the hard drive dying. While we had some down time, virtually no work was lost, largely due to our nightly backups. The fact that CVS stored its data as plain text files certainly didn't protect us.

        A non-issue on FreeBSD-5. Why? Filesystem snapshots. You just make a snapshot before you back up. Then back up the snapshot.

    • by Anonymous Coward
      Not only is it in a database, it's in a Berkeley DB. Some thoughts on this:

      1) There is absolutely nothing about a version control system that requires a key/value database like Berkeley DB. I think they just use it to get free locking and transactions. Strange.

      2) Berkeley DB is ultra-sensitive. Ever had to deal with a locked Berkeley DB when no process was running that had it locked? You have to manually break the locks. Fun. This hasn't happened to me with Subversion (yet), but I expect it to be a problem.
    • by magnum3065 ( 410727 ) on Monday March 08, 2004 @02:49AM (#8495957)
      Someone else already mentioned the ability to do live backups with Subversion. Another benefit of the database is built-in journaling support: Berkeley DB logs any changes before making them, so if your system crashes, the DB will be restored to a stable point. This is MORE reliable than what CVS offers, even with a journaling filesystem. Also, I'm pretty sure that if you REALLY need to hack the DB, there are utilities that will let you do this. However, most of the scenarios that CVS admins needed to hack the ,v files for are no longer a problem in Subversion.
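
      Presumably the utility meant here is svnadmin; after a crash you would run (path invented):

          svnadmin recover /var/svn/repo

      which replays the Berkeley DB journal and brings the repository back to a consistent state.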
    • by nthomas ( 10354 ) on Monday March 08, 2004 @03:05AM (#8496020)
      It bothers me a bit that all the files are now in a big database.

      When you use PostgreSQL, MySQL, or Oracle, does it bother you that your data is in a big database? Why worry so much about Subversion, then?

      A good thing about CVS is that you can see what files and modules are available using regular unix tools, and if things get messed up in some way you can always fall back to the rcs commands or in the worst case edit the ,v file by hand and extract the latest version.

      It is a good thing that you were able to hand-edit CVS repositories when they got corrupted -- because corrupt CVS repositories are a dime a dozen.

      I've been using Subversion since January 2002 (yes, a full two years before 1.0 came out) and I have never, ever, ever seen a corrupt repository or heard about one on the mailing lists. When someone did claim that they thought Subversion had corrupted their repository, the Subversion devs dropped everything to make sure this wasn't the case. AFAIK, it has never happened. (Usually it was the person using multiple servers to access their repo, or putting their repo on a network share -- Berkeley DB doesn't work over NFS/AFS/CIFS.)

      Let me quote a Slashdot posting of mine from a couple of years ago:

      ...there is nothing that the dev team values more than the integrity of your data. Nothing. This means that once something has been committed, it will never be lost.
      My opinion has not changed in the past two years.

      Thomas

        Personally, all the data in Oracle (SQL Server, even) or PostgreSQL wouldn't bother me; MySQL might worry me a little; MS Jet / Access worries me a lot. Berkeley DB I'm not sure about: I know a little of its heritage on UNIX, but I would be a lot less sure on other platforms.

        A lot of people's experience with source control and DBs will be coloured by Visual SourceSafe and Jet (which it uses). It is OK until it gets corrupted, and then you are hosed. Keeping everything in readable files CVS-style is a BIG plus point once you've been in that situation. I am also wary of database-based products which are tied to one particular database...
        • by empty ( 53267 ) on Monday March 08, 2004 @07:06PM (#8503697)
          ...It is ok until it gets corrupted, and then you are hosed. Keeping everything in readable files CVS-style is a BIG plus point once you've been in that situation...
          ...I am also wary of database-based products which are tied to one particular database...


          Subversion has a utility that might assuage your fears:
          svnadmin dump
          The dump command can do a (full or incremental) dump of your repository such that you can completely recreate its history. If you use this command for backup, you will be assured that you don't lose any data.

          As a bonus, the dump file is human readable, so there should be no fear of losing data to an inscrutable binary file.
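
          For example (paths and revision range invented): a full dump, a later incremental dump, and a restore into a fresh repository:

              svnadmin dump /var/svn/repo > full.dump
              svnadmin dump /var/svn/repo -r 1001:HEAD --incremental > incr.dump
              svnadmin create /var/svn/restored
              svnadmin load /var/svn/restored < full.dump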
    • With a database, if things were to get corrupted enough (I have no evidence that this happens often, but still...) you are stuck. Just like with the windows registry, where if it gets messed up you lose big.

      I worry more about disk crashes and accidental deletions. This is what backups are for ;-)

      You can also serialise everything into a fairly human-readable file with svnadmin dump and svnadmin load if you feel you need something non-binary.

      Really not a problem as far as I'm concerned.

    • > It bothers me a bit that all the files are now in a big database.

      You think a filesystem isn't a database? It bothers me more when all the files are on an ext2fs filesystem; hope that UPS has been checked recently. Perforce uses a database as well (in fact it's the same, Berkeley DB or some *dbm), and I've never heard of it eating a repository. Being able to change the DB backend for Subversion would be nice, though. In fact, I'd consider it pretty damn critical for any organization-wide SCM repository.
      • Perforce used to use BDB, yes, but around 18 months ago they switched to a home-grown C++ DB of some sort. Doing so required a dump/reload cycle (just like Subversion does when a major database upgrade is required -- think "maybe 2.0 and a long way off" in Subversion's case). This is yet another Subversion-is-like-Perforce moment where I get nice fuzzy feelings about Subversion. Perforce is database-backed, with plaintext dump files, repository-wide revision numbers, and cheap branches.
    • I'm not sure that being able to edit the ,v files by hand is an advantage of CVS. If anything, I see it as a disadvantage since: a) you're making changes "behind the system's back"; and b) it's easy to screw up.

      The fact that Subversion uses a Berkeley DB file backend doesn't mean you're hosed in case of problems, especially if you've been backing your data up. You can make a live backup anytime you want -- with every commit, if you're paranoid. It's also possible to dump any or all commits to a human-readable format.
  • Some answers (Score:5, Informative)

    by magnum3065 ( 410727 ) on Monday March 08, 2004 @01:30AM (#8495574)
    Ok, I saw some questions about why people should switch from CVS to Subversion. The article does a nice job of covering what features Subversion adds, but people still seem to wonder why these are important.

    Atomic Commits:
    As stated in the article, if something goes wrong in the middle of a CVS commit (e.g. the network goes down) it can leave the commit only partially complete. This can be a problem if changes in multiple files are dependent upon each other. Say I add a function to an API, then call it in another file. If the call gets committed and the API change doesn't, the code in CVS won't compile. With atomic commits, if the connection dropped the commit would simply roll back; then, when my network came back up, I could try to commit again, and the repository would never be left in a state where it didn't compile.

    Constant Time Tagging/Branching:
    In Subversion, tagging and branching are fundamentally the same: they're both executed as a "copy" command. I'm not sure what the execution time is for these operations in CVS, though I believe it's linear in the size of the repository. In Subversion this is an O(1) operation. One of the posts commented on tagging being an infrequent operation; that may be true, but why not let it be fast anyway? And no matter how often you tag, constant-time branching is nice. I can at any time quickly create my own branch of a project to work from. Working in my own branch means that I can keep very granular track of my changes by committing frequently, without worrying about breaking something else. Once I'm satisfied with my changes, I can merge my branch back into the main code.
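
    For instance (URLs invented), a release tag and a private branch are both just cheap server-side copies:

        svn copy -m "Tag 1.0" http://svn.example.com/repo/trunk \
            http://svn.example.com/repo/tags/release-1.0
        svn copy -m "Branch for my feature" http://svn.example.com/repo/trunk \
            http://svn.example.com/repo/branches/my-feature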

    Storing Binaries:
    "Binaries" does not necessarilly mean compiled code. There are plenty of things that can benefit from this. Anywhere you use graphics: web programming, GUI programming, or say game or other 3D programming andy you want to store your models. Or, you can store documentation in the repository: PDFs, Word docs, spreadsheets, etc.

    Finally, the barrier to switching isn't all that high. The command-line program has quite similar syntax, so switching is pretty easy, and the other interfaces, such as the web viewer, TortoiseCVS, and IDE integrations, generally have Subversion counterparts.

    Well, that's all I can think of for now. I'm actually going to try to get my company to switch over to Subversion from the commercial software we've been using, when we start on our new product. We're using a Java applet to interface with the repository now, and it's not very nice. CVS would work, since the main thing I want is integration with Eclipse and IntelliJ IDEA, and there are plugins to support this with Subversion as well. But Subversion has nice features CVS doesn't, so I don't see any reason to use CVS over Subversion.
  • Be sure to talk to your programmers before you pull the switch on them. Not telling them would be rather subversive...
  • by dozer ( 30790 ) on Monday March 08, 2004 @04:14AM (#8496244)
    Subversion good points:
    • Finger feel is very similar to CVS
    • Flexible directory layout & tagging
    • Extremely stable development.
    Subversion Bad Points:
    • Database & log files take up a LOT of space.
    • Quite hard to share repositories
    • No way to mark your branches (if you accidentally check out the directory containing your branches, you just got 50 gigs of 99.9% identical files...)
    • No distributed development
    • Pretty weak merging
    Arch Good Points:
    • Extremely good distributed development
    • Super easy to share repositories
    • Pretty strong merging.
    • Very stable development
    Arch Bad Points:
    • Forces you to give your projects weird names ("my-project--branch-1--1.1").
    • Forces each branch into a different top-level directory in your archive ("my-project--branch-2--1.1").
    • Doesn't feel anything like CVS.
    • Pretty slow (but they're working on it).
    • Somewhat difficult to resolve merge conflicts
    I wish I could love Arch because distributed development absolutely rules. I could tolerate its bizarre command set, but I simply won't accept arbitrary (and ugly) constraints on what I name my projects and branches.

    Verdict: I'm still using CVS. Subversion is very close to pleasing me enough to switch... I'll probably ditch CVS some time this year.

    • by natmsincome.com ( 528791 ) <adinobro@gmail.com> on Monday March 08, 2004 @05:54AM (#8496502) Homepage
      Some of your bad points for Subversion don't sound quite right:

      *Quite hard to share repositories

      The repositories can be read using any WebDAV-compliant software. If you're talking about the web, the article says you can use ViewCVS as a web interface. If you want people to connect to the server, that works by default, since it's client/server.

      *No distributed development

      If you're talking about multiple servers like BitKeeper, then I can't help you *I know nothing*, but if you're talking about client/server, then there's a misunderstanding, as it's been designed to be client/server.

      I may have misunderstood what you were saying, but the comments were a bit vague.
      • by dozer ( 30790 ) on Monday March 08, 2004 @07:22AM (#8496751)
        Quite hard to share repositories
        The repositories can be read using any WebDAV complient software.

        Ever tried setting up a WebDAV server? That fits anybody's definition of hard. The Subversion team recognizes this, so they allow you to access the repository over ssh too (thank goodness!). The problem is, everyone using ssh must log in to the same user account or the permissions get screwed up. So, yes, it's quite hard to share repositories in Subversion.

        No distributed development
        If your talking about multiple servers like bitkeeper...

        Um, yeah. OK, allow me to be slightly clearer: Subversion does not support decentralized development. Not at all. It's a major limitation.

        • Ever tried setting up a WebDAV server? That fits anybody's definition of hard.

          I strongly disagree. Setting up a Subversion repository to be accessible over the 'net was PISS EASY, even for me, a first-time user. You can use the included light-weight server (svnserve) or Apache2 if you need options like complex authentication. It's very easy to set up and very nice to look at if you enable XML output. :)

          There are howtos in the Subversion book [red-bean.com]. Happy reading.
        • by Anonymous Coward
          Problem is, everyone using ssh must log in to the same user account or the permissions get screwed up. So, yes, it's quite hard to share repositories in Subversion.

          I do believe that is wrong. Using ssh for access, the users need to be in the same group, and the repository directory needs to be sticky and writable by that group.

          Once set up correctly, there are no problems with ssh access by multiple users.
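
          Roughly, the setup looks like this (group name and path invented); the "sticky" bit meant here is really the setgid bit on directories, so new files inherit the group:

              groupadd svnusers
              chgrp -R svnusers /var/svn/repo
              chmod -R g+rw /var/svn/repo
              # setgid on directories so new files keep the group
              find /var/svn/repo -type d -exec chmod g+s {} \;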

        • Problem is, everyone using ssh must log in to the same user account or the permissions get screwed up. So, yes, it's quite hard to share repositories in Subversion.

          I think you are wrong. I log in to my repo from different ssh accounts without problems. Using svn + svnserve with multiple accounts is also possible on Windows XP [tigris.org]
        • Um, yeah. OK, allow me to be slightly clearer: Subversion does not support decentralized development.

          Client-server source control systems in general were created to support decentralised development.

          I'm hacking away at my copy of the source over here, you on yours over there, and the central archive on SourceForge keeps us consistent. You, I, and the other hackers are thus not constrained (centralised) in our development, as we would be without any source control at all.

          Maybe the lack of decentralised development is less of a limitation than it seems.
    • Database & log files take up a LOT of space

      This has got a lot better recently, and with the latest Berkeley DB you don't have to worry about cleaning up the log files. I find that CVS and Subversion repository sizes are now roughly the same.

      • Quite hard to share repositories
      • No distributed development
      • Pretty weak merging

      The SVK project [elixus.org] (basically distributed repositories built on top of Subversion) is addressing a lot of these issues. Seems to be coming along nicely. The merge support isn't quite there yet, though.

    • Subversion Bad Points:

      Database & log files take up a LOT of space.

      svnadmin comes with a command called list-unused-dblogs that you run on your repository; it tells you which Berkeley DB log files are unused, so you can delete them. But usually people will want to just run:

      svnadmin list-unused-dblogs repository | xargs rm

      All of this is moot if you are running Berkeley DB 4.2 or greater -- it cleans unused log files automatically.


  • how do you migrate? (Score:2, Interesting)

    by DeadSea ( 69598 )
    I can't switch unless we can convert our repository from cvs. Are there tools for doing this?
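
    There are: the cvs2svn tool converts a CVS repository to Subversion with history, branches, and tags. A sketch (paths invented, option spelling from memory):

        cvs2svn -s /var/svn/newrepo /var/cvs/oldrepo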
  • by r6144 ( 544027 ) <r6k@sohCOFFEEu.com minus caffeine> on Monday March 08, 2004 @08:46AM (#8497049) Homepage Journal
    I have used Subversion in quite a few (small, mostly one-man) research projects during the last six months. Before that I used RCS/CVS. Subversion does make me somewhat more comfortable, and I have little to complain about, which means I probably won't ever look back.

    However, IF there were no free software like Subversion, I'd rather make do with CVS than use non-free stuff, even if someone else paid for it. For example, CVS does not have atomic commits, so I use tags instead (ironic, since CVS does tagging quite slowly, but it's still acceptable for one-man projects). Other weak points of CVS can also be worked around. It isn't pretty, but not THAT painful either. Actually, before I discovered RCS, I just did version control manually by saving a tarball after each day's work, which is tedious but still sufferable.

    Of course, for large projects, version control is much more important.

  • Graph? (Score:3, Interesting)

    by aled ( 228417 ) on Monday March 08, 2004 @10:17AM (#8497639)
    Is there any client front end for Subversion that makes a graphical tree of versions, like WinCVS or Cervisia? It's a very useful feature and I would like to have something equivalent for Subversion.
  • Any GUI Clients? (Score:2, Interesting)

    by tjmsquared ( 702422 )
    Are there any GUI clients like wincvs for subversion yet? It looks like a much better tool, but I don't see my group switching unless there is a client that is at least as good as wincvs.
    • You want RapidSVN (Score:3, Informative)

      by Valdrax ( 32670 )
      That's a pretty good question in my opinion, and TortoiseSVN's Windows shell-extension doesn't cut it. ("-1, Redundant" my ass.) If you're looking for something more like WinCVS, check out RapidSVN. [tigris.org]
  • Meta data and Moves (Score:2, Interesting)

    by irontiki ( 607290 )
    I've been using CVS in professional development environments for about 5 years at several different employers. I love CVS but have been watching Subversion closely and with some anticipation.

    The atomic commits will be nice but honestly the lack of them has never been a huge problem for my teams (atomic commits are probably less a problem with 6-8 people). The things that do bug me about CVS that Subversion is supposed to address :

    1. the ability to move or rename a file w/o losing the history

    2. the ability to ...
  • Someone want to forward this to the guys at SF? I'd like to know what became of their, "we'll add subversion once it matures enough" claim.
    • Someone want to forward this to the guys at SF? I'd like to know what became of their, "we'll add subversion once it matures enough" claim.

      Sourceforge seems to have been having some serious growing pains. I'd hope they fix their current problems first before adding more things that could break.

Say "twenty-three-skiddoo" to logout.

Working...