Performance Tuning Subversion

Posted by ScuttleMonkey
from the geek-tweaks dept.
BlueVoodoo writes "Subversion is one of the few version control systems that can store binary files using a delta algorithm. In this article, senior developer David Bell explains why Subversion's performance suffers when handling binaries and suggests several ways to work around the problem."
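Storing a binary file "using a delta algorithm" means describing each new version as instructions against an older one: copy a run of bytes from the old version, or insert new bytes. The Python sketch below illustrates only that idea. It uses the standard library's `difflib.SequenceMatcher` as a stand-in matcher and is not Subversion's svndiff format or its actual matching algorithm; `make_delta` and `apply_delta` are hypothetical names.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes) -> list:
    """Describe `target` as copy/insert instructions against `base` (illustrative only)."""
    # autojunk=False: with only 256 distinct byte values, the default heuristic
    # would treat most bytes of a large file as "junk" and find few matches.
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse base[i1:i2]
        elif j2 > j1:                               # "replace" or "insert"
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from `base` plus the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

# Two "binary" revisions that differ only in their final byte.
old = b"HEADER" + bytes(range(256)) * 4 + b"FOOTER v1"
new = b"HEADER" + bytes(range(256)) * 4 + b"FOOTER v2"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(f"{len(new)} bytes stored as {len(delta)} instructions, {literal_bytes} literal byte(s)")
```

`difflib` is far too slow for multi-megabyte files; it appears here only because it ships with Python.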

  • Why binaries? (Score:2, Interesting)

    by janrinok (846318) on Wednesday May 23, 2007 @03:52PM (#19243411)
    I know it can handle binaries, but I cannot think why I would want to. Can anyone help?
  • by Anonymous Coward on Wednesday May 23, 2007 @03:59PM (#19243547)
    Subversion fails to follow symbolic links that point to code shared with other projects, for the sake of a minority that still develops on Windows (which doesn't have real symbolic links).

    CVS has proven itself to be superior and far more intuitive.

    You have code that many projects share, like multi-platform-compatibility-layers? Just use symbolic links and CVS will follow them.

    In SVN you have to create a repository for these shared source files and write config files by hand to make it include these files in your repository.

    I hardly see SVN reaching the level of flexibility CVS has. They support Windows (which doesn't have symbolic links) and give up usability.

    Apart from this difference, SVN and CVS are the same. There are marginal differences in features, but these don't affect real-world use. So if you want a version control system where you don't need to write config files by hand, you choose CVS. If you want the latest hype, you choose SVN.

    There wasn't really a need for SVN.
  • by hpoul (219387) <> on Wednesday May 23, 2007 @04:01PM (#19243583) Homepage
    For me, performance is (currently) the least of my problems with Subversion.
    The bigger problem is that you lose changes without any warning whatsoever during merging (Name=users&msgNo=65992), and no one seems to be too bothered.

    (Don't get me wrong, I love Subversion and I use it for my open source projects, but currently CVS is way better, just because of the tools and a few fewer unnecessary annoyances.)
  • What about git? (Score:1, Interesting)

    by Anonymous Coward on Wednesday May 23, 2007 @04:04PM (#19243637)
    In short: Use git-svn

    Long version: The few-fold speedup described in the article is blown away by the several orders of magnitude you get by using git. Then there are all the other goodies, like real branches and merges, git-bisect, and visualization with gitk. Subversion is just for people who are forced to use it, or those not exploring all their options these days.
  • why import/export (Score:1, Interesting)

    by Anonymous Coward on Wednesday May 23, 2007 @04:14PM (#19243785)
    Why use Subversion only for import/export? That's the complaint here, right (the slow import/export speeds)? I thought the point of using revision control is to check out and then use commit/update commands.
  • by Crazy Taco (1083423) on Wednesday May 23, 2007 @04:19PM (#19243879)
    It is still the wave of the future. I've worked in it extensively, and it is still the best version control system I've ever used. Because of its other strengths, it continues to expand its user base and gain popularity. You can tell this because Microsoft is now actively copying Subversion's concepts and ways of doing things. Ever used Team Foundation Server? It is just like Subversion, only buggier (and without a good way to roll back a changeset... you have to download and install Team Foundation Power Tools to do it). I'm a new employee at my company (which uses Microsoft technology), and yet I've been explaining how TFS works to seasoned .NET architecture veterans. The reason I can do this? I worked extensively with Subversion, read the Subversion book a few times (the O'Reilly book maintained by the Subversion team), and worked on a project for my previous company whose goal was basically to make versions of the TFS wizards for Subversion on the Eclipse platform. It took me only about one day of using TFS to be able to predict how it would respond and what its quirks would be, because its technical underpinnings are just like Subversion's. So even with performance issues, if even Microsoft is abandoning its years of effort on SourceSafe and jumping all over this, you can know that Subversion's strengths still make it worth adopting over the alternatives. After all, if Microsoft was going to dump SourceSafe, it had its pick of other systems to copy, as well as the option of trying to make something new. What did it pick? Subversion.
  • by Crazy Taco (1083423) on Wednesday May 23, 2007 @04:23PM (#19243939)
    And you can ALSO save space by version controlling ANY type of file because of its binary delta features. My software team often placed .doc files or other sorts of documentation into our projects, and CVS would save full copies of each document to version control them, chewing up space like crazy. If you work on a big software project, where you can run into things like 1,000-page Word specification files, you do NOT want a version control system that doesn't use binary differencing. This is another reason why SVN WILL replace CVS.
  • by Tankko (911999) on Wednesday May 23, 2007 @04:29PM (#19244025)
    I've been using Subversion for 2 years on game-related projects. Most of our assets are binary (Photoshop files, images, 3D models, etc.), plus all the text-based code. I love Subversion. Best thing out there that doesn't cost $800/seat.

    What I don't like about this article is that it implies I should have to restructure my development environment to deal with a flaw in my version control. The binary issue is huge with Subversion, but most of the people working on Subversion don't use binary storage as much as game projects do. Subversion should have an option to store the head as a full file, not a delta, and this problem would be solved. True, it would slow down commits, but commits happen a lot less often than updates (at least for us). Also, the re-deltaing of the head-1 revision could happen on the server in the background, keeping commits fast.
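The proposal here (keep HEAD as a full file, then re-delta the previous HEAD) is essentially reverse-delta storage, similar in spirit to how RCS, and therefore CVS, stores its trunk. The Python sketch below assumes a deliberately crude delta that records only a shared prefix and suffix; `ReverseDeltaStore` and its methods are hypothetical and are not part of Subversion.

```python
from typing import List, Optional, Tuple

# (shared prefix length, shared suffix length, middle bytes of the older text)
Delta = Tuple[int, int, bytes]

def reverse_delta(older: bytes, newer: bytes) -> Delta:
    """Crude delta that rebuilds `older` from `newer` (shared prefix/suffix only)."""
    limit = min(len(older), len(newer))
    p = 0
    while p < limit and older[p] == newer[p]:
        p += 1
    s = 0
    while s < limit - p and older[-1 - s] == newer[-1 - s]:
        s += 1
    return (p, s, older[p:len(older) - s])

def apply_reverse_delta(newer: bytes, delta: Delta) -> bytes:
    p, s, middle = delta
    return newer[:p] + middle + newer[len(newer) - s:]

class ReverseDeltaStore:
    """HEAD is kept in full; each older revision is a delta from the one after it."""

    def __init__(self, first: bytes) -> None:
        self.head = first
        self.back: List[Delta] = []  # back[r] rebuilds revision r from revision r + 1

    def commit(self, content: bytes) -> int:
        # The extra work lands at commit time: re-delta the old HEAD against the
        # new one. A server could push this step into a background job.
        self.back.append(reverse_delta(self.head, content))
        self.head = content
        return len(self.back)  # new revision number

    def checkout(self, rev: Optional[int] = None) -> bytes:
        head_rev = len(self.back)
        rev = head_rev if rev is None else rev
        text = self.head  # HEAD needs no delta application at all
        for r in range(head_rev - 1, rev - 1, -1):
            text = apply_reverse_delta(text, self.back[r])
        return text
```

Updating to HEAD, the common case the comment describes, is then a plain read; only requests for older revisions pay for walking the delta chain.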
  • by Anonymous Coward on Wednesday May 23, 2007 @04:33PM (#19244081)
    You ever try to move a directory structure full of source code from one place to another in CVS -- or even to move or rename a single file...?
    HINT: When you do it the way CVS provides, you will lose all of your revision history.
    SVN does not have this fatal flaw.

    Yeah, that is a problem with CVS. Your revision history is there, you just can't trace it since a move is a delete and recreate. So if in your move/rename commit comment you say where you are moving it from, you can manually trace (though this is a huge pain).
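The difference described here comes down to whether a move records where the file came from. The Python sketch below uses a hypothetical log record, `Change`, loosely modelled on the copy-source information `svn log -v` prints for copied paths. With the source recorded, history can be followed across the move; without it, as in CVS, the trail stops at the re-add.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Change:
    rev: int
    path: str
    action: str                        # "A" added, "M" modified, "D" deleted
    copied_from: Optional[str] = None  # source path if the add was a copy/move

def history(log: List[Change], path: str) -> List[int]:
    """Revisions that touched `path`, newest first, following recorded copy sources."""
    revs = []
    current = path
    for change in sorted(log, key=lambda c: c.rev, reverse=True):
        if change.path != current or change.action == "D":
            continue  # unrelated file, or the delete half of a move
        revs.append(change.rev)
        if change.action == "A":
            if change.copied_from is None:
                break  # plain add: nothing earlier can be traced
            current = change.copied_from
    return revs

moved = [
    Change(1, "src/util.c", "A"),
    Change(2, "src/util.c", "M"),
    Change(3, "lib/util.c", "A", copied_from="src/util.c"),
    Change(3, "src/util.c", "D"),
    Change(4, "lib/util.c", "M"),
]
# CVS-style: the same move, but the add carries no record of its source.
cvs_style = [Change(c.rev, c.path, c.action) for c in moved]
print(history(moved, "lib/util.c"), history(cvs_style, "lib/util.c"))  # → [4, 3, 2, 1] [4, 3]
```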

    We have moved all our CVS repositories to SVN at work. As much as I like the revision history problem being gone, I would've pushed harder to stick with CVS (I didn't think SVN was ready at the time, and I still don't). CVS is way more stable, solid, and trouble-free, and its clients are also very stable. SVN has numerous issues that keep popping up, mostly in the clients (the working copy metadata gets corrupted all the time), but some that might even be server-side corruption (we didn't quite figure out why, but everyone's working copy got corrupted in the same place).

    Are there any SVN-to-CVS conversion utilities out there for those of us who want to go back to CVS?
  • by shirai (42309) on Wednesday May 23, 2007 @04:33PM (#19244091) Homepage
    Okay, I know this is completely off-topic but I'd really like to get some responses or some discussion going on what makes version control suck.

    I mean, is it just me or is revision control software incredibly difficult to use? To put this into context, I've developed software that builds websites with an integrated shopping cart, dozens of business features, email integration, and domain name integration; over 100,000 sites have been built with it (blah blah blah), but I find revision control HARD.

    It feels to me like there is a fundamentally easier way to do revision control. But, I haven't found it yet or know if it exists.

    I guess for people coming from CVS, Subversion is easier. But with subversion, I just found it disgusting (and hard to manage) how it left all these invisible files all over my system and if I copied a directory, for example, there would be two copies linked to the same place in the repository. Also, some actions that I do directly to the files are very difficult to reconcile with the repository.

    Since then, I've switched our development team to Perforce (which I like much better), but we still spend too much time on version control issues. With the number and speed of our rollouts and the need for easy access to certain types of rollbacks (but not others), we are unusual. In fact, we ended up using a layout that hasn't been documented before but works well for us. That said, I still find version control hard.

    Am I alone? Are there better solutions (open source or paid?) that you've found? I'd like to hear.
  • Re:What about git? (Score:3, Interesting)

    by koreth (409849) * on Wednesday May 23, 2007 @04:39PM (#19244229)
    Hear hear. git-svn makes Subversion tolerable. The only reason I'd ever choose native Subversion over a newer system like git or Mercurial is if I needed some tool that had built-in Subversion integration and didn't support anything else. Absent that criterion, IMO if you choose Subversion it's a sign you don't really understand version control too well.
  • Re:Why binaries? (Score:3, Interesting)

    by jfengel (409917) on Wednesday May 23, 2007 @04:40PM (#19244255) Homepage Journal
    That's certainly true. It's tolerable when I'm on the LAN with the server. When I'm working via VPN from home, I get up and watch some TV while doing a full checkout of my system. (Some of that is the binaries, though much is just the sheer number of files and the latency added by SSH.)
  • by norton_I (64015) <> on Wednesday May 23, 2007 @04:58PM (#19244529)
    You are not alone, but I think the problem is intrinsic (or nearly so). VC is one more thing you have to worry about that is not actually doing your work. It is easy as long as you don't want to do anything with VC you couldn't do otherwise. If all you do is linear development of a single branch, it is pretty easy. Memorize a few commands for import, checkout, and checkin and you are fine, but all you really get is a backup system. As soon as you want to branch and merge and so forth, it becomes much more complicated.

    I think the only way to make it work really well is to have an administrator whose job it is to be a VC expert, rather than a programming expert. You need someone with some serious scripting skills and a deep understanding of the structure of the VC filesystem. With the proper scripts in place, you can really streamline the process for your specific project and enforce your coding practices, but maintaining the system is a separate skill from programming. Also, when performing non-standard merges or whatever, you would probably need a coder to work with the admin to make sure you don't do it in a way that will hamstring you later. Of course, most projects can't afford that, and many programmers don't want to leave their code in the hands of some script monkey, or won't believe that someone else can do something as "trivial" as VC better than them :)
  • by GrievousMistake (880829) on Wednesday May 23, 2007 @05:00PM (#19244577)
    Honestly, if you think Subversion is the wave of the future, you haven't been paying much attention. It fixes some fundamental flaws in CVS, which is nice, but elsewhere there's exciting stuff like Monotone, darcs and many others. It seems people aren't looking hard enough for source control options if they'll go wild over things like SVN, or more recently Git.

    I suppose one has to be conservative when deploying this stuff; you don't want to have code locked away in unmaintained software, or erased by immaturity bugs, but it's still an interesting field.
  • Re:Why binaries? (Score:5, Interesting)

    by daeg (828071) on Wednesday May 23, 2007 @05:01PM (#19244585)
    Not just images in the sense of PNGs and JPGs, but original source documents as well (PSD, AI, SVG, etc). We track several large (40MB+) source files and I've seen some slowness but nothing to write home about.

    We give our outside designers access to their own SVN repository. When we contract out a design (for a brochure, for instance), I give them the SVN checkout path for the project, along with a user name and password. They don't get paid until they commit the final version along with a matching PDF proof.

    This solves several issues:

    (a) The tendency for design studios to withhold original artwork. Most of them do this to ensure you have to keep coming back to them like lost puppies needing the next bowl of food. It also eliminates the "I e-mailed it to you already!" argument, removes insecure FTP transfers, and can automatically notify interested parties upon checkin. No checkin? No pay. Period.

    (b) Printers have to check out the file themselves using svn. They have no excuse to print a wrong file, and you can have a complete log to cross-check their work. They said it's printing? Look at the checkout/export log and see if they actually downloaded the artwork and how long ago.

    (c) The lack of accountability via e-mail and phone. We use Trac in the same setup, so all artwork change requests MUST go through Trac. No detailed ticket? No change.

    (d) Keeps all files under one system that is easy to back up.

    You may have a little difficulty finding someone at both the design and print companies that can do this, but a 1 page Word document seems to do the trick just fine.
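The "automatically notify interested parties upon checkin" part is usually handled with a repository hook: Subversion runs `hooks/post-commit` after each commit, passing the repository path and the new revision number as its first two arguments. The Python sketch below assumes `svnlook` is installed on the server; `format_notice` and the final delivery step (a `print` here, mail or a Trac update in practice) are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical hooks/post-commit script: build a notice for each commit."""
import subprocess
import sys

# Hooks run with a minimal environment, so an absolute path may be needed here.
SVNLOOK = "svnlook"

def svnlook(subcommand: str, repos: str, rev: str) -> str:
    # e.g. `svnlook author /srv/svn/designs -r 42`
    result = subprocess.run([SVNLOOK, subcommand, repos, "-r", rev],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def format_notice(repos: str, rev: str, author: str, log: str, changed: str) -> str:
    lines = [f"r{rev} committed to {repos} by {author}", "", log, "", "Changed paths:"]
    lines += ["  " + entry for entry in changed.splitlines()]
    return "\n".join(lines)

def main(repos: str, rev: str) -> None:
    notice = format_notice(repos, rev,
                           svnlook("author", repos, rev),
                           svnlook("log", repos, rev),
                           svnlook("changed", repos, rev))
    print(notice)  # placeholder: send mail or update a Trac ticket instead

if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2])
```

The script goes in the repository's `hooks/` directory as `post-commit` and must be executable.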
  • Re:Why binaries? (Score:3, Interesting)

    by IWannaBeAnAC (653701) on Wednesday May 23, 2007 @05:03PM (#19244641)
    What you want is a makefile that will track the dependencies in the LaTeX documents and generate .eps files from the figures. There are a few around on the web, but I haven't yet seen a 'does everything' version. What program do you use to generate the .eps?
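The core of such a makefile helper is finding what each document pulls in. The Python sketch below assumes figures are included with `\includegraphics` and default to `.eps` when no extension is given (as in a latex/dvips workflow); `tex_dependencies` and `make_rule` are hypothetical helpers that emit a make-style dependency line.

```python
import os
import re
from typing import List

# \input{...}, \include{...}, \includegraphics[opts]{...}; longest name tried first.
DEP = re.compile(r"\\(includegraphics|include|input)\s*(?:\[[^\]]*\])?\s*\{([^}]+)\}")
COMMENT = re.compile(r"(?<!\\)%.*")  # a % not escaped as \% starts a comment

def tex_dependencies(source: str) -> List[str]:
    deps = []
    for line in source.splitlines():
        line = COMMENT.sub("", line)
        for command, name in DEP.findall(line):
            name = name.strip()
            if not os.path.splitext(name)[1]:
                # LaTeX adds .tex for \input/\include; .eps for figures is an assumption.
                name += ".eps" if command == "includegraphics" else ".tex"
            deps.append(name)
    return deps

def make_rule(target: str, main_tex: str, deps: List[str]) -> str:
    return f"{target}: {main_tex} " + " ".join(deps)
```

Writing the `make_rule` output to a file that the makefile includes lets make rebuild when any chapter or figure changes; turning figure sources into `.eps` still needs a rule for whatever tool produced them.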
  • Re:Why binaries? (Score:4, Interesting)

    by jbreckman (917963) on Wednesday May 23, 2007 @05:18PM (#19244871)
    We use it for version control and sharing of powerpoint/audio files. It keeps things considerably saner than a shared drive.

    And yes, for a 250 MB audio file, it is VERY slow.
  • by 0xABADC0DA (867955) on Wednesday May 23, 2007 @06:37PM (#19245725)
    But that's the problem with Subversion... the things that one might normally do are all of a sudden 'fiddling with stuff you are not supposed to fiddle with' and a big 'no-no'.

    1) You want to make a copy of trunk to send to somebody:

        tar cvf project.tar .

    With svn you have to go through a bunch of magic to do this, or you end up giving them an original copy when you may have local changes (you tweaked some config option or whatever), along with your username, the svn repo address and structure, etc. If you do svn export, it makes a copy of what is in HEAD, not what is in your folder, so there is no way to do this without going back and weeding out this junk.

    2) You want to export something

        # svn export svn:something /tmp
        svn: '/tmp' already exists

    Really, you think?

    3) You make a copy of a file and then decide to rename it (or other cases).

        # svn cp /other/file.c file.c
        # svn mv file.c newname.c
        svn: Use --force to override
        svn: Move will not be attempted unless forced
        # svn --force mv file.c newname.c
        svn: Cannot copy: it is not in repo yet; try committing first

    Svn says you *must* do a bogus commit because you wanted to rename a file, or alternatively you can revert the new file and lose it? wtf? dumb.

    4) You want to do the same thing on lots of files

        # svn mkdir newdir
        # svn cp *.c newdir
        svn: Client error in parsing arguments

    That's right: you have to break out your bash/perl scripting skills to do this. Lame.
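Points 1 and 4 can at least be scripted once. In the Python sketch below, `archive_working_copy` tars the working copy as it sits on disk (local edits included) while leaving out every `.svn` metadata directory, and `svn_copy_commands` issues one `svn copy` per file for clients that reject several sources in one call (newer clients may accept them; check `svn help copy`). Both function names are hypothetical, and `copy_each` only prints the commands unless `dry_run=False`.

```python
import subprocess
import tarfile
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Optional

def skip_svn_metadata(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    # Returning None excludes the member; for a directory, tarfile also skips its contents.
    return None if ".svn" in PurePosixPath(info.name).parts else info

def archive_working_copy(wc_dir: str, tar_path: str) -> None:
    """Tar the working copy as it is on disk, minus .svn/ metadata."""
    top = Path(wc_dir).resolve()
    with tarfile.open(tar_path, "w") as tar:
        tar.add(str(top), arcname=top.name, filter=skip_svn_metadata)

def svn_copy_commands(sources: Iterable[str], dest_dir: str) -> List[List[str]]:
    return [["svn", "copy", src, dest_dir] for src in sources]

def copy_each(sources: Iterable[str], dest_dir: str, dry_run: bool = True) -> None:
    for cmd in svn_copy_commands(sources, dest_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing copy
```

For example, `archive_working_copy(".", "project.tar")` covers point 1, and `copy_each(glob.glob("*.c"), "newdir", dry_run=False)` covers point 4 after an `svn mkdir newdir`.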

    There's a *lot* to dislike about svn. It's basically just 'icky' all throughout. The checkouts are huge and ugly, many operations are slow (compared to monotone), and it's really annoying to have a private repo that you sync occasionally, so you end up with zillions of tiny commits or losing work because you didn't commit enough. And the repo itself is very large: I converted a 2 GB repo from svn to monotone, preserving revisions, and even with straight add/del instead of renames/moves, the monotone database was a small fraction of the size, about 1/6th. Incidentally, the monotone version was much faster in pretty much every way.

    Monotone is technically much better than Subversion, except for one problem: you can't check out only a subset of a repo. Maybe they have fixed that by now, and if so it would be crazy to use svn instead of it IMO. I'm sure there are also many others out there better than svn.
  • by tentac1e (62936) on Wednesday May 23, 2007 @06:44PM (#19245783) Journal

    It's difficult because of the inherent complexity of the problem. Version control is recording and synchronizing changes to an arbitrary set of files in an arbitrary file hierarchy. Everything is easy until you start messing with the layout, but that's just a matter of using "svn" instead of doing it by hand.

    A lot of people use version control without understanding it. I just took a side gig to replace an incompetent developer who spent 7 months developing a web app directly on the server. One of his last acts was to put the project under subversion. The idiot put 35 megs of logs into the repository.

    Why should you care about those invisible files in your directories? If you want a clean copy without those directories, do an "export." If you're messing with those files on a working copy, you deserve every minute of pain.

    Why would you copy a folder directly? If you want to experiment with a change, either commit your current change and experiment on your working copy, or create a branch. That sounds complicated, but the learning curve is about the same as using "history" in photoshop.

    What's so unusual about your team? My deployments and rollbacks take seconds via Capistrano.

  • Vesta is better (Score:3, Interesting)

    by ebunga (95613) on Wednesday May 23, 2007 @07:11PM (#19246111) Homepage
    If you actually care about your code and making proper releases, use Vesta. Transparent version control that even tracks changes between proper check-ins (real "sub" versions). Built-in build system that beats the pants off of Make. It even has dependency tracking to the point that you not only keep your code under version control, but the entire build system. That's right. You can actually go back and build release 21 with the tools used to build release 21. It's sort of like ClearCase but without all the headache. Did I mention it's open source?

    The first time I used Vesta, it was a life-changing experience. It's nice to see something that isn't a rehash of the 1960s.
  • Notice.. (Score:3, Interesting)

    by sudog (101964) on Wednesday May 23, 2007 @07:44PM (#19246433) Homepage
    .. that the article is glaringly absent *actual check-in times.* Or, where *actual check-in times* are available, the details of whether it's the same file as in previous tests is glaringly absent. This leaves open the question as to whether the data set they were working on was identical or whether it was different between the various tests.

    Questions that remain:

    1. Does the algorithm simply "plainly store" previously-compressed files, and is this the reason why that is the most time-efficient?
    2. What exactly was the data for the *actual check-in* times? (What took 28m? What took 13m?)
    3. Given that speedier/more efficient check-in requires a large tarball format, how are artists supposed to incorporate this into their standard workflow? (Sure, there's a script for check-in, but the article lacks any details about actually using or checking out the files stored this way, except to say that browsing them is an unresolved problem.)

    The amount of CPU required for binary diff calculation is pretty significant. For an artistic team that generates large volumes of binary data (much of it in the form of mpeg streams, large lossy-compressed jpeg files, and so forth) it would be interesting to find out what kind of gains a binary diff would provide, if any.

    Document storage would also be an interesting and fairer test. Isn't .ODF typically stored in compressed form? If not, then small changes wouldn't necessarily affect the entirety of the file (as it would in a gzip file if the change were at the beginning) and SVN might be able to store the data very efficiently. Uncompressed PDF would certainly benefit.
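The point about compressed formats can be measured rather than assumed. The Python sketch below makes a same-length edit near the start of some text, then counts how many byte offsets stay identical before and after `zlib` compression. The sample data and the offset-by-offset metric are assumptions for illustration (a real delta engine also finds shifted matches), and no particular result is claimed here; try it on your own files.

```python
import zlib

def unchanged_positions(a: bytes, b: bytes) -> int:
    """Count byte offsets holding the same value in both versions."""
    return sum(x == y for x, y in zip(a, b))

def compare(original: bytes, edited: bytes) -> dict:
    packed_a = zlib.compress(original, 9)
    packed_b = zlib.compress(edited, 9)
    assert zlib.decompress(packed_b) == edited  # sanity check: compression is lossless
    return {
        "raw_size": len(original),
        "raw_unchanged": unchanged_positions(original, edited),
        "packed_size": len(packed_a),
        "packed_unchanged": unchanged_positions(packed_a, packed_b),
    }

# Hypothetical sample: 2,000 similar lines with a 4-byte edit in the first one.
text = b"".join(b"line %05d: the quick brown fox jumps over the lazy dog\n" % i
                for i in range(2000))
edited = text.replace(b"line 00000", b"LINE 00000", 1)
for key, value in compare(text, edited).items():
    print(f"{key:>16}: {value}")
```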
  • by XO (250276) <> on Wednesday May 23, 2007 @07:47PM (#19246455) Homepage Journal
    I probably need to seriously set up a development environment to examine this, but there seem to be some pretty serious program inefficiencies if throwing a processor upgrade at the problem decreases the time 14x, as the article seemed to indicate.

    It's like when I added 2,457 files to a VLC playlist. It took 55 minutes to complete the operation. I immediately downloaded the VLC code and went looking through it...

    It loops, while(1), through a piece of code that is commented "/* Change this, it is extremely slow */", or some such. The moment I have a C/C++ Linux development environment functioning, I am going to fix that, if it hasn't been already, as well as looking into the SVN problem.
