
Performance Tuning Subversion

Posted by ScuttleMonkey
from the geek-tweaks dept.
BlueVoodoo writes "Subversion is one of the few version control systems that can store binary files using a delta algorithm. In this article, senior developer David Bell explains why Subversion's performance suffers when handling binaries and suggests several ways to work around the problem."

  • Why binaries? (Score:2, Interesting)

    by janrinok (846318)
    I know it can handle binaries, but I cannot think why I would want to. Can anyone help?
    • Re:Why binaries? (Score:4, Informative)

      by autocracy (192714) <slashdot2007.storyinmemo@com> on Wednesday May 23, 2007 @03:57PM (#19243491) Homepage
      First answer: Images. Many other possible answers... :)
      • Re:Why binaries? (Score:5, Interesting)

        by daeg (828071) on Wednesday May 23, 2007 @05:01PM (#19244585)
        Not just images in the sense of PNGs and JPGs, but original source documents as well (PSD, AI, SVG, etc). We track several large (40MB+) source files and I've seen some slowness but nothing to write home about.

        We give our outside designers access to their own SVN repository. When we contract out a design (for a brochure, for instance), I give them the SVN checkout path for the project, along with a user name and password. They don't get paid until they commit the final version along with a matching PDF proof.

        This solves several issues:

        (a) The tendency for design studios to withhold original artwork. Most of them do this to ensure you have to keep coming back to them like lost puppies needing the next bowl of food. It also eliminates the "I e-mailed it to you already!" argument, removes insecure FTP transfers, and can automatically notify interested parties upon checkin. No checkin? No pay. Period.

        (b) Printers have to check out the file themselves using svn. They have no excuse to print a wrong file, and you can have a complete log to cross-check their work. They said it's printing? Look at the checkout/export log and see if they actually downloaded the artwork and how long ago.

        (c) The lack of accountability via e-mail and phone. We use Trac in the same setup, so all artwork change requests MUST go through Trac. No detailed ticket? No change.

        (d) Keeps all files under one system that is easy to back up.

        You may have a little difficulty finding someone at both the design and print companies that can do this, but a 1 page Word document seems to do the trick just fine.
        • by XO (250276)
          checkout/export log? I have searched for something like that, and have found no such option. I was also told on #svn that svnserve doesn't log accesses. How do you set that up?
          • by aled (228417)
            Subversion has hooks where it calls scripts, such as pre-commit and post-commit. You can implement what you want in those scripts. Some are contributed on the Subversion site.
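
For anyone who wants to try the hook approach, here is a minimal sketch of a post-commit hook in Python. It only records commits (hooks do not fire on checkout), and it assumes `svnlook` is on the PATH. The log location `LOG_PATH` and the helper names are made up for illustration; Subversion itself only supplies the repository path and revision number as arguments.

```python
#!/usr/bin/env python3
"""Sketch of a post-commit hook that appends one line per commit to a log."""
import subprocess
import sys
import time

LOG_PATH = "/var/log/svn-commits.log"  # hypothetical location, pick your own


def svnlook(subcommand, repos, rev):
    """Run `svnlook <subcommand> -r REV REPOS` and return its text output."""
    result = subprocess.run(
        ["svnlook", subcommand, "-r", str(rev), repos],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def format_entry(rev, author, changed, when):
    """Build a single log line from the author and changed-path output."""
    paths = []
    for line in changed.splitlines():
        # Each line is a status code, some whitespace, then the path.
        fields = line.split(None, 1)
        if len(fields) == 2:
            paths.append(fields[1])
    return f"{when} r{rev} {author.strip()} {', '.join(paths)}"


def main(argv):
    repos, rev = argv[1], argv[2]  # passed in by Subversion
    entry = format_entry(
        rev,
        svnlook("author", repos, rev),
        svnlook("changed", repos, rev),
        time.strftime("%Y-%m-%d %H:%M:%S"),
    )
    with open(LOG_PATH, "a") as log:
        log.write(entry + "\n")


if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Saved as `hooks/post-commit` in the repository directory and made executable, this would give you a commit log to cross-check against; it does nothing for checkouts or exports.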
            • by daeg (828071)
              You also have Apache access logs. Unfortunately, when you log the HTTP REPORT commands, it logs everything: a REPORT covers checkouts, updates, AND commits all in one. There isn't an SVN hook for "pre-checkout" or "post-checkout" yet. However, there is one for "post-commit", so when you combine the two, the HTTP REPORT items from Apache that have no matching commit are the browse/checkout items.

              You can also set up your own authentication mechanism through Apache. We use Django, for instance, which then logs that at least they
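
A rough sketch of the Apache-log side of this, in Python. It assumes the access log uses the common or combined log format and that users authenticate, so the user field is filled in. The function name `report_requests` is made up, and whether a given REPORT line really was a checkout is the commenter's reading of the logs, not something this code can verify.

```python
import re

# host ident user [time] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def report_requests(lines):
    """Return (user, time, path) for each authenticated REPORT request."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not in the assumed log format
        if match.group("method") == "REPORT" and match.group("user") != "-":
            hits.append(
                (match.group("user"), match.group("time"), match.group("path"))
            )
    return hits
```

Feeding it the lines of the access log gives a list you can compare against the post-commit records to see who fetched what and when.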
    • Re: (Score:2, Insightful)

      by eikonoklastes (530797)
      Tracking images/graphics while developing a web site?
    • Re:Why binaries? (Score:4, Informative)

      by teknopurge (199509) on Wednesday May 23, 2007 @03:59PM (#19243537) Homepage
      release management - you can store _compiled_ application bundles, ready-to-go.
    • Re:Why binaries? (Score:5, Informative)

      by Anonymous Coward on Wednesday May 23, 2007 @04:00PM (#19243557)
      putting a toolchain under CM control, so that you can go back to not only an earlier version of your own code, but the version of the toolchain you used to compile the code at that point in time. Absolutely necessary to be able to recreate the full software environment of a past build, without relying on that version of the toolchain still being publicly available (not to mention including any patches/mods you made to the public toolchain).
      • by iangoldby (552781) on Wednesday May 23, 2007 @05:34PM (#19245071) Homepage
        If you put the toolchain into CM, do you also put the operating system in? Just as the sourcecode is no good if you don't have the right toolchain to build it, the toolchain is no good if you don't have the right OS to run it.

        I suspect the answer (if you really need it) is to save a 'Virtual PC' image of the machine that does the build each time you make an important baseline (or each time the build machine configuration changes). Since the image is likely to be in the GB size range, you might want to store it on a DVD rather than in your CM system.
        • Don't forget, the VM is useless without the hypervisor/player/whatever, so you need to check that in too. Of course, that's generally useless without the OS, so check that in too. Even if you have an OS/hypervisor, that's useless without the hardware, so you need to check that in too.

          Or, rather than trying to figure out how to version control hardware, you could write portable code and use open standards, and not worry about all this mess.

        • by NuShrike (561140)
          The OS is irrelevant when the machines being deployed to are the same OS and flavor. Windows, Linux, etc.

          Eventually, the OS updates but the tool chain updates with it. Release Management is about handling the NOW and not the whatif. Especially since stable and mature OSs don't really change that much, and aren't a toolchain dependency (unlike you RedHat people).

          You have some other granular, fault tolerant, and centralized release tracking model that works better and doesn't rely on different directories,
    • Re:Why binaries? (Score:5, Insightful)

      by jfengel (409917) on Wednesday May 23, 2007 @04:04PM (#19243615) Homepage Journal
      It's really nice to be able to have your entire product in one place and under version control. Third party DLLs (or .so's or jars), images, your documentation... just about anything that's part of your product.

      That way it's all in one place and easily backed up. If you get a new version of the DLL/jar/so you can just drop it into a new branch for testing. If your customer won't upgrade from version 2.2 to version 3.0, you can recreate the entire product to help fix bugs in the old version rather than just saying, "We've lost it, you've got to upgrade."

      Basically, by putting your entire project under version control, you know that it's all in one place, no matter what version it is you want. Even if the files don't change, you know how to reconstruct a development installation without having to dig around in multiple locations (source in version control, DLLs in one directory on the server, etc.)

      Yeah, so it costs some extra disk to store it. Disk is cheap.
    • Re:Why binaries? (Score:5, Insightful)

      by javaxman (705658) on Wednesday May 23, 2007 @04:11PM (#19243743) Journal
      1) you want deployment without the need to build
      2) you have proprietary build tools limited to developer use, or release engineers unable to build for whatever reason ( similar to #1, I know... )
      3) images, of course.
      4) Word, Excel, other proprietary document formats are all binary.
      5) third-party binary installation packages, patches, dynamic libs, tools, etc.

      You're just not trying, or you're thinking of version control as something that only programmers would use, and that they'd only use it to store their text source. There are as many reasons to store binary files in version control as there are reasons to have binary files...
    • by norton_I (64015)
      I use subversion to track latex documents, which have figures in them. I usually store both the original source file (often a binary) as well as the .eps version of figures (text, but might as well be binary) in svn, since I can't regenerate them from a script.

      I don't understand why the author of the article wants to do what he is, but lots of people have good (or good enough) reasons for wanting to track binary files.

      I always hope I don't have to keep binaries in svn, but since so many people seem to love
      • Re: (Score:3, Interesting)

        by IWannaBeAnAC (653701)
        What you want is a makefile that will track the dependencies in the latex documents, and generate .eps files from the figures. There are a few around on the web, but I haven't yet seen a 'does everything' version. What program do you use to generate the .eps ?
        • by norton_I (64015)
          Unfortunately, for drawings I usually use Illustrator or Canvas on Windows. I also generate figures in Matlab, which can be automated, but is a major pain to do so. Ideally, I would switch to Inkscape for the drawing, but last time I looked at it (quite some time ago) it was not ready. I hear it is much better now, but I am not going to learn a new program halfway through writing my thesis.

          Thanks
        • For most figures, that's almost exactly what I do. Plots are generated from data using GNUplot. Diagrams are generated by the Makefile calling an AppleScript to tell OmniOutliner to export the latest version as a PDF. Some images, however, can't be generated in this way. Photographs are one example, as are rendered images from an overnight ray-tracing job. These go into SVN, and can be checked out when I want to look at an old version.
    • Re:Why binaries? (Score:4, Interesting)

      by jbreckman (917963) on Wednesday May 23, 2007 @05:18PM (#19244871)
      We use it for version control and sharing of powerpoint/audio files. It keeps things considerably saner than a shared drive.

      And yes, for a 250mb audio file, it is VERY slow.
    • by Bassman59 (519820)

      I know it can handle binaries, but I cannot think why I would want to. Can anyone help?

      I use Subversion for my FPGA sources. When I go to release a version of a design, I include the binary build results (.bit files from the place and route tools, and the .mcs file used to program the config EPROM) in my release tag. This is so checksums and such match exactly, and this is important because the production people put stickers on the EPROMs that display the version, part number and checksum. While I can certainly rebuild from the source, if the toolchain changes then the resulting bit file

  • by hpoul (219387) <herbert.slashdot@filter.poul.at> on Wednesday May 23, 2007 @04:01PM (#19243583) Homepage
    for me performance is (currently) the least of my problems with subversion..
    more that you lose changes without any warning whatsoever during merging .. http://subversion.tigris.org/servlets/ReadMsg?listName=users&msgNo=65992 [tigris.org] .. and no one seems to be too bothered..

    (don't get me wrong, i love subversion .. and i use it for my open source projects.. but currently CVS is way better.. just because of the tools and a few unnecessary annoyances less)
    • Yes, because a bug report that's 9 days old is indicative of a deep flaw in the developer structure. You should have at least said you were the one who filed it in the interests of full disclosure. Anyway, it's safe practice to check in the trunk modifications before you merge.
      • by eli173 (125690) on Wednesday May 23, 2007 @04:25PM (#19243965)

        Anyway, it's safe practice to check in the trunk modifications before you merge.

        I think you missed his point... he'd committed all his changes. The problem is that if you merge a file or directory deletion in, where that file or directory had modifications committed, Subversion won't tell you about the conflict, but will delete the file or directory including the new modifications.

        You wanted to delete it, so who cares, right?

        Subversion represents renames as a copy & delete. So now, you rename a file or directory, and do the same dance as above, and the renamed file or directory does not have changes that were made on trunk under their previous names. So renaming a file can re-introduce a bug you already fixed.

        No big deal, the devs will fix it soon, right? Wrong [tigris.org] and wrong again [tigris.org].

        That is the problem.

        • Yes, but you've not actually lost any data, you can pull the deleted files out of the repository. So at worst it would reintroduce a bug you would be able to find and fix later - but who merges without checking it worked?
          • by eli173 (125690)

            Yes, but you've not actually lost any data, you can pull the deleted files out of the repository. So at worst it would reintroduce a bug you would be able to find and fix later - but who merges without checking it worked?

            Reintroducing a bug is a very bad thing. And if you've only worked on projects with 100% test coverage, and automated execution of said tests, you're going to be in for a real rude awakening when you get a job.

            Um... sorry, let me set this flamethrower down here, turn it off, and I'll just

            • by pohl (872) *
              But if you're working on a project with 100% test coverage, you can afford to revert, can't you? It's the case where you have 0% test coverage that reverting is most dangerous, and on that end of the spectrum it really is your fault anyway.
          • Re: (Score:3, Informative)

            by LionMage (318500)

            So at worst it would reintroduce a bug you would be able to find and fix later - but who merges without checking it worked?

            What if the merges are done by someone who isn't familiar with all the code changes and the expected associated application behaviors? What if there are dozens or even hundreds of code changes in a branch being merged to trunk? What if your QA work is being done by people who are not developers and who have no involvement in the merge process?

            These are not just hypothetical issues. I

  • by frodo from middle ea (602941) on Wednesday May 23, 2007 @04:08PM (#19243681) Homepage
    My solution: use svn+ssh and keep an ssh connection to the svn server in Master mode. All svn+ssh activity tunnels through this master connection, so there is no need for an ssh handshake each time or, for that matter, to even open a socket each time.

    Plus, if the master connection is set to compress data (-C), then you get transparent compression.

    Now if only I could expand all this to fit 2 pages....Profit!!!
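
One way to set up the master-connection trick is through `~/.ssh/config` rather than command-line flags. The sketch below is a small Python helper that prints such a stanza. `ControlMaster`, `ControlPath` and `Compression` are standard OpenSSH client options; the host name, the socket location, and the choice of `ControlMaster auto` (instead of starting a master by hand with `ssh -M`) are assumptions for illustration.

```python
def ssh_master_stanza(host, control_dir="~/.ssh"):
    """Render an ssh_config stanza that shares one connection per host."""
    lines = [
        f"Host {host}",
        # Reuse an existing master connection, or become the master.
        "    ControlMaster auto",
        # Socket file that later ssh processes attach to
        # (%r = remote user, %h = host, %p = port).
        f"    ControlPath {control_dir}/master-%r@%h:%p",
        # Same effect as passing -C on the command line.
        "    Compression yes",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # svn.example.com is a placeholder host name.
    print(ssh_master_stanza("svn.example.com"), end="")
```

With a stanza like this in place, the first svn+ssh operation opens the connection and later ones ride on it for as long as the master stays up.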

  • by bartman (9863)
    A really great way to optimize your SCM is to upgrade to git.
  • by Crazy Taco (1083423) on Wednesday May 23, 2007 @04:19PM (#19243879)
    It is still the wave of the future. I've worked in it extensively, and it is still the best version control system I've ever used. Because of its other strengths, it is continuing to expand its user base and gain popularity. You can tell this because Microsoft is now actively attempting to copy Subversion's concepts and ways of doing things. Ever used Team Foundation Server? It is just like Subversion, only buggier (and without a good way to roll back a changeset... you have to download and install Team Foundation Power Tools to do it). I'm a new employee at my company (which uses Microsoft technology), and yet I've been explaining how the TFS system works to seasoned .Net architecture veterans. The reason I can do this? I worked extensively with Subversion, read the Subversion book a few times (the O'Reilly book maintained by the Subversion team), and worked on a project for my previous company that basically had the goal of making versions of the TFS wizards for Subversion on the Eclipse platform. It only took me about one day of using TFS to be able to predict how it would respond, what its quirks would be, etc, because its technical underpinnings are just like Subversion. So even with performance issues, if even Microsoft is abandoning its years of efforts on Source Safe and jumping all over this, you can know that its strengths still make it worth adopting over the other alternatives. After all, if Microsoft was going to dump source safe, it had its pick of other systems to copy, as well as the option of trying to make something new. What did it pick? Subversion.
    • by GrievousMistake (880829) on Wednesday May 23, 2007 @05:00PM (#19244577)
      Honestly, if you think Subversion is the wave of the future, you haven't been paying much attention. It fixes some fundamental flaws in CVS, which is nice, but elsewhere there's exciting stuff like Monotone, darcs and many others. It seems people aren't looking hard enough for source control options, when they'll go wild over things like SVN, or more recently GIT.

      I suppose one has to be conservative with deployment of this stuff, you don't want to have code locked away in unmaintained software, or erased by immaturity bugs, but it's still an interesting field.
      • Honestly, if you think Subversion is the wave of the future, you haven't been paying much attention. It fixes some fundamental flaws in CVS, which is nice, but elsewhere there's exciting stuff like Monotone, darcs and many others. It seems people aren't looking hard enough for source control options, when they'll go wild over things like SVN, or more recently GIT.

        It may or may not be the wave of the future, but after looking at version control systems for almost 2 years before switching to SVN last year,
    • by XO (250276)
      I am pretty frustrated with Subversion, and all I do is manage a few pieces of source code with it. It's always farking my things up, it seems. Well, at least once a month or two, I'm mucking around with the internals of the repository to fix crap that svn did.

      Also, SVN doesn't -have- a way to rollback.
      • by XO (250276)
        oh, and about 100mb of binaries, also .. but it never messes up the binaries, it just takes forever to update/commit/checkout on those
      • That's pretty surprising that you see performance issues. We run a few tens of thousands of files in one of our repositories (a 10-20GB working copy, but only 3GB in the repository). We find performance in SVN to be quite good. Lots and lots of binaries (and SVN is a darned sight better than VSS/SourceOffSite at storing those).

        All told, our total repository space is 20GB spread over about 2 dozen repositories (the 3GB one is the largest and where most work occurs).

        It's working exceedingly well for us. We
  • by Tankko (911999) on Wednesday May 23, 2007 @04:29PM (#19244025)
    I've been using Subversion for 2 years on game related projects. Most of our assets are binary (photoshop files, images, 3D models, etc), plus all the text based code. I love subversion. Best thing out there that doesn't cost $800/seat.

    What I don't like about this article is that it implies I should have to restructure my development environment to deal with a flaw in my version control. The binary issue is huge with subversion, but most of the people working on subversion don't use binary storage as much as game projects. Subversion should have an option to store the head as a full file, not a delta, and this problem would be solved. True, it would slow down the commit time, but commits happen a lot less than updates (at least for us). Also the re-delta-ing of the head-1 revision could happen on the server in the background, keeping commits fast.
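
To make the "store the head as a full file" idea concrete, here is a toy Python sketch of reverse-delta storage: the newest revision is kept whole, and each older revision is kept as a delta against the one after it. This is not Subversion's storage format or delta algorithm; the class name and the naive prefix/suffix delta are invented purely to show why fetching the head becomes cheap while fetching old revisions costs more.

```python
def make_delta(new, old):
    """Describe `old` relative to `new` as (prefix_len, suffix_len, middle)."""
    limit = min(len(new), len(old))
    prefix = 0
    while prefix < limit and new[prefix] == old[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and new[len(new) - 1 - suffix] == old[len(old) - 1 - suffix]):
        suffix += 1
    return prefix, suffix, old[prefix:len(old) - suffix]


def apply_delta(new, delta):
    """Rebuild the older revision from the newer one plus its delta."""
    prefix, suffix, middle = delta
    return new[:prefix] + middle + new[len(new) - suffix:]


class ReverseDeltaStore:
    """Keeps the head in full; older revisions are deltas going backwards."""

    def __init__(self):
        self.head = None
        self.deltas = []  # deltas[i] rebuilds revision i from revision i + 1

    def commit(self, data):
        if self.head is not None:
            # The old head gets re-encoded relative to the new one.
            self.deltas.append(make_delta(data, self.head))
        self.head = data
        return len(self.deltas)  # revision number of the new head

    def get(self, rev):
        latest = len(self.deltas)
        if self.head is None or not 0 <= rev <= latest:
            raise IndexError(f"no such revision: {rev}")
        data = self.head  # head needs no reconstruction at all
        for i in range(latest - 1, rev - 1, -1):
            data = apply_delta(data, self.deltas[i])
        return data
```

The trade-off is the one described above: every commit has to re-encode the previous head, which is the work that could be pushed to a background job on the server.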
    • Re: (Score:3, Interesting)

      by XO (250276)
      I need to probably seriously set up a development environment to examine this, but it seems that there are probably some pretty serious program inefficiencies, if throwing a processor upgrade at the problem decreases the time 14x, as the article seemed to indicate.

      It's like when I added 2,457 files to a VLC play list. It took 55 minutes to complete the operation. I immediately downloaded the VLC code, and went looking through it...

      It loops, while(1), through a piece of code that is commented "/* Change this, i
      • by mibus (26291)

        I need to probably seriously set up a development environment to examine this, but it seems that there are probably some pretty serious program inefficiencies, if throwing a processor upgrade at the problem decreases the time 14x, as the article seemed to indicate.

        I wish it were so simple. They moved from a dual 500MHz, 500MB RAM machine, shared amongst tasks, to a 3.2GHz 2GB RAM machine solely doing SVN. That's no small upgrade, and isn't at all telling which of the three main variables (CPU, RAM, shared-or-n

        • by XO (250276)
          I still can't believe that the -server- is having problems with processing like that, though. Since SVN stores each changeset as a separate file, all it should have to do is send out the changeset. Instead, the server sits there doing -something- for 50% of the time, then spends the other 50% of the time sending it.
  • by shirai (42309) on Wednesday May 23, 2007 @04:33PM (#19244091) Homepage
    Okay, I know this is completely off-topic but I'd really like to get some responses or some discussion going on what makes version control suck.

    I mean, is it just me or is revision control software incredibly difficult to use? To put this into context, I've developed software that builds websites with integrated shopping cart, dozens of business features, email integration, domain name integration, over 100,000 sites built with it, (blah blah blah) but I find revision control HARD.

    It feels to me like there is a fundamentally easier way to do revision control. But, I haven't found it yet or know if it exists.

    I guess for people coming from CVS, Subversion is easier. But with subversion, I just found it disgusting (and hard to manage) how it left all these invisible files all over my system and if I copied a directory, for example, there would be two copies linked to the same place in the repository. Also, some actions that I do directly to the files are very difficult to reconcile with the repository.

    Since then, I've switched our development team to Perforce (which I like much better), but we still spend too much time on version control issues. With the number, speed of rollouts and need for easy accessibility to certain types of rollbacks (but not others), we are unusual. In fact, we ended up using a layout that hasn't been documented before but works well for us. That said, I still find version control hard.

    Am I alone? Are there better solutions (open source or paid?) that you've found? I'd like to hear.
    • by Cee (22717) on Wednesday May 23, 2007 @04:54PM (#19244451)
      Yes, version control is more difficult than not using any tool at all, but that goes for most stuff in life. There are certainly areas where usability can be improved.

      Fiddling with stuff you are not supposed to fiddle with is generally a no-no when using source control. I found though that I got used to the Subversion way to do things (learned that the hard way). For example Subversion on the client side does not really handle server side rollbacks of the complete repository since the files are cached and hashed locally. One way to make source control more transparent to the user could be to let the filesystem handle it.
      • Re: (Score:3, Interesting)

        by 0xABADC0DA (867955)
        But that's the problem with subversion... the things that one might normally do all of a sudden are 'fiddling with stuff you are not supposed to fiddle with' and a big 'no-no'.

        1) You want to make a copy of trunk to send to somebody:

        tar cvf project.tar .

        With svn you have to go through a bunch of magic to do this or you end up giving them an original copy when you may have local changes (you tweaked some config option or whatever), your username, time svn repo address and structure, etc. If yo
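
For the specific case of handing someone your working copy with local changes but without the `.svn` bookkeeping, a small script can do what plain `tar` does not. The sketch below uses Python's standard `tarfile` module and its `filter` hook; the function name and the `project` top-level folder name are arbitrary choices for illustration.

```python
import tarfile


def _skip_svn(info):
    """Drop any archive member that lives in (or is) a .svn directory."""
    if ".svn" in info.name.split("/"):
        return None  # excluding a directory also skips its contents
    return info


def archive_working_copy(src_dir, tar_path, arcname="project"):
    """Tar up src_dir as it is on disk, minus Subversion metadata."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=arcname, filter=_skip_svn)
```

Unlike `svn export` of a repository URL, this packages whatever is currently in the folder, local tweaks included, so check that is really what you want to send.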
        • by chthon (580889)

          1) You want to make a copy of trunk to send to somebody:

          tar cvf project.tar .

          With svn you have to go through a bunch of magic to do this or you end up giving them an original copy when you may have local changes (you tweaked some config option or whatever), your username, time svn repo address and structure, etc. If you do svn export it makes a copy of what is in HEAD not in your folder, so there is no way to do this without going back and weeding out this junk

          In any VC/CM system, you should not exp

    • Re: (Score:3, Interesting)

      by norton_I (64015)
      You are not alone, but I think the problem is intrinsic (or nearly so). VC is one more thing you have to worry about that is not actually doing your work. It is easy as long as you don't want to do anything with VC you couldn't do otherwise. If all you do is linear development of a single branch, it is pretty easy. Memorize a few commands for import, checkout, and checkin and you are fine, but all you really get is a backup system. As soon as you want to branch and merge and so forth, it becomes much m
      • Re: (Score:3, Insightful)

        by jgrahn (181062)

        You are not alone, but I think the problem is intrinsic (or nearly so). VC is one more thing you have to worry about that is not actually doing your work.

        If it isn't about doing your work, then why do you do it?

        Of course it is about doing your job. If you're a programmer, it's analogous to asking your C compiler not to suppress warnings. You would have to find those bugs anyway, and you would do a much worse job without the help.

        In my work, version control (or whatever fancy name ending in "management"

        • by norton_I (64015)

          If it isn't about doing your work, then why do you do it?

          I thought what I meant was clear. I never intended to claim that VC was useless, non-productive, or non-work. In fact, part of my point was that it is work, which many people don't understand. By "not doing your work" I meant simply that your final product is a program, not a tree of revisions. Time spent on VC does not directly result in satisfying customer needs, rather it makes it easier to create reliable software more quickly, and with less r

    • In its simplest form... just keeping a history of changes, it really isn't that bad.

      where it becomes complicated is when you start talking about branching, merging, or trying to deal with dependencies across projects, etc.

      But if done well, version control helps more than hurts.
    • Re: (Score:3, Interesting)

      by tentac1e (62936)

      It's difficult because of the inherent complexity of the problem. Version Control is recording and synchronizing changes to an arbitrary set of files in an arbitrary file hierarchy. Everything is easy until you start messing with the layout, but that's just a matter of using "svn" instead of doing it by hand.

      A lot of people use version control without understanding it. I just took a side gig to replace an incompetent developer who spent 7 months developing a web app directly on the server. One of his last

      • The idiot put 35 megs of logs into the repository.

        SVN actually works pretty well at storing log files (it's very efficient at it - both storing and sending the changes). Especially for distribution of said log files and secure storage of the log files. Because SVN doesn't support purge, it makes a good WORM-style solution for logs. And using svn+ssh and restricting the commands that can be run with a particular SSH key makes it fairly secure from tampering.

        Yeah, it's probably overkill. But it's a leg
    • But with subversion, I just found it disgusting (and hard to manage) how it left all these invisible files all over my system

      It only puts them in your working copy. Most development practices include the assumption that you wouldn't deploy your working copy simply by copying it directly. There are several models of how to generate something to be deployed. One of the most common ones is to have a script or build tool that operates on the working copy and generates something that can be deployed. That

    • Although Subversion does a great job of being a better CVS than CVS, yes, it is hard to use. Let me clarify: It is easy to use for a small project with just a few developers. But for large projects with many developers scattered all over, it, or any centralized revision control system becomes a nightmare (to me, anyway). The biggest problem I have with Subversion/CVS-type systems is that eventually managing the branches becomes a nightmare, and it becomes really easy to screw stuff up.

      My work became a lo
    • You don't like sex?
    • by chthon (580889)

      Version control is part of the software development process.

      If you are building a simple program on your own, then the basic thing to do with it is versioning in a straight line.

      However, if your program architecture becomes more complex and/or more people are working on it, then version control becomes a synchronisation system.

      When you have a more elaborate development process, version control is tied in with change control and tracking.

      So, yes, version control is hard.

      I am a fulltime VC administrator fo

    • But with subversion, I just found it disgusting (and hard to manage) how it left all these invisible files all over my system and if I copied a directory, for example, there would be two copies linked to the same place in the repository. Also, some actions that I do directly to the files are very difficult to reconcile with the repository.

      That's actually a very strong advantage of SVN. Working copies do not have to map 1:1 with repositories. But it was a big change compared to how Visual SourceSafe wor
  • by weinerofthemonth (1027672) on Wednesday May 23, 2007 @04:40PM (#19244247)
    Based on the headline, I was expecting some great method for tuning Subversion for increased performance. This article was about performance tuning your processing, not Subversion.
  • by javaxman (705658) on Wednesday May 23, 2007 @05:35PM (#19245089) Journal
    At least in a general case, I couldn't expect the developers I work with to gzip their binaries before checking them into version control.

    Doing so means you have to unzip them to use them. Not very handy. Most users want to use Subversion the way they should be able to use version control- a checkout should give you all of the files you need to work with on a given project, with minimal need to move/install pieces after checkout. Implementing the 'best' suggested workaround would mean needing a script or other way to get the binaries unpacked. Programmers are often annoyed enough by the extra step of *using* version control, now you have to zip any binaries you commit to the repository?
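
The "script to get the binaries unpacked" could look something like the Python sketch below. It walks a checked-out tree, skips `.svn` directories, and writes an uncompressed copy next to every `.gz` file. The function name and the convention that compressed binaries end in `.gz` are assumptions; the article's workaround does not prescribe a layout.

```python
import gzip
import os
import shutil


def unpack_gz_tree(root):
    """Decompress every *.gz under root; return the paths written."""
    unpacked = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".svn" in dirnames:
            dirnames.remove(".svn")  # do not descend into svn metadata
        for name in sorted(filenames):
            if not name.endswith(".gz"):
                continue
            src = os.path.join(dirpath, name)
            dst = src[:-len(".gz")]
            with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            unpacked.append(dst)
    return unpacked
```

It still leaves the problem raised here: someone has to remember to run it after every checkout or update, and to re-compress before committing.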

    I'm unimpressed by their performance testing methodology... they give shared server and desktop performance numbers, but have no idea what 'else' those machines were doing? Pointless. I'd like more details regarding what they're doing in their testing. Their tests were done with a "directory tree of binary files", but they don't say what size or how many files?

    My tests on our server show a 28MB binary checkout ( LAN, SPARC server, Pentium M client ) takes ~20 seconds. Export takes ~2sec. That must be a big set of files to cause a 9 minute *export*... several gigs, am I wrong? It'd be nice for them to say. Most of us, even in a worst case, won't have more than a few hundred MB in a single project.

    The only *real* solution will be a Subversion configuration option which lets you say "please, use all my disk space, speed is all I care about when it comes to binary files". CollabNet is focused enough on getting big-business support contracts that it shouldn't be long before we see this issue addressed in one manner or another. You -know- they're reading this article!

    • by XO (250276)
      My "server" is my desktop machine, AMD Athlon XP 2000+. 30mb files take around 12 minutes or so to checkout. That's to a computer on the LAN. Add 8 or 9 of those to my project, and we're spending hours doing checkouts. Fortunately most of those files never change, but when they do, that's an automatic 10-15 minutes of time that's going to be spent waiting by every person that gets an update after that.
      • by chthon (580889)

        I think this is the case for all VC systems.

        I once did a test between Continuus and Subversion for checking out the same tree. The results were the same. Why? Not because of the version control system, but because of the speed of bringing updates over the network to the disk. Creating files and directories is expensive.

        In my automated builds (using Continuus), reconfigures (updates) take from 5 to 20 minutes, depending upon the size of the tree and the amount of changes done. With big changes, add 5 to 1

      • My "server" is my desktop machine, AMD Athlon XP 2000+. 30mb files take around 12 minutes or so to checkout.

        You *really* need to examine your setup to find out why that is happening.

        A 300 MB checkout (maybe a few dozen files) should only take about a minute to prep and then however long it takes to move over the wire. Since we're using svn+ssh, things are extremely efficient (SSH pub-keys restricted to running the svnserve tool in tunnel mode combined with PuTTY and TortoiseSVN). Our largest reposi
  • Vesta is better (Score:3, Interesting)

    by ebunga (95613) on Wednesday May 23, 2007 @07:11PM (#19246111) Homepage
    If you actually care about your code and making proper releases, use Vesta [vestasys.org]. Transparent version control that even tracks changes between proper check-ins (real "sub" versions). Built-in build system that beats the pants off of Make. It even has dependency tracking to the point that you not only keep your code under version control, but the entire build system. That's right. You can actually go back and build release 21 with the tools used to build release 21. It's sort of like ClearCase but without all the headache. Did I mention it's open source?

    The first time I used Vesta, it was a life-changing experience. It's nice to see something that isn't a rehash of the 1960s
  • Notice.. (Score:3, Interesting)

    by sudog (101964) on Wednesday May 23, 2007 @07:44PM (#19246433) Homepage
    .. that the article glaringly lacks *actual check-in times.* Or, where *actual check-in times* are available, it does not say whether the file is the same one used in previous tests. This leaves open the question as to whether the data set they were working on was identical or whether it was different between the various tests.

    Questions that remain:

    1. Does the algorithm simply "plainly store" previously-compressed files, and is this the reason why that is the most time-efficient?
    2. What exactly was the data for the *actual check-in* times? (What took 28m? What took 13m?)
    3. Given that speedier/efficient check-in requires a large tarball format, how are artists supposed to incorporate this into their standard workflow? (Sure, there's a script for check-in, but the article lacks any details about actually using or checking out the files thus stored, except to say that browsing files stored this way is an unresolved problem.)

    The amount of CPU required for binary diff calculation is pretty significant. For an artistic team that generates large volumes of binary data (much of it in the form of mpeg streams, large lossy-compressed jpeg files, and so forth) it would be interesting to find out what kind of gains a binary diff would provide, if any.

    Document storage would also be an interesting and fairer test. Isn't .ODF typically stored in compressed form? If not, then small changes wouldn't necessarily affect the entirety of the file (as it would in a gzip file if the change were at the beginning) and SVN might be able to store the data very efficiently. Uncompressed PDF would certainly benefit.
  • Neither the article nor the replies tell me anything useful. .tar.gz files are small, meaning they are fast to move through a network, but do they diff well? Good compression algorithms turn data into statistically random streams of bits, so I suspect that different generations of uncompressed .tar files would have smaller deltas than the compressed versions. Similar questions abound for GIF and JPEG files.
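One rough way to test that suspicion is to change a single byte in some compressible data and see how much of the new version can still be found in the old one, before and after compression. The sketch below is only a crude proxy: the sample data, the 64-byte chunk size, and the `reuse_fraction` helper are all made up for illustration, and this is not how Subversion's own delta algorithm works.

```python
import random
import zlib

CHUNK = 64  # bytes per chunk; an arbitrary choice for this sketch


def make_sample(size: int, seed: int = 0) -> bytes:
    """Build repetitive, text-like (hence compressible) sample data."""
    rng = random.Random(seed)
    words = [b"alpha", b"beta", b"gamma", b"delta",
             b"commit", b"binary", b"tree", b"file"]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


def reuse_fraction(old: bytes, new: bytes, chunk: int = CHUNK) -> float:
    """Fraction of full, aligned chunks of `new` found anywhere in `old`."""
    # Every chunk-sized window of the old version, at any byte offset.
    windows = {old[i:i + chunk] for i in range(len(old) - chunk + 1)}
    # Full, aligned chunks of the new version (a trailing partial chunk is ignored).
    chunks = [new[i:i + chunk] for i in range(0, len(new) - chunk + 1, chunk)]
    return sum(c in windows for c in chunks) / len(chunks)


old = make_sample(128 * 1024)

# Second "generation": identical except for one byte near the start.
edited = bytearray(old)
edited[10] = ord("#")
new = bytes(edited)

raw_reuse = reuse_fraction(old, new)
gz_reuse = reuse_fraction(zlib.compress(old), zlib.compress(new))

print(f"uncompressed: {raw_reuse:.2%} of chunks reusable")
print(f"compressed:   {gz_reuse:.2%} of chunks reusable")
```

The uncompressed pair loses only the one chunk containing the edit. How little of the compressed pair survives will depend on the data and the compressor, so treat the printed numbers as an illustration rather than a benchmark.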
    • Back when we used VSS+SourceOffSite (all traffic went through SOS, which kept our repository from getting damaged), we would zip up our .MDB (MSAccess database files) prior to checking them in. VSS/SOS was extremely inefficient at storing binaries (every time you checked in a 20MB binary, the repository size would increase by 20MB), so zip'ing the files saved us a lot of space. Plus SOS didn't do a very good job of over-the-wire compression, so a zip'd MDB would download a lot faster than storing the MDB
