Performance Tuning Subversion
BlueVoodoo writes "Subversion is one of the few version control systems that can store binary files using a delta algorithm. In this article, senior developer David Bell explains why Subversion's performance suffers when handling binaries and suggests several ways to work around the problem."
Why binaries? (Score:2, Interesting)
Re:Why binaries? (Score:4, Informative)
Re:Why binaries? (Score:5, Interesting)
We give our outside designers access to their own SVN repository. When we contract out a design (for a brochure, for instance), I give them the SVN checkout path for the project, along with a user name and password. They don't get paid until they commit the final version along with a matching PDF proof.
This solves several issues:
(a) The tendency for design studios to withhold original artwork. Most of them do this to ensure you have to keep coming back to them like lost puppies needing the next bowl of food. It also eliminates the "I e-mailed it to you already!" argument, removes insecure FTP transfers, and can automatically notify interested parties upon checkin. No checkin? No pay. Period.
(b) Printers have to check out the file themselves using svn. They have no excuse to print a wrong file, and you can have a complete log to cross-check their work. They said it's printing? Look at the checkout/export log and see if they actually downloaded the artwork and how long ago.
(c) The lack of accountability via e-mail and phone. We use Trac in the same setup, so all artwork change requests MUST go through Trac. No detailed ticket? No change.
(d) Keeps all files under one system that is easy to back up.
You may have a little difficulty finding someone at both the design and print companies that can do this, but a 1 page Word document seems to do the trick just fine.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
You can also set up your own authentication mechanism through Apache. We use Django, for instance, which then logs that at least they
Re: (Score:2, Insightful)
Re:Why binaries? (Score:4, Informative)
Re:Why binaries? (Score:5, Informative)
running the toolchain... (Score:4, Insightful)
I suspect the answer (if you really need it) is to save a 'Virtual PC' image of the machine that does the build each time you make an important baseline (or each time the build machine configuration changes). Since the image is likely to be in the GB size range, you might want to store it on a DVD rather than in your CM system.
Re: (Score:2)
Don't forget, the VM is useless without the hypervisor/player/whatever, so you need to check that in too. Of course, that's generally useless without the OS, so check that in too. Even if you have an OS/hypervisor, that's useless without the hardware, so you need to check that in too.
Or, rather than trying to figure out how to version control hardware, you could write portable code and use open standards, and not worry about all this mess.
Re: (Score:2)
Eventually, the OS updates but the tool chain updates with it. Release Management is about handling the NOW and not the whatif. Especially since stable and mature OSs don't really change that much, and aren't a toolchain dependency (unlike you RedHat people).
You have some other granular, fault tolerant, and centralized release tracking model that works better and doesn't rely on different directories,
Re: (Score:2)
Instant, traceable recovery of the tool-chain, especially when it is spread out between multiple developers over long periods of time, as long as the core dependencies don't change.
Let me give you an example: developers have been building this game with tools and related artwork. Once the release A has been made, all sources and binaries are checked-in. A year later, developers come back to localize this release for international release, but meanwhile the source and tool chain had mov
Re:Why binaries? (Score:5, Insightful)
That way it's all in one place and easily backed up. If you get a new version of the DLL/jar/so you can just drop it into a new branch for testing. If your customer won't upgrade from version 2.2 to version 3.0, you can recreate the entire product to help fix bugs in the old version rather than just saying, "We've lost it, you've got to upgrade."
Basically, by putting your entire project under version control, you know that it's all in one place, no matter what version it is you want. Even if the files don't change, you know how to reconstruct a development installation without having to dig around in multiple locations (source in version control, DLLs in one directory on the server, etc.)
Yeah, so it costs some extra disk to store it. Disk is cheap.
Re: (Score:3, Interesting)
Re: (Score:2)
Our solution is that the main trunk is always kept up-to-date on the workstations. I have a script that runs at 4am each morning that does a "svn up" on that part of the repository tree. When I'm ready to use my l
Re:Why binaries? (Score:5, Insightful)
2) you have proprietary build tools limited to developer use, or release engineers unable to build for whatever reason ( similar to #1, I know... )
3) images, of course.
4) Word, Excel, other proprietary document formats are all binary.
5) third-party binary installation packages, patches, dynamic libs, tools, etc.
You're just not trying, or you're thinking of version control as something that only programmers would use, and that they'd only use it to store their text source. There are as many reasons to store binary files in version control as there are reasons to have binary files...
Re: (Score:2)
I was trying, but as a hobby programmer I used it exactly as you described. I use it to store my scripts, source code and documentation.
Exactly... all you need for hobby projects. You don't have to worry about someone else needing your binaries. You don't have lots of images or proprietary-format documents ( some of which can get really large... think of a database as a document... ), and you don't even have to worry about someone being able to build your project w/o this dynamic library or that compiler or this other tool.
We have to worry about all of that stuff, and want every aspect of it to be in version control so we have a record
Re: (Score:2)
I don't understand why the author of the article wants to do what he is, but lots of people have good (or good enough) reasons for wanting to track binary files.
I always hope I don't have to keep binaries in svn, but since so many people seem to love
Re: (Score:3, Interesting)
Re: (Score:2)
Thanks
Re: (Score:2)
Re:Why binaries? (Score:4, Interesting)
And yes, for a 250mb audio file, it is VERY slow.
Re: (Score:2)
I know it can handle binaries, but I cannot think why I would want to. Can anyone help?
I use Subversion for my FPGA sources. When I go to release a version of a design, I include the binary build results (.bit files from the place and route tools, and the .mcs file used to program the config EPROM) in my release tag. This is so checksums and such match exactly, and this is important because the production people put stickers on the EPROMs that display the version, part number and checksum. While I can certainly rebuild from the source, if the toolchain changes then the resulting bit file
Re:Why binaries? (Score:5, Insightful)
Re: (Score:2)
Re:Why binaries? (Score:5, Insightful)
But taking a quick look at the article, I get an idea: storing your binaries at different version levels with it. Say I am developing a software package and use SVN for each level of revisions. With major releases I could store the produced binaries with the package to prevent the need to recompile when I am pulling down a version. Basically it would truly version control your binaries as well.
In some ways the article makes me wish I did that with the project I am currently working on. I might start doing it now.
-R
Re: (Score:2)
Re: (Score:2)
Maybe, but then you may want a tool to automate adding new versions, then you would want something to automate comparing versions, and if you have limited bandwidth it would be useful to send only deltas, especially with big files. To make it short, you end up with a Subversion clone. Because Subve
Re: (Score:2)
Re: (Score:2)
For most simple projects it's easier to just commit the jars within the project. That way one can just checkout and compile. With your way, where the programme
performance not the biggest problem (Score:3, Interesting)
more that you lose changes without any warning whatsoever during merging
(don't get me wrong, i love subversion
Re: (Score:2)
Re:performance not the biggest problem (Score:4, Informative)
I think you missed his point... he'd committed all his changes. The problem is that if you merge a file or directory deletion in, where that file or directory had modifications committed, Subversion won't tell you about the conflict, but will delete the file or directory including the new modifications.
You wanted to delete it, so who cares, right?
Subversion represents renames as a copy & delete. So now, you rename a file or directory, and do the same dance as above, and the renamed file or directory does not have changes that were made on trunk under their previous names. So renaming a file can re-introduce a bug you already fixed.
No big deal, the devs will fix it soon, right? Wrong [tigris.org] and wrong again [tigris.org].
That is the problem.
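To make the failure mode above concrete, here is a toy model in Python. It is only a sketch under loud assumptions: a "tree" is a plain dict of path to content, and `rename_on_branch` and `naive_merge` are made-up helpers, not anything Subversion actually does internally. It just mimics "rename = copy + delete" followed by a merge that never checks whether the deleted path was modified on trunk.

```python
# Toy model only: a "tree" is a dict of path -> content. This is NOT how
# Subversion stores or merges data; it just mimics "rename = copy + delete".

def rename_on_branch(branch, old, new):
    """Record a rename as a copy of the branch's content plus a delete."""
    branch[new] = branch[old]   # copy: carries the branch's (stale) content
    del branch[old]             # delete: the old path disappears
    return [("add", new, branch[new]), ("delete", old, None)]

def naive_merge(trunk, changes):
    """Apply branch changes to trunk with no conflict check on deletes."""
    for action, path, content in changes:
        if action == "add":
            trunk[path] = content
        elif action == "delete":
            trunk.pop(path, None)   # trunk-side edits to `path` vanish here
    return trunk

# Both lines of development start from the same file containing a bug.
branch = {"util.c": "buggy"}
trunk = {"util.c": "buggy"}

trunk["util.c"] = "fixed"                                  # bug fixed on trunk
changes = rename_on_branch(branch, "util.c", "helpers.c")  # renamed on branch

merged = naive_merge(trunk, changes)
print(merged)
```

In this model the merged tree ends up with only the renamed file, carrying the pre-fix content, which is the "renaming a file can re-introduce a bug" situation in miniature.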
Re: (Score:2)
Re: (Score:2)
Reintroducing a bug is a very bad thing. And if you've only worked on projects with 100% test coverage, and automated execution of said tests, you're going to be in for a real rude awakening when you get a job.
Um... sorry, let me set this flamethrower down here, turn it off, and I'll just
Re: (Score:2)
Re: (Score:3, Informative)
What if the merges are done by someone who isn't familiar with all the code changes and the expected associated application behaviors? What if there are dozens or even hundreds of code changes in a branch being merged to trunk? What if your QA work is being done by people who are not developers and who have no involvement in the merge process?
These are not just hypothetical issues. I
svn+ssh and master mode ssh. (Score:5, Informative)
Plus if the master connection is set to compress data ( -C ) , then you get transparent compression.
Now if only I could expand all this to fit 2 pages....Profit!!!
Re: (Score:2)
git (Score:2)
It may have performance problems, but... (Score:5, Interesting)
Re:It may have performance problems, but... (Score:4, Interesting)
I suppose one has to be conservative with deployment of this stuff, you don't want to have code locked away in unmaintained software, or erased by immaturity bugs, but it's still an interesting field.
Re: (Score:2)
It may or may not be the wave of the future, but after looking at version control systems for almost 2 years before switching to SVN last year,
Re: (Score:2, Insightful)
There are still some reasons for choosing SVN over monotone though, the major one for me is partial checkout, which you learn to appreciate once you've been stuck behind dialup or on a cell phone. (On the other hand, SVN doesn't do complete checkouts.)
Peopl
Re: (Score:2)
Also, SVN doesn't -have- a way to rollback.
Re: (Score:2)
Re: (Score:2)
All told, our total repository space is 20GB spread over about 2 dozen repositories (the 3GB one is the largest and where most work occurs).
It's working exceedingly well for us. We
Store them differently (Score:4, Interesting)
What I don't like about this article is that it implies I should have to restructure my development environment to deal with a flaw in my version control. The binary issue is huge with Subversion, but most of the people working on Subversion don't use binary storage as much as game projects do. Subversion should have an option to store the head as a full file, not a delta, and this problem would be solved. True, it would slow down the commit time, but commits happen a lot less often than updates (at least for us). Also, the re-delta-ing of the head-1 revision could happen on the server in the background, keeping commits fast.
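A rough sketch of the "head stored whole, older revisions as deltas" idea, in Python. Everything here is hypothetical: `ReverseDeltaStore`, `make_delta` and `apply_delta` are illustrative names, the delta format is a naive copy/insert list built with the standard `difflib` module, and none of it reflects Subversion's real repository format or delta algorithm.

```python
from difflib import SequenceMatcher

def make_delta(source: bytes, target: bytes):
    """Describe how to build `target` from `source` as copy/data ops."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))         # reuse bytes from source
        elif tag in ("replace", "insert"):
            ops.append(("data", target[j1:j2]))  # literal new bytes
        # "delete" needs no op: those source bytes are simply skipped
    return ops

def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target by replaying copy/data ops against `source`."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

class ReverseDeltaStore:
    """Hypothetical store: full text for HEAD, reverse deltas for the rest."""

    def __init__(self):
        self.head = None   # newest revision, stored whole
        self.deltas = []   # deltas[i] rebuilds revision i from revision i+1

    def commit(self, data: bytes) -> None:
        if self.head is not None:
            # Re-delta the old head against the new one; a real server
            # could defer this step to a background job.
            self.deltas.append(make_delta(data, self.head))
        self.head = data

    def checkout(self, rev: int) -> bytes:
        data = self.head                   # HEAD needs no delta work at all
        for ops in reversed(self.deltas[rev:]):
            data = apply_delta(data, ops)  # walk backwards to older revisions
        return data

store = ReverseDeltaStore()
revisions = [b"hello world", b"hello brave world", b"goodbye brave new world"]
for content in revisions:
    store.commit(content)
```

The trade-off matches the comment: checking out the newest revision is a plain read, while the cost moves to commit time (or to a background job) and to checkouts of old revisions.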
Re: (Score:3, Interesting)
It's like when I added 2,457 files to a VLC play list. It took 55 minutes to complete the operation. I immediately downloaded the VLC code, and went looking through it...
It loops, while(1), through a piece of code that is commented "/* Change this, i
Re: (Score:2)
I wish it were so simple. They moved from a dual 500MHz, 500MB RAM machine, shared amongst tasks, to a 3.2GHz 2GB RAM machine solely doing SVN. That's no small upgrade, and isn't at all telling which of the three main variables (CPU, RAM, shared-or-n
Re: (Score:2)
What's wrong with version control? (Score:5, Interesting)
I mean, is it just me or is revision control software incredibly difficult to use? To put this into context, I've developed software that builds websites with integrated shopping cart, dozens of business features, email integration, domain name integration, over 100,000 sites built with it, (blah blah blah) but I find revision control HARD.
It feels to me like there is a fundamentally easier way to do revision control. But, I haven't found it yet or know if it exists.
I guess for people coming from CVS, Subversion is easier. But with subversion, I just found it disgusting (and hard to manage) how it left all these invisible files all over my system and if I copied a directory, for example, there would be two copies linked to the same place in the repository. Also, some actions that I do directly to the files are very difficult to reconcile with the repository.
Since then, I've switched our development team to Perforce (which I like much better), but we still spend too much time on version control issues. With the number, speed of rollouts and need for easy accessibility to certain types of rollbacks (but not others), we are unusual. In fact, we ended up using a layout that hasn't been documented before but works well for us. That said, I still find version control hard.
Am I alone? Are there better solutions (open source or paid?) that you've found? I'd like to hear.
Re:What's wrong with version control? (Score:4, Insightful)
Fiddling with stuff you are not supposed to fiddle with is generally a no-no when using source control. I found though that I got used to the Subversion way to do things (learned that the hard way). For example Subversion on the client side does not really handle server side rollbacks of the complete repository since the files are cached and hashed locally. One way to make source control more transparent to the user could be to let the filesystem handle it.
Re: (Score:3, Interesting)
1) You want to make a copy of trunk to send to somebody:
tar cvf project.tar .
With svn you have to go through a bunch of magic to do this or you end up giving them an original copy when you may have local changes (you tweaked some config option or whatever), your username, the svn repo address and structure, etc. If yo
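For what it's worth, one way around the "tar also ships the svn junk" complaint is to tar the working copy as it sits on disk while skipping the administrative directories. The Python sketch below assumes the metadata lives in directories named `.svn`; `skip_svn_metadata` and `archive_working_copy` are hypothetical helper names, and it uses only the standard `tarfile` module.

```python
import os
import tarfile

def skip_svn_metadata(member: tarfile.TarInfo):
    """Return None for anything inside a .svn directory so tarfile omits it."""
    parts = member.name.split("/")
    return None if ".svn" in parts else member

def archive_working_copy(working_copy: str, archive_path: str) -> None:
    """Tar the working copy as it is on disk, local edits included."""
    top_level = os.path.basename(os.path.abspath(working_copy))
    with tarfile.open(archive_path, "w") as tar:
        # The filter runs on every member; excluded directories are not
        # descended into, so nothing under .svn reaches the archive.
        tar.add(working_copy, arcname=top_level, filter=skip_svn_metadata)
```

Calling `archive_working_copy("project", "/tmp/project.tar")` would then produce a tarball with your local tweaks but without the repository address or username stored in the metadata. Write the archive somewhere outside the working copy so it does not try to include itself.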
Re: (Score:2)
1) You want to make a copy of trunk to send to somebody:
tar cvf project.tar .
With svn you have to go through a bunch of magic to do this or you end up giving them an original copy when you may have local changes (you tweaked some config option or whatever), your username, the svn repo address and structure, etc. If you do svn export it makes a copy of what is in HEAD not in your folder, so there is no way to do this without going back and weeding out this junk
In any VC/CM system, you should not exp
Re: (Score:2)
Per project? The book covers that:
http://svnbook.red-bean.com/nightly/en/svn.branchmerge.maint.html#svn.branchmerge.maint.layout [red-bean.com]
Re: (Score:2)
No mucking with database files, no mucking with rollback points, so on.
Re: (Score:2)
No atomic commits, no renames, no http, no delta of binary files, no safe handling of binary files...
Re: (Score:2)
Re: (Score:2)
CVS doesn't handle renames, you have to do it manually with add and delete and there is no link from the new file to the old. SVN isn't perfect (rename = copy + delete) but at least supports copy preserving
Re: (Score:3, Interesting)
Re: (Score:3, Insightful)
If it isn't about doing your work, then why do you do it?
Of course it is about doing your job. If you're a programmer, it's analogous to asking your C compiler not to suppress warnings. You would have to find those bugs anyway, and you would do a much worse job without the help.
In my work, version control (or whatever fancy name ending in "management"
Re: (Score:2)
I thought what I meant was clear. I never intended to claim that VC was useless, non-productive, or non-work. In fact, part of my point was that it is work, which many people don't understand. By "not doing your work" I meant simply that your final product is a program, not a tree of revisions. Time spent on VC does not directly result in satisfying customer needs, rather it makes it easier to create reliable software more quickly, and with less r
What do you find hard? (Score:2)
where it becomes complicated is when you start talking about branching, merging, or trying to deal with dependencies across projects, etc.
But if done well, version control helps more than hurts.
Re: (Score:3, Interesting)
Re: (Score:2)
SVN actually works pretty well at storing log files (it's very efficient at it, both storing and sending the changes). Especially for distribution of said log files and secure storage of the log files. Because SVN doesn't support purge, it makes a good WORM-style solution for logs. And svn+ssh, combined with restricting the commands that can be run with a particular SSH key, makes it fairly secure from tampering.
Yeah, it's probably overkill. But it's a leg
Re: (Score:2)
It only puts them in your working copy. Most development practices include the assumption that you wouldn't deploy your working copy simply by copying it directly. There are several models of how to generate something to be deployed. One of the most common ones is to have a script or build tool that operates on the working copy and generates something that can be deployed. That
Yes, use Mercurial or another distributed tool! (Score:2)
My work became a lo
Subversion is Sex (Score:2)
Re: (Score:2)
Version control is part of the software development process.
If you are building a simple program on your own, then the basic thing to do with it is versioning in a straight line.
However, if your program architecture becomes more complex and/or more people are working on it, then version control becomes a synchronisation system.
When you have a more elaborate development process, version control is tied in with change control and tracking.
So, yes, version control is hard.
I am a fulltime VC administrator fo
Re: (Score:2)
That's actually a very strong advantage of SVN. Working copies do not have to map 1:1 with repositories. But it was a big change compared to how Visual SourceSafe wor
More about tuning your processes (Score:3, Informative)
Developers will not do these workarounds (Score:4, Informative)
Doing so means you have to unzip them to use them. Not very handy. Most users want to use Subversion the way they should be able to use version control- a checkout should give you all of the files you need to work with on a given project, with minimal need to move/install pieces after checkout. Implementing the 'best' suggested workaround would mean needing a script or other way to get the binaries unpacked. Programmers are often annoyed enough by the extra step of *using* version control, now you have to zip any binaries you commit to the repository?
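The "script to get the binaries unpacked" would not have to be much. Here is a minimal Python sketch, assuming the binaries were committed as `.zip` archives somewhere under the working copy; `unpack_archives` is a made-up helper name, and it relies only on the standard `zipfile` and `pathlib` modules.

```python
import zipfile
from pathlib import Path

def unpack_archives(working_copy: str) -> list:
    """Extract every .zip under the working copy next to the archive itself."""
    unpacked = []
    # sorted() materialises the list first, so files created by extraction
    # are not picked up while we are still walking the tree.
    for archive in sorted(Path(working_copy).rglob("*.zip")):
        if ".svn" in archive.parts:
            continue                       # never touch Subversion's metadata
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)  # binaries land beside the .zip
        unpacked.append(archive)
    return unpacked
```

Run it once after every checkout or update. It is still an extra step, which is exactly the annoyance described above, but at least it is one command instead of manual unzipping.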
I'm unimpressed by their performance testing methodology... they give shared server and desktop performance numbers, but have no idea what 'else' those machines were doing? Pointless. I'd like more details regarding what they're doing in their testing. Their tests were done with a "directory tree of binary files", but don't say what size or how many files?
My tests on our server show a 28MB binary checkout ( LAN, SPARC server, Pentium M client ) takes ~20 seconds. Export takes ~2sec. That must be a big set of files to cause a 9 minute *export*... several gigs, am I wrong? It'd be nice for them to say. Most of us, even in a worst case, won't have more than a few hundred MB in a single project.
The only *real* solution will be a Subversion configuration option which lets you say "please, use all my disk space, speed is all I care about when it comes to binary files". CollabNet is focused enough on getting big-business support contracts that it shouldn't be long before we see this issue addressed in one manner or another. You -know- they're reading this article!
Re: (Score:2)
Re: (Score:2)
I think this is the case for all VC systems.
I once did a test between Continuus and Subversion for checking out the same tree. The results were the same. Why? Not because of the version control system, but because of the speed of bringing updates over the network to the disk. Creating files and directories is expensive.
In my automated builds (using Continuus), reconfigures (updates) take from 5 to 20 minutes, depending upon the size of the tree and the amount of changes done. With big changes, add 5 to 1
Re: (Score:2)
You *really* need to examine your setup to find out why that is happening.
A 300 MB checkout (maybe a few dozen files) should only take about a minute to prep and then however long it takes to move over the wire. Since we're using svn+ssh, things are extremely efficient (SSH pub-keys restricted to running the svnserve tool in tunnel mode combined with PuTTY and TortoiseSVN). Our largest reposi
Vesta is better (Score:3, Interesting)
The first time I used Vesta, it was a life-changing experience. It's nice to see something that isn't a rehash of the 1960s
Notice.. (Score:3, Interesting)
Questions that remain:
1. Does the algorithm simply "plainly store" previously-compressed files, and is this the reason why that is the most time-efficient?
2. What exactly was the data for the *actual check-in* times? (What took 28m? What took 13m?)
3. Given that speedier/efficient check-in requires a large tarball format, how are artists supposed to incorporate this into their standard workflow? (Sure, there's a script for check-in, but the article is absent any details about actually using or checking-out the files thus stored except to say it's an unresolved problem regarding browsing files so stored.)
The amount of CPU required for binary diff calculation is pretty significant. For an artistic team that generates large volumes of binary data (much of it in the form of mpeg streams, large lossy-compressed jpeg files, and so forth) it would be interesting to find out what kind of gains a binary diff would provide, if any.
Document storage would also be an interesting and fairer test. Isn't
Questions, questions (Score:2)
Re: (Score:2)
Re:SVN will not replace CVS (IMO) (Score:5, Informative)
HINT: When you do it the way CVS provides, you will lose all of your revision history.
SVN does not have this fatal flaw.
Re:SVN will not replace CVS (IMO) (Score:4, Interesting)
SVN does not have obliterate (Score:2)
SVN will never beat CVS on space-efficiency in the long run.
SVN does not have granular history obliterate whereas CVS/Perforce do, so CVS might be bigger initially, but you can always delete very old versions. These old binary versions are the ones you can rebuild from source or you really don't care about anymore.
It exists forever in SVN.
Re: (Score:2)
Which is both a blessing and a curse. In CVS (or even VSS which had a "destroy" option), a rogue developer could kill off large amounts of project history which might go unnoticed for long periods. In SVN, it's simply not possible. In VSS, we had an extremely limited subset of people who were allowed to "destroy".
Our old VSS server's repository was around 20-30GB after close to 5 years of use. Lots and lots of binary files in there, which ate up tremendous amounts of disk s
Re: (Score:2)
However, SVN does have some disadvantages (and some say these are so bad it's not worth using it). SVN only manages whole directories - you cannot operate on a single file, try checking out a single file and you'll find you cannot. SVN also ha
Re: (Score:2)
On the upside, it's being actively developed and some of those issues are being addressed (others are central design themes which will probably not change until v2.0 or v3.0, if ever).
(I came from the VSS world... where the tool was dead and no longer under active development. I much prefer a tool that is still in active development. Even if it has some qu
Correction (Score:2)
s/lose/have to check another file for/
Yes, I'm working on a 100k-line project (in CVS) that's undergone significant directory restructuring, and no, I've never found this to be a problem. If anything was to push me to Subversion, it'd be the fact that the CVS logs are split
Re: (Score:2)
Not sure if you're talking about server-side (the repository) or client-side (the working copy).
On the server-side, you have a choice of either FS or BDB for storing the repository. I prefer FS. There's also the SVN mirror scripts and other backup options. The FS format stores each revision
CVS lacks useful features (Score:2)
Frankly CVS just doesn't cut it for me. It lacks too many features.
1) Atomic checkins/submits
I am trying to submit changes in 5 files as a single bugfix. A submit/checkin should either succeed for all 5 or fail for all 5. CVS doesn't do this. The end result is that I may end up submitting a change in the header without submitting a corresponding change in the implementation file.
2) Changelists
After checking in multiple files
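The all-or-nothing behaviour asked for in point 1 can be sketched in a few lines of Python. This is a toy under stated assumptions: `ToyRepo` is an in-memory stand-in invented for illustration, not the API of CVS, Subversion or any real system. The idea is simply to validate every file first and only then publish the whole set as one revision.

```python
class ToyRepo:
    """In-memory stand-in for a repository; not any real VCS API."""

    def __init__(self):
        self.files = {}     # path -> (revision last changed, content)
        self.revision = 0

    def commit(self, changes, base_revisions):
        """Apply every change or none. `base_revisions` maps each path to
        the revision the committer last saw for it (0 for a new file)."""
        # Phase 1: validate everything before touching the repository.
        for path in changes:
            current = self.files.get(path, (0, None))[0]
            if base_revisions.get(path, 0) != current:
                raise RuntimeError(f"{path} is out of date; nothing committed")
        # Phase 2: all checks passed, so publish the set as one revision.
        self.revision += 1
        for path, content in changes.items():
            self.files[path] = (self.revision, content)
        return self.revision

repo = ToyRepo()
repo.commit({"widget.h": "v1", "widget.c": "v1"}, {})

# Someone else changes the header in the meantime.
repo.commit({"widget.h": "v2"}, {"widget.h": 1})

# Our bugfix touches both files but was based on revision 1 of the header.
try:
    repo.commit({"widget.h": "v3", "widget.c": "v3"},
                {"widget.h": 1, "widget.c": 1})
except RuntimeError as err:
    print(err)
```

Because the stale header is detected before anything is written, the implementation file is left untouched too, so the header and the implementation can never end up half-committed.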
Re: (Score:3, Insightful)
If SVN is so great... why is the majority not using it? It's not like it is entirely new.
Momentum for the most part. CVS is good enough 95% of the time, so it takes some reason to change over. I've recently started using svn after using cvs for years. I'm still not as familiar with svn as I am with CVS.
Personally I don't really like the different branching/tagging behavior in subversion, but I also think I just don't know it as well. Someday I'll have to find some decent documentation on how to use it p
Re:SVN will not replace CVS (IMO) (Score:5, Informative)
For many open source projects, finding good documentation is hard. In the case of Subversion, it couldn't be easier. In fact, the Subversion team has taken documentation to such a level that they should be considered THE model for documentation in the open source community. They have written a book (published in print by O'Reilly, but maintained and posted for free by them on the Internet) that documents their system, and it is very good. My job at the last company I worked for was to write wizards for the Eclipse platform that would automate several of the most common tasks that a Subversion user would try to do, and that book was the only reference I needed. You can find the book on their site here: http://svnbook.red-bean.com/ [red-bean.com] . They even do nightly builds of the book, so not only is their documentation complete and useful, it is also incredibly thorough and up to date.
If anyone on here hasn't read it, DO IT, because the first half will teach you why you want Subversion rather than CVS or some other alternative, and how to use it and how to get the most out of it (second half is lower level stuff you may not care about). It even includes best practices. Once you really learn how to use Subversion, you won't want to use anything else. And this is the way to get started.
Re:SVN will not replace CVS (IMO) (Score:4, Insightful)
I am an SVN newbie, but that kinda sounds like Externals [red-bean.com].
Re: (Score:3, Interesting)
Re: (Score:3, Insightful)
The only reason I'd ever choose native Subversion over a newer system like git or Mercurial is if I needed some tool that had builtin Subversion integration and didn't support anything else. Absent that criterion, IMO if you choose Subversion it's a sign you don't really understand version control too well.
What if you have a bunch of developers working with some ( unfortunately, let me say that ) Windows-only tools for historical reasons ? Are you really saying that I should have a team of VisualStudio users install cygwin on their systems ?
git is great for Linux kernel developers, but 'install this massive compatibility layer to use this product' will fail to make you a lot of friends, especially in a Windows-friendly corporate environment. I say that as an avid, daily CygWin user and longtime Windows h
Re: (Score:2)
I think you're missing the point - people want working version control, e
Re: (Score:2)
Of interest, I have worked for 4 different companies in my programming career. All worked on Windows systems, because - well, until very recently, a Linux desktop was a pain in the butt, and Macs cost too much compared with a stripped down Dell. But every single company used Cygwin as part of the standard developer environment.
Congratulations on never having had to work in a _real_ Windows shop. I'd never touch a Windows machine without immediately installing CygWin myself, but 'real' windows developers buy into the Microsoft Way and use Visual Studio or some other MS tool and look at you funny if you talk about anything else.
Re: (Score:2)
Newer schematic and circuit board layout programs support svn directly (via svn.exe on windows) - Would there be a git-svn.exe to replace the svn.exe with the same command line set?
--jeffk++
Re: (Score:2)
The other people I am working with are game level builders, not code geeks. So, they've never used anything like this.
Re: (Score:2)