Performance Tuning Subversion
BlueVoodoo writes "Subversion is one of the few version control systems that can store binary files using a delta algorithm. In this article, senior developer David Bell explains why Subversion's performance suffers when handling binaries and suggests several ways to work around the problem."
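To illustrate what "store binary files using a delta algorithm" means, here is a toy copy/insert delta in Python. It is a minimal sketch that assumes fixed 4-byte blocks and exact block matches. Subversion's real delta format (svndiff) and its matching algorithm differ, and the names `make_delta` and `apply_delta` are hypothetical. A new revision is stored as instructions to copy ranges from the previous revision, plus literal bytes for whatever is new.

```python
from typing import Union

Copy = tuple[int, int]      # (offset in source, length) to copy from the old revision
Op = Union[Copy, bytes]     # either a copy instruction or literal new bytes


def make_delta(source: bytes, target: bytes, block: int = 4) -> list[Op]:
    """Encode target as copies of whole source blocks plus literal bytes (toy version)."""
    # Index every aligned block of the old revision by its contents.
    index: dict[bytes, int] = {}
    for off in range(0, len(source) - block + 1, block):
        index.setdefault(source[off:off + block], off)

    ops: list[Op] = []
    literal = bytearray()
    i = 0
    while i < len(target):
        chunk = target[i:i + block]
        off = index.get(chunk) if len(chunk) == block else None
        if off is not None:
            if literal:                      # flush pending new bytes first
                ops.append(bytes(literal))
                literal.clear()
            ops.append((off, block))
            i += block
        else:
            literal.append(target[i])        # no match here: keep the byte verbatim
            i += 1
    if literal:
        ops.append(bytes(literal))
    return ops


def apply_delta(source: bytes, ops: list[Op]) -> bytes:
    """Rebuild the new revision from the old one and a delta."""
    out = bytearray()
    for op in ops:
        if isinstance(op, bytes):
            out += op
        else:
            off, length = op
            out += source[off:off + length]
    return bytes(out)
```

In this toy, a target that shares no blocks with its source comes out as a single large literal, so the delta saves nothing.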
Re:Why binaries? (Score:4, Informative)
Re:Why binaries? (Score:5, Informative)
Re:SVN will not replace CVS (IMO) (Score:5, Informative)
HINT: When you do it the way CVS provides, you will lose all of your revision history.
SVN does not have this fatal flaw.
svn+ssh and master mode ssh. (Score:5, Informative)
Plus if the master connection is set to compress data (-C), then you get transparent compression.
Now if only I could expand all this to fit 2 pages... Profit!!!
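A minimal Python sketch of the master-mode trick, assuming OpenSSH on the client. The `-M`, `-S`, `-C`, `-N` and `-f` flags and Subversion's `SVN_SSH` environment variable are real. The socket path, the host name and the helper names are hypothetical choices made for this illustration.

```python
import os

# Hypothetical control-socket path (no spaces, so SVN_SSH splits cleanly).
# OpenSSH expands %r (remote user), %h (host) and %p (port) in it.
CONTROL_PATH = "/tmp/ssh-svn-%r@%h:%p"


def master_command(host: str, control_path: str = CONTROL_PATH) -> list[str]:
    """Arguments for a backgrounded, compressed OpenSSH master connection."""
    return [
        "ssh",
        "-M",                # master mode: later sessions multiplex over this one
        "-S", control_path,  # where the control socket lives
        "-C",                # compress traffic on the master connection
        "-N",                # do not run a remote command
        "-f",                # fork into the background after authenticating
        host,
    ]


def svn_environment(control_path: str = CONTROL_PATH) -> dict[str, str]:
    """Copy of the current environment with SVN_SSH pointed at the shared socket."""
    env = dict(os.environ)
    # For svn+ssh:// URLs, svn runs the command in SVN_SSH instead of plain "ssh".
    env["SVN_SSH"] = f"ssh -S {control_path}"
    return env
```

You would run `subprocess.run(master_command("svn.example.com"), check=True)` once, then run `svn` commands with `env=svn_environment()`. Each svn+ssh connection then rides the already-authenticated, compressed master instead of doing a fresh SSH handshake.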
Re:SVN will not replace CVS (IMO) (Score:1, Informative)
HINT: When you do it the way CVS provides, you will lose all of your revision history.
SVN does not have this fatal flaw.
What the hell are you talking about? You just log into the CVS server and move the directory/file in the repository.
Having to write config files by hand to route around nonexistent symbolic link support, on a platform that does support symbolic links, is what I call a "fatal flaw".
If SVN is so great... why isn't the majority using it? It's not like it's entirely new.
I can tell you why. Developers are still angry at that wet-script-kiddie-dream-called-autoconf and its self-important complaints about M4-here and can't-find-AC_blablabla-there. They don't want to run into the next self-important barrier on their way to actually getting their project done. CVS just WORKS! It has for many years now. And if you have problems moving files/directories because your project is hosted on SF, that's the consequence of your choice and not CVS's fault.
But maybe these days it's more about configuring the project's development environment than about getting work done.
Re:performance not the biggest problem (Score:4, Informative)
I think you missed his point... he'd committed all his changes. The problem is this: if you merge in the deletion of a file or directory that had modifications committed to it, Subversion won't tell you about the conflict. It will simply delete the file or directory, new modifications included.
You wanted to delete it, so who cares, right?
Subversion represents a rename as a copy plus a delete. So if you rename a file or directory and do the same dance as above, the renamed file or directory won't have the changes that were made on trunk under its previous name. In other words, renaming a file can reintroduce a bug you already fixed.
No big deal, the devs will fix it soon, right? Wrong [tigris.org] and wrong again [tigris.org].
That is the problem.
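The two failure modes described above can be reproduced with a toy model in Python. This is a hypothetical sketch, not Subversion's merge code. It treats a tree as a dict from path to contents and applies a branch's adds and deletes with no tree-conflict detection, which is the behavior the comment complains about. `naive_tree_merge` and `silently_lost` are illustrative names.

```python
Tree = dict[str, str]  # path -> file contents


def naive_tree_merge(target: Tree, base: Tree, branch: Tree) -> Tree:
    """Apply a branch's adds and deletes to target, with no tree-conflict checks.

    A rename on the branch shows up only as "new path added" plus
    "old path deleted". Nothing links the new path to edits made to the
    old path on target. (Files edited on both sides would need a real
    three-way text merge; they are out of scope for this toy.)
    """
    added = {p: c for p, c in branch.items() if p not in base}
    deleted = [p for p in base if p not in branch]
    merged = dict(target)
    for path in deleted:
        merged.pop(path, None)  # removed even if target modified it
    merged.update(added)
    return merged


def silently_lost(target: Tree, base: Tree, branch: Tree) -> list[str]:
    """Paths the branch deletes even though target modified them since base."""
    return sorted(
        p for p in base
        if p not in branch and p in target and target[p] != base[p]
    )
```

With `base = {"util.js": "v1"}`, a trunk fix `{"util.js": "v1 + fix"}` and a branch that renamed the file to `{"helpers.js": "v1"}`, the merge yields `{"helpers.js": "v1"}`: the fix is gone. `silently_lost` reports `["util.js"]`, which is the warning a tree-conflict check would need to raise.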
More about tuning your processes (Score:3, Informative)
Re:Why binaries? (Score:1, Informative)
Version control is for when you can actually see a difference between versions.
If you have JARs checked into CVS/SVN, you should move to something like Maven so you can store your internal JARs on a web server.
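To make "store your internal jars on a web server" concrete, here is a small Python sketch of where an artifact lives in the standard Maven 2 repository layout (groupId dots become directories, then artifactId, version, and `artifactId-version.extension`). The coordinates and the repository base URL are hypothetical examples.

```python
def maven_artifact_path(group_id: str, artifact_id: str, version: str,
                        extension: str = "jar") -> str:
    """Relative path of an artifact in the standard Maven 2 repository layout."""
    # Dots in the groupId become directory separators.
    parts = group_id.split(".")
    file_name = f"{artifact_id}-{version}.{extension}"
    return "/".join([*parts, artifact_id, version, file_name])


def artifact_url(base_url: str, group_id: str, artifact_id: str, version: str,
                 extension: str = "jar") -> str:
    """Download URL, given the base URL of a hypothetical internal repository."""
    return base_url.rstrip("/") + "/" + maven_artifact_path(
        group_id, artifact_id, version, extension)
```

For example, `maven_artifact_path("com.example.tools", "report-engine", "1.4.2")` gives `com/example/tools/report-engine/1.4.2/report-engine-1.4.2.jar`. Any plain web server that serves that tree can act as the repository, so the JARs never need to go into CVS/SVN.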
Re:SVN will not replace CVS (IMO) (Score:5, Informative)
For many open source projects, finding good documentation is hard. In the case of Subversion, it couldn't be easier. In fact, the Subversion team has taken documentation to such a level that they should be considered THE model for documentation in the open source community. They have written a book (published in print by O'Reilly, but maintained and posted online for free by the team) that documents their system, and it is very good. At my last company, my job was to write wizards for the Eclipse platform that automated several of the most common tasks a Subversion user performs, and that book was the only reference I needed. You can find the book on their site: http://svnbook.red-bean.com/ . They even do nightly builds of the book, so their documentation is not only complete and useful but also up to date.
If anyone here hasn't read it, DO IT. The first half will teach you why you want Subversion rather than CVS or some other alternative, how to use it, and how to get the most out of it (the second half is lower-level material you may not care about). It even includes best practices. Once you really learn how to use Subversion, you won't want to use anything else, and this book is the way to get started.
Re:performance not the biggest problem (Score:3, Informative)
What if the merges are done by someone who isn't familiar with all the code changes and the expected associated application behaviors? What if there are dozens or even hundreds of code changes in a branch being merged to trunk? What if your QA work is being done by people who are not developers and who have no involvement in the merge process?
These are not just hypothetical issues. I work on a team that espouses agile methodology, and many times we've missed bug fixes in merges because of the way Subversion treats moves (copy + delete instead of truly changing the parent directory of a given file), or because Subversion's merge facility got confused (especially when changes were made to both the branch and trunk versions of a file).
Recently, I was put in charge of merging a branch into trunk for my team's project, and I discovered that some methods were duplicated. One of our programmers had deleted the original version of a method, then pasted a completely different implementation into a different location in the same source file. This was easy enough to catch in Java classes, since a class with two methods of the same signature won't compile. JavaScript, which quietly lets the later definition replace the earlier one, was a slightly different story...
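Because JavaScript won't flag the duplicate, a crude check after a merge can help. Below is a hypothetical Python sketch that uses regular expressions, not a real JavaScript parser. It counts `function name(` declarations and `name = function` / `name: function` assignments in one file and reports names defined more than once. It ignores comments, strings, scoping and ES6 class-method syntax, so treat its hits as hints.

```python
import re
from collections import Counter

# Heuristic patterns only: they do not understand comments, strings or scope.
# "function save(" style declarations:
FUNC_DECL = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")
# "save = function" / "save: function" style assignments:
FUNC_ASSIGN = re.compile(r"([A-Za-z_$][\w$]*)\s*[:=]\s*function\b")


def duplicate_functions(source: str) -> list[str]:
    """Names that appear to be defined as functions more than once in one file."""
    names = FUNC_DECL.findall(source) + FUNC_ASSIGN.findall(source)
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)
```

Running it over each merged `.js` file would have surfaced the pasted-in second implementation, which the browser would otherwise have used silently in place of the first.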
Developers will not do these workarounds (Score:4, Informative)
Zipping binaries before committing them means you have to unzip them to use them. Not very handy. Most users want to use Subversion the way they should be able to use any version control system: a checkout should give you all of the files you need to work on a given project, with minimal need to move or install pieces afterward. Implementing the 'best' suggested workaround would require a script or some other way to unpack the binaries. Programmers are often annoyed enough by the extra step of *using* version control; now they have to zip any binaries they commit to the repository?
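For a sense of the extra step being objected to, here is a minimal Python sketch of the post-checkout unpacking script. It assumes a hypothetical convention in which every `*.zip` in the working copy is extracted into its own directory (`lib/tools.zip` unpacks into `lib/`). It skips anything under Subversion's `.svn` administrative directories. `unpack_binaries` is an illustrative name.

```python
import zipfile
from pathlib import Path


def unpack_binaries(working_copy: Path) -> list[Path]:
    """Extract every *.zip in a working copy next to the archive; return extracted paths.

    Hypothetical convention: lib/tools.zip unpacks into lib/.
    Archives inside .svn administrative directories are ignored.
    """
    extracted: list[Path] = []
    for archive in sorted(working_copy.rglob("*.zip")):
        if ".svn" in archive.relative_to(working_copy).parts:
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
            extracted.extend(archive.parent / name for name in zf.namelist())
    return extracted
```

Every developer has to remember to run this after each checkout or update, and to re-zip before committing. That is exactly the friction the comment predicts people will skip.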
I'm unimpressed by their performance testing methodology. They give shared-server and desktop performance numbers but apparently have no idea what else those machines were doing. Pointless. I'd like more details about what they're doing in their testing. Their tests were done with a "directory tree of binary files", but they don't say how many files there were or how large.
My tests on our server show that checking out 28 MB of binaries (LAN, SPARC server, Pentium M client) takes ~20 seconds, and an export takes ~2 seconds. That must be a big set of files to cause a 9-minute *export*... several gigabytes, am I wrong? It'd be nice for them to say. Most of us, even in a worst case, won't have more than a few hundred MB in a single project.
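The "several gigs" guess can be checked with back-of-envelope arithmetic. This Python snippet uses only the figures quoted above. It assumes export time scales linearly with data size and that the article's hardware is comparable, both of which are assumptions rather than facts from either test.

```python
# Figures quoted above: a 28 MB tree, ~20 s checkout, ~2 s export on a LAN.
TREE_MB = 28
CHECKOUT_S = 20
EXPORT_S = 2
ARTICLE_EXPORT_S = 9 * 60   # the article's 9-minute export, in seconds

checkout_rate = TREE_MB / CHECKOUT_S   # MB per second for checkout
export_rate = TREE_MB / EXPORT_S       # MB per second for export

# Assumption: export time grows linearly with data size on similar hardware.
implied_tree_mb = export_rate * ARTICLE_EXPORT_S

print(f"checkout: {checkout_rate:.1f} MB/s, export: {export_rate:.1f} MB/s")
print(f"a 9-minute export at that rate moves about {implied_tree_mb:.0f} MB")
```

At 14 MB/s, nine minutes works out to 7,560 MB, which is indeed several gigabytes. The same numbers also show checkout running ten times slower than export on this setup.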
The only *real* solution will be a Subversion configuration option that lets you say "please, use all my disk space, speed is all I care about when it comes to binary files". CollabNet is focused enough on getting big-business support contracts that it shouldn't be long before we see this issue addressed one way or another. You -know- they're reading this article!