Programming IT Technology

Tom Lord's Decentralized Revision Control System 300

Bruce Perens writes: "He'll have to change its name, but Tom Lord's arch revision control system is revolutionary. Where CVS is a cathedral, 'arch' is a bazaar, with the ability for branches to live on separate servers from the main trunk of the project's development. Thus, you can create a branch without the authority, or even the cooperation, of the managers of the main tree. A global name-space makes all revision archives worldwide appear as if they are the same repository. Using this system, most of what we do using 'patch' today would go away -- we'd just choose, or merge, branches. Much of the synchronization problem we have with patches is handled by tools that eliminate and/or manage conflicts -- they solve some of the thorny graph topology issues around patch management. Arch also poses its own answer to the 'Linus Doesn't Scale' problem. This is well worth checking out." If you're asking "What about subversion?", well, so is Tom.
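The global name-space idea can be sketched concretely: if an archive's name embeds its owner's identity (say, an email address), archives anywhere in the world can coexist without collision, and a branch name is globally unique. A minimal Python sketch; the name grammar below is illustrative, not arch's exact syntax:

```python
def parse_branch_name(name):
    """Split a hypothetical globally-unique branch name of the form
    'owner@host--archive/category--branch--version' into its parts.

    Because the archive component carries the owner's identity, two
    unrelated people can publish branches without any central registry.
    """
    archive, _, rest = name.partition("/")
    category, branch, version = rest.split("--")
    return {"archive": archive, "category": category,
            "branch": branch, "version": version}

parts = parse_branch_name("lord@example.net--2002/arch--main--1.0")
```

The archive name and version syntax here are assumptions for illustration; the point is only that identity-qualified names make a worldwide "single repository" view possible without coordination.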
This discussion has been archived. No new comments can be posted.

  • why FTP? (Score:5, Insightful)

    by devphil ( 51341 ) on Tuesday February 05, 2002 @06:22PM (#2958465) Homepage

    I guess I'm wondering why arch uses FTP as its network protocol. The FAQ says that it should be workable behind firewalls since the data is all transferred in passive mode, but this still seems like a huge step backwards.

    So, what am I missing? I only got to read a little bit of the site before it got DDOS'd by slashdot.
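For context, the firewall argument for passive mode is that the client initiates both the control and the data connection, so all traffic is outbound. A small self-contained sketch of parsing the server's PASV reply, which tells the client where to open that second connection:

```python
import re

def parse_pasv(reply):
    """Extract (host, port) from an FTP '227 Entering Passive Mode' reply.

    In passive mode the client connects out to this address for data
    transfer, rather than the server connecting back in, which is why
    passive FTP usually works from behind a firewall.
    """
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    h1, h2, h3, h4, p1, p2 = map(int, m.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

host, port = parse_pasv("227 Entering Passive Mode (192,168,0,1,19,136)")
```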

  • And others (Score:4, Insightful)

    by Ed Avis ( 5917 ) <ed@membled.com> on Tuesday February 05, 2002 @06:23PM (#2958470) Homepage
    Not only 'what about Subversion' but also 'what about CVS, what about Aegis'. If you include non-free systems then what about Perforce or Bitkeeper.

    This is getting worse than journalling filesystems :-(.
  • by Sludge ( 1234 ) <[gro.dessot] [ta] [todhsals]> on Tuesday February 05, 2002 @06:26PM (#2958486) Homepage
    That sounds like hype. In the real world, selecting the aspects of software we want to compile from on remote sites would have serious implications. The first being security. The second being quality. Linus may not scale, but he has good judgement. That's the fundamental problem.
  • Re:why FTP? (Score:4, Insightful)

    by Anonymous Coward on Tuesday February 05, 2002 @06:30PM (#2958518)
    I guess I'm wondering why arch uses FTP as its network protocol.
    It's because this "Decentralized Revision Control System" is just a guise for a p2p filesharing. It's really cool: you check in all your files and they automatically get replicated, having become part of the "master tree". No one can shut down the master tree. No one can tell you not to put your files there. (Hey, it's part of my project!)
    Slick.
  • Re:why FTP? (Score:2, Insightful)

    by FlowerPotAdmin ( 541227 ) <jmk63&cornell,edu> on Tuesday February 05, 2002 @06:42PM (#2958576) Homepage
    I guess I'm wondering why arch uses FTP as its network protocol.

    Well, the article mentioned that arch consisted of a bunch of shell scripts and some C code, so it looks like ftp was just an "off the shelf" component that the author could make good use of.
  • Re:And others (Score:2, Insightful)

    by FlowerPotAdmin ( 541227 ) <jmk63&cornell,edu> on Tuesday February 05, 2002 @06:47PM (#2958607) Homepage
    If you include non-free systems...

    Unfortunately, for some people/projects that's not an option.

    This is getting worse than journalling filesystems :-(

    I can see how you would feel this way, but keep in mind, a healthy number of different implementation ideas and design philosophies can only hasten the development of open source tools.
  • by mikemulvaney ( 24879 ) on Tuesday February 05, 2002 @06:51PM (#2958629)
    It sounds like it has a lot of nice features, but then you realize the whole thing is written in sh? One of the nice things about CVS is that the client-server nature allows someone to use pretty much any operating system as a client. Subversion takes this to the next step, by making all connections use the client-server model.

    Forcing everyone to use sh is a major hassle. I know that it would work with any "reasonably POSIX" OS, but then developers can't get arch accessibility built into their favorite tools, like NetBeans or whatever.

    Creating local branches is pretty cool, though.

    Mike
  • Re:er... (Score:2, Insightful)

    by FlowerPotAdmin ( 541227 ) <jmk63&cornell,edu> on Tuesday February 05, 2002 @06:54PM (#2958644) Homepage
    So, if I develop something useful, it would be called edu.cornell.resnet.jmk63.widget? A little unwieldy, methinks. I suppose it works as a unique identifier, but what if I graduate and I no longer have control of that location? Back to the arch example, if merging code trees is done without a full copy, anyone who is patching against my code tree (or even anyone patching against them) is out of luck.
  • by JabberWokky ( 19442 ) <slashdot.com@timewarp.org> on Tuesday February 05, 2002 @06:56PM (#2958664) Homepage Journal
    And believe me, if The Beast walks by jumping just two steps forward, two feet together, then just one step back, and keeps repeating it, it will get FARTHER toward the mountain than a person doing stunts on a bicycle, with no real idea of where he or she is going, but looking damn fine doing it.

    Right - and along with that guy doing stunts are three hundred thousand others trying to get to the mountain... some on bikes doing tricks, some with their heads down and pedaling in a direction (it might even be a "wrong" direction, and they find a new mountain), and they might be driving a Humvee, Porsche or F-15.

    If you look at the recent (and ongoing) Linux VM fight, it looks exactly the same as any in-house coder fight - the exact sort of thing Microsoft encourages (there are often two parallel projects working towards the same goal). The only difference is that the OS designs don't have to be killed by budgetary constraints... they go on until there is a clear winner. And there are branches of code (like GUI design) where there *is* no clear "better" solution. That's where you get many parallel projects, all just getting better and better or heading for different niches (Blackbox vs. KDE for example).

    --
    Evan

  • Re:why FTP? (Score:5, Insightful)

    by curunir ( 98273 ) on Tuesday February 05, 2002 @06:59PM (#2958692) Homepage Journal
    hmm...

    wouldn't rsync over ssh have been a much better choice for an "off the shelf" component? Most ftp servers tend to have a few (read: waaaaay tooooo maaaany) security concerns for my taste.
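A hedged sketch of what the suggested transport might look like in practice (the archive paths below are invented for illustration):

```python
import subprocess

def mirror_archive(remote, local, dry_run=True):
    """Mirror a revision archive with rsync over ssh.

    rsync transfers only changed data, and ssh handles authentication
    and encryption, so no FTP daemon needs to be exposed at all.
    With dry_run=True the command is returned instead of executed.
    """
    cmd = ["rsync", "-az", "-e", "ssh", remote, local]
    if dry_run:
        return cmd
    subprocess.run(cmd, check=True)
    return cmd

cmd = mirror_archive("user@example.org:archives/arch/", "./mirror/")
```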
  • I smell trouble (Score:4, Insightful)

    by heretic108 ( 454817 ) on Tuesday February 05, 2002 @07:04PM (#2958732)
    From the article, it looks good.
    But let me say that I've sometimes been in the position of having to merge branches. In my first hacking job, I had to take code that had been written by 2 crazy Polish programmers, and merge 37 non-working branches into one branch that worked. It was *not* fun, and I enjoyed a well-deserved beer when it was done.
    IMO, a distributed system of archive management that doesn't make ongoing reference to a central tree is a sure recipe for chaos, and poses the risk of making software harder to install/use for the non-skilled, and creating a lot of work in merging disparate branches for the skilled.

    You want package xxyzz? OK - go to Jim's store in San Diego. It's easy to set up. Oh, I forgot to tell you, you've gotta get some bits from Lucy's store in Manchester, and Frieda's fixed a few bugs too - get her fixes from Bonn. And don't forget Peter's enhancements - his store is at the Adelaide University site. What? it doesn't compile? What kind of idiot are you? Just hack it till it does compile, then put it together in your own tree!
  • by e40 ( 448424 ) on Tuesday February 05, 2002 @07:06PM (#2958747) Journal
    It is an important feature of subversion that it will be CVS compatible. I manage a 10+ year old/1+GB CVS repository. CVS has a lot of faults, but I can't throw that version history away. It's too valuable. subversion gives me hope that I'll get something more usable than CVS (we'll see, won't we!) without much pain.

    I'm really hoping the subversion developers succeed.

    Having said that, I'm all for arch succeeding too. Perhaps it will be better for new projects. Who knows.
  • Re:why FTP? (Score:3, Insightful)

    by GigsVT ( 208848 ) on Tuesday February 05, 2002 @07:13PM (#2958801) Journal
    or even scp for that matter...
  • by markj02 ( 544487 ) on Tuesday February 05, 2002 @07:20PM (#2958854)
    The feature list sounds nice, and using the file system in the way it does is also pretty nice. But I just can't deal with 40kloc of shell script for a version control system. How am I supposed to run that sort of system on a non-UNIX system? What kinds of oddball dependencies is it going to have on the shell, path, and environment?

    This seems like it's worse than CVS. Functionally, I'm quite happy with CVS. The main complaint I have about it is that it isn't self-contained but invokes rcs and other shell commands in mysterious ways. "arch" seems to make things worse, not better in that regard. What I would like to see is something mostly like CVS, but something that is implemented as a clean, self-contained library with a single command line executable (with subcommands) and a built-in HTTP-based server. Until that comes along, I think I'll just stick with CVS.
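The "single command line executable (with subcommands)" shape described above is easy to prototype; here is a minimal sketch, with command names invented for illustration rather than taken from any real tool:

```python
import argparse

def build_cli():
    """A single 'vcs' executable dispatching to subcommands, in the
    style the comment above asks for (subcommand names are hypothetical).
    """
    parser = argparse.ArgumentParser(prog="vcs")
    sub = parser.add_subparsers(dest="command", required=True)

    checkout = sub.add_parser("checkout", help="fetch a working copy")
    checkout.add_argument("branch")

    commit = sub.add_parser("commit", help="record a change")
    commit.add_argument("-m", "--message", required=True)

    return parser

args = build_cli().parse_args(["commit", "-m", "fix typo"])
```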

  • by devphil ( 51341 ) on Tuesday February 05, 2002 @07:24PM (#2958875) Homepage

    Well, flowerpot, now I'm wondering whether arch uses the ftp programs, or just the ftp protocol. That is, do you need an ftp client or server installed for arch to work? From what I've seen it wouldn't be too hard to do the protocol yourself.

    I still can't get to the site, so oh well.

  • by LarryRiedel ( 141315 ) on Tuesday February 05, 2002 @07:43PM (#2958999)

    I think a major test of this or any other successor to CVS should be how amenable is the design to alternative implementations which integrate seamlessly with the reference implementation.

    I think the fact that the "arch" solution is designed to be so simple and clean that it can be implemented with a few shell scripts bodes well for it.

    I would expect it to be pretty easy to integrate the "arch" solution into lots of other tools by writing a little code which manipulates the files the same way the "arch" shell scripts do.

  • Dialup? (Score:2, Insightful)

    by gouldtj ( 21635 ) on Tuesday February 05, 2002 @07:50PM (#2959030) Homepage Journal
    Maybe I don't quite get it...

    Let's say that I don't have write access to the Linux kernel tree. So I go grab a copy and make a branch on my machine and fix it. So then I post to the kernel mailing list saying that I've fixed this bug. Linus gets all excited and wants to merge my branch in, but he can't because I am offline. So he forgets, and nothing happens.

    Now you could say that I could upload it to the central server, but I don't have write access to that. I wouldn't imagine that they would give me (a non-kernel developer, trust me, I'd break something) access to the tree.

    I guess I just don't get how useful this will be.

  • by wls ( 95790 ) on Tuesday February 05, 2002 @07:52PM (#2959042) Homepage
    I've done SCM for a number of years, professionally evaluated version control products, and helped edit an Anti-Pattern book on the subject. It seems, at least to me, that the majority of version control systems out there have the basics covered when it comes to check-in, check-out, branching, and labeling. The standard features, if you will.

    However, most of the reasons that I've seen companies change version control systems is because of completely different reasons. Here are a few that come to mind:

    - A version control system must be fast. I worked at one company where we tried to use Visual SourceSafe over a WAN; it took HOURS to share code. A good VCS should transmit the minimal amount of data.

    - A version control system must provide security. All too often management uses the SCM repository as kind of a shared directory (BAD, BAD, BAD) -- and people who have no need to see or modify the code, do... implicitly.

    - A version control system should provide extensive auditing and notification capabilities that can be discretely turned on and off. Allow logging the positive and the negative, and let people know when particular operations happen to a set of files. In one case we attempted to get PVCS to run scripts on each change to send mail to the PM. Checking in a directory flooded inboxes, since it couldn't audit a collection of files as a single event.

    - There MUST be a recovery mechanism. Ever try to recover a lost SourceSafe password? Yikes. (Gaining re-entry is possible: back everything up, change your password, and do a diff. Copy the pattern into the admin record with a hex editor. Log in as admin with the new password, then change the admin password. ...this worked at least twice for me.)

    - Again, there MUST be a recovery mechanism. I love RCS, SCCS, and PVCS for their file-related mechanisms. Why? I've had SCM systems go down hard when the database got munged. Yes, you can recover from a backup, but a lot of work gets lost. With an open file format, you can at least hand fix localized problems.

    - That said, good version control systems should allow you to check in collections of files as atomic units, move files and directories, and operate on projects as a whole. Anytime I have to twiddle with a repository, thereby breaking past history, something is seriously wrong with the VCS model.

    - Good systems must have an IMPORT / EXPORT capability that PRESERVES HISTORY. The less I feel locked into a solution, the more likely I'll be to try it out. Porting between systems is usually painful.

    - SCM systems must conform to how the CM manager wants to run things, not the other way around. Let's face it, users can and will make mistakes, and that's okay. Mistakes should be fixable. I'll never use StarTeam because it was too easy for users to accidentally check in branches that couldn't be removed. Tech support argued that version control should reflect the history of the product, where I maintain (and still do) that it should reflect the intended history. If I want to include user errors, that should be my policy, not the tool's. My users should be able to reflect upon the project history and know why things changed. Period. You don't use a hack to undo a mistake.

    - Branching notation should be clear and to the point. CVS has its magic numbers; StarTeam has god-awful views. Let me choose the numbering scheme; don't play games with odd/even numbering. Version numbers should not be overloaded by the product to carry additional meta-information.

    - A good SCM tool should remember tag history. Suppose I accidentally move or delete a tag and now want to put it back. Suppose I want to see where it's been. This case is rare, but anyone who's had a user twiddle with the wrong tags feels this pain sharply and deeply.

    - More ADMINISTRATIVE control. My big beef with CVS is when I have to twiddle with the repository structures and permissions directly to accomplish what I want done. No. No. No. There should be a tool (one that audits changes) for standard operations.

    - An admin should have the ability to define, enforce, and audit user permissions that should be applied cross dimensionally against repository, commands, and elements within the repository.

    - Data should be stored in a manner that can be parsed by custom tools. It allows me to write extensions and automation.

    - Nothing should be possible in a GUI that is not possible from the command line. The inverse holds true as well. Everything should be automation friendly. Early versions of PVCS pissed me off for this reason. As a SCM manager, I've used both, and I'll take a command line over a GUI any day. My novice users want a GUI, my advanced ones usually revert back to command lines (and integrate it with their editors).

    - There must be readable 2- and 3-way diffs.

    - A good SCM tool will be able to produce reports, or at least make it possible to export information that can produce reports.

    - A good SCM tool should know how to handle binary files efficiently, rather than just storing the whole copy.

    - A good SCM system should not put a limitation on comments.

    - A good version control system should not try to "do it all" (CCC/Harvest) and do none of it well. When GUIs pop up off-screen, or you have to artificially create packages for simple files, something's wrong. Which leads into...

    SCM systems should operate the way the users of that system do.

    There is a BIG difference between how commercial houses run things and how OpenSource projects do.

    Commercial groups usually have a smaller set of developers, who are known in advance, and commonly use the locking model. OpenSource projects tend to use concurrency a lot more, and operate by applying diffs. (Yes, I know, exceptions are out there.)

    Thus, some tools that feel natural in some environments get quickly rejected in others. I've yet to see someone produce a readable, high-level guide to version control that brings all the terminology together. (Incidentally, I'm about to release one; email me for a draft.)

    The overall problem tends to be that people look on the side of the box for features, rather than asking if the features are even applicable to what they're doing.

    Worse yet, proper SCM often gets sidestepped in commercial world. Ask: Do you want branching? You get, is it a feature?...yes! Now ask: Do you know when it's appropriate to branch, how to do the branch efficiently, how to graft branches back to the root, or how to physically do it... and you find out this is where a lot of bad CM happens. It isn't fun to inherit a screwed up repository.

    The most common downfall of SCM, as I've seen it in the commercial world, is a failure of those running it (quite often over-tasked infrastructure people) to understand the product being built with the tool, failure by team leads to communicate repository structure, failure by management when they use the SCM tool as a substitute for communication, and failure by developers who don't know how to use the tool or when to use the appropriate features.
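The "transmit the minimal amount of data" point in the list above boils down to shipping deltas rather than whole files. A toy illustration using Python's standard difflib (the file contents are made up for the example):

```python
import difflib

old = ["int main() {\n", "    return 0;\n", "}\n"]
new = ["int main() {\n", "    puts(\"hi\");\n", "    return 0;\n", "}\n"]

# A unified diff is roughly what a delta-based VCS would send over the
# wire: only the changed hunk plus a little context, not the whole file.
delta = list(difflib.unified_diff(old, new,
                                  fromfile="a/main.c", tofile="b/main.c"))
```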
  • Uggghhh.... [OT] (Score:4, Insightful)

    by ryanvm ( 247662 ) on Tuesday February 05, 2002 @08:40PM (#2959294)
    I am getting soooo tired of this notion:
    Arch also poses its own answer to the 'Linus Doesn't Scale' problem.

    Look people, the "Linus doesn't scale" issue is NOT something that can be solved by replacing the use of 'patch'. Putting the Linux kernel on CVS (or Arch or whatever) would just allow people to commit stupid changes.

    The reason Linus doesn't scale is not because he doesn't have enough time to run 'patch'. It's because changes to the kernel MUST be approved.
  • by cduffy ( 652 ) <charles+slashdot@dyfis.net> on Tuesday February 05, 2002 @09:45PM (#2959545)
    If so, you've noticed that when you choose to merge data from branch (A) into branch (B) [no, it *doesn't* happen automatically unless you want it to!], then you have *control* over what parts of A go into B. You may have noticed that you can ask for the differences between A and B, and go through them by hand, and accept only specific parts -- just as someone doing patching does.

    No revision control system tries to replace good maintainership -- rather, their job is to make it easier.
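The selective-merge workflow described above (review the differences, accept only specific parts) amounts to treating a diff hunk by hunk. A rough sketch, not any particular tool's implementation:

```python
def split_hunks(diff_lines):
    """Split a unified diff (a list of lines) into its header and a list
    of hunks, so a maintainer can accept some hunks and reject others.
    """
    header, hunks, current = [], [], None
    for line in diff_lines:
        if line.startswith("@@"):
            current = [line]          # a new hunk starts here
            hunks.append(current)
        elif current is None:
            header.append(line)       # lines before the first hunk
        else:
            current.append(line)
    return header, hunks
```

Rebuilding a patch from the header plus only the accepted hunks then gives exactly the partial merge the parent comment describes.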
  • by srussell ( 39342 ) on Tuesday February 05, 2002 @11:39PM (#2959920) Homepage Journal
    I'm not addressing Subversion vs. Arch, but rather Tom's evaluation of Subversion, which isn't entirely accurate.

    I'd also like to say, up front, to the Anonymous poster who asked:

    Anyone know a good system for incorporating source control with a database? Oracle and Postgres would do.

    Subversion does. The backend it currently uses is Berkeley DB, but the backend is pluggable. After version 1.0 comes out, expect to see a backend for one of the SQL databases pop up.

    Now, on to Tom's comparison to Subversion. Caveat: I am not a Subversion guru. I lurk in the developer mailing list, and I use Subversion myself. Therefore, I may make mistakes about details, but I'm fairly certain I won't provide completely bogus information. I got some reviews on this post from the Subversion dev list, including some comments from Tom, but any mistakes in here are my own, and they're copyrighted mistakes, dammit.

    I'm not going to quote whole sections; just enough for context.
    1. Smart Servers vs. Smart Clients. Subversion clients are also smart, although perhaps not as smart as Arch's. Diffs travel in both directions, so a minimum of network traffic is used. Many Subversion operations (status, diffs against the last revision, etc.) are purely client-side operations.
    2. Trees in a Database vs. Trees in a File System. This is misleading. You *can* get stuff out of the Subversion database with the standard BDB tools, so Subversion isn't required. Also, because Subversion is based on WebDAV, access to the database through a web server is a freebie; also, Subversion is very Windows-friendly, from many points of view, which should help its adoption in a corporate setting. Subversion only stores the differences between two versions of a file or directory, which is space-efficient. The advantage of being able to access a filesystem-based repository of diffs is arguable.
    3. Centralized Control vs. Open Source Best Practices In practical application, there is no advantage to the ARCH system over Subversion. Subversion allows per-file/directory sourcing, so you could create a project that includes sources from any number of different repositories. (This code is not currently working in Subversion.)
    These are simple mistakes. There is also one statement that is wrong: arch is better able to recover from server disasters The argument was that, because arch is a dumb FS, it is easily mirrored. The implication is that databases aren't easily mirrored. BDB is just as easily mirrored, and most other databases are easily replicated.

    Other comments pointed out were:

    • Subversion does not require Apache. It works over a local filesystem just fine. If you want network access, you need Apache.
    • Subversion has all of the strengths of Apache. You therefore get Apache access control (well defined and understood), SSL, client and server certificates, and interoperability with other WebDAV clients, among other things.
    • With Subversion, you have both client side and server side hooks, as well as smart diffs.
    • Arch has both revision libraries and repositories. The comparison document doesn't differentiate between them. In some cases, the comparisons made aren't meaningful. Revision libraries, for example "... also have to be created and maintained by the user. So comparing them to accessing past revisions through normal means in subversion is not a fair, or even really meaningful, comparison." (Daniel Berlin).
    • When comparing Arch's repositories to Subversion's there is no speed advantage. Arch's storage is either diffy (storing only differences), in which case it is not easily browsed and is no faster (at best) than Subversion; or the storage isn't diffy, in which case it isn't efficiently stored (imagine multiple copies of each file for each revision).
    • Subversion's choice of BDB as a backend was not accidental. Some of the tools Subversion got from using BDB are: Hot backup and replication, all kinds of existing tools that know about BDB databases (e.g. Python or Perl bindings). A body of - "community" knowledge. etc (Greg Stein).
    I've left out vaporware features, such as the future SQL backend of Subversion 2.0.
  • by klui ( 457783 ) on Tuesday February 05, 2002 @11:55PM (#2959970)
    I guess what you're suggesting stems from a different philosophy (Windows/classic Mac OS)--monolithic--than that of UNIX--writing tools that do one thing and do it well, while leveraging other tools on the system that do what they do well.

    I really don't care if this system is written using shell scripts, Java, or plain old C. Well, I do prefer just C, but that's my personal preference. I don't want the author to implement his own version of diff, check in, check out, etc. These subsystems are already available. Why reinvent the wheel again? If there are existing source repositories, it would be a pain to convert all the trees into a new proprietary format. RCS has worked well for so long, it would be a shame to throw all the histories away and start anew.

    Written properly, the shell, path, and environment dependencies shouldn't be a big problem although I have run into annoyances with environment space limitations under different UNIX OSes. But this particular environment space difference is taken care of by xargs(1).

    My largest concern is performance, but since most of the work is done with compiled code, it shouldn't be too bad; however, I haven't looked at the source.
  • Misses the point (Score:1, Insightful)

    by Anonymous Coward on Wednesday February 06, 2002 @12:21AM (#2960059)
    As I understand it, the real issue right now is not shoveling bits around but figuring out which patches or changes or whatevers are worth using and which aren't. A super duper, multi-server bit shoveler won't fix that--anyone wanting to use the code base to make something useful will still have to know exactly which pieces of code to include and which to ignore. Barring the invention of true AI, that's a big burden on any software project that isn't going away.
  • Stop complaining (Score:3, Insightful)

    by chewy ( 38468 ) on Wednesday February 06, 2002 @03:00AM (#2960408) Homepage
    Hello there. I'm reading these Slashdot comments and slowly realizing that people are hopelessly missing the point of our software community. People find stuff to complain about, even the sh dependency! Well, AFAIK, the reason I love GPL software is that, if there is something bothering me about it, I can CHANGE it. That's right, boys and girls: you can take those sh scripts and write some proper C code from them. From the original developer's point of view, he just wanted a system up *that worked* ASAP, using whichever tools he could to get it that way. Now that it's in the wild and known, it can be refined and perfected and fixed and whatever else, and we can have a beautiful piece of software like CVS or Linux or Apache or whatever in the end (not that most of them will ever meet their end :)

    NOW is the time to stop complaining, get those hands dirty, take those things that bother you about the very first implementation, and go make some code. I see those sh scripts as nothing but prototyping code, and turning prototype code into C code is one of the easiest tasks a programmer can ever get to do (since the THINKING has already been done for you).

    So please everybody, take this brilliant idea and let's make ourselves another open-source success.

    ciao
