All DVCS suck

Well, I've been migrating my development server to a new host (actually, they aren't new; I've used them for years for my other stuff and have been very pleased with them). In the process of the move, I've been cleaning things up and re-engineering things somewhat to solve some of the problems I've traditionally had.

One of the things that does keep coming up is a question as to whether or not I should continue to use Subversion for my online code repo. So, I've been looking at other alternatives, especially DVCSes, trying to see whether I would get any real benefit from them, or just be more burdened.

What's my conclusion? My conclusion is that they all suck, and maybe I just need to stick with SVN. Read on for the details...

SVN's elegance

First of all, let me tell you why I've been using SVN all these years. There are many reasons, but the key ones are as follows:

  • Simple repository maintenance: I know that the brass ring of DVCS is the whole "distributed" model, where you have no central canonical repo, but you know what? That's stupid. If you're ever in a situation where you don't have a central canonical repo, you've done something wrong. Period. In the vast majority of projects you'll ever work on, there will always be some central place to which the myriad of developers working on the project will eventually need to publish their changes. And for repository creation and maintenance, SVN is supremely simple to use.
  • Easy and low ceremony commands: SVN is insanely easy to use. The commands are all very well documented, it has a helpful command-line help system, and there's no mystical magic that's needed in order to use it effectively.
  • Very easy to add new contributors to: This is perhaps SVN's biggest strength. By leveraging existing technologies like Apache, giving someone write access to the repo is amazingly simple. You can give oodles of people write access to your SVN repo by simply setting them up with an entry in an htpasswd file. This is great when you, you know, don't want to fucking have to give every contributor shell access on the machine that hosts the repo.
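To make the "entry in an htpasswd file" point concrete, here is a minimal Python sketch that builds one such line. It assumes Apache's `{SHA}` format (the one `htpasswd -s` writes), chosen only because the standard library can produce it; that format is unsalted and weak, so bcrypt (`htpasswd -B`) is the better choice on a real server. The helper name `htpasswd_sha_line` is hypothetical.

```python
import base64
import hashlib


def htpasswd_sha_line(user: str, password: str) -> str:
    """Build one htpasswd entry in Apache's {SHA} format (hypothetical helper).

    {SHA} is base64(SHA-1(password)) with no salt; prefer bcrypt in practice.
    """
    if ":" in user:
        # The colon separates user from hash, so it cannot appear in the name.
        raise ValueError("htpasswd user names cannot contain ':'")
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # One line per contributor; append it to the password file.
    print(htpasswd_sha_line("alice", "s3cret"))
```

Pointing Apache's `AuthUserFile` directive at the file that collects these lines is, roughly, what "adding a committer" amounts to in that setup.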

The need for DVCS

Where SVN starts not working well is when you start needing to have more of a distributed model than even SVN can allow for. For example, when you need to have multiple repos that aren't canonical, but rather will eventually feed the central, canonical repo. Or when you want to branch, merge, etc. on a local repo when you're offline.

Because of the various hyped-up concepts of DVCS, it starts becoming more and more appealing when you're looking for something more than SVN can offer.

The question then becomes which DVCS to use. That's where the complication comes in, and that's where you can easily find a myriad of irritations.

So let's look at some of the DVCS offerings. I won't be showing you what their features are, as they all seem to offer essentially the same things as far as general project management needs are concerned. Instead, I'll point out their irritations.

BTW, I'm doing this largely as a personal exercise to try and help me figure out what I'm going to do: Whether I keep SVN or use one of the alternatives. Even though I may sound very irritated by each of these DVCSes (and actually am very irritated by them) I still haven't decided what I'm going to do. Basically, I want to weigh them all and try to see exactly which will irritate me the least (which tends to be how I decide things anyway since everything irritates me).

bzr

Let's start with Bazaar or bzr. This is the DVCS made and used by Canonical in their development on Ubuntu.

bzr's biggest problems are as follows:

  • Not easy to grant users write access to a central repo: GAH! What the fuck?! Why do I have to grant everyone shell access to my repo server just so that people can fucking push to the repo? And if I don't grant people access, they have to push bundles to me (or to someone I trust with shell access). Well, I'm sorry, but that's unacceptable. I can't be the bottleneck for some project (and I don't want to have others be the bottleneck either). Sure it's great that I can just "push" bzr repos by copying the files over to a dumb HTTP server, but that's a small boon for such a huge fucking loss.
  • Confusing web-interface that you can't actually pull from: If you do use the smart server, the web-interface is insanely ass-backwards. Not only that, but you can't fucking pull from it. What the hell?
  • Complicated push mechanism: Okay, so it's great that I can work locally... that's Jim-Fucking-Dandy. But what about when I need to, you know, actually publish my work somewhere so others can get at it? Well, there's no easy way to do it. Every method winds up using multiple additional layers of complication just to get the data online somewhere. This is horrid and unacceptable.
  • Fucking cache: Unless you set things up very carefully, any remote repos you push to will wind up being empty with the repo contents in an incomprehensible cache. Now, you can easily get the data back out with bzr, but what if bzr somehow breaks or goes bye-bye? Well then you're fucked. You can't get that data back out easily.

git

git is the thing that the kernel folks use. It honestly has some pretty nifty features, and I probably would lean towards it if it didn't have the following gigantic, hairy, man-titty problems:

  • Insanely confusing user interface: GAH! I don't need another 9 million fucking commands to learn just in order to use my DVCS! I know they have a central command-line wrapper thing that is supposed to make things easier to use. Unfortunately, it seems to be buggy at best, and broken at worst.
  • No native web-interface: I need to be able to publish the work I do in a fashion that is browsable from the web, and in this day and age, any DVCS that requires some external program to put its repo on the web feels dated. This makes git feel like a dinosaur compared to SVN, and that's not a good thing.
  • Buggier than sin: At Progeny we had a distribution development toolkit built around git, and I used that fucker extensively. However, there was at least one problem per week we had to deal with due to bugs in git. Some of these were so bad that we had to sacrifice vast sections of a project's history just to be able to get the current state of the repo back.
  • No easy way to grant users write access: No surprise here, but git suffers from the same problems as bzr. Granting a user write access to a central git repo means giving them some sort of shell or rsync access to the server the repo runs on. It wasn't acceptable with bzr, and it's even less acceptable with git considering how buggy it is.
  • Everything is a fucking hash: Just like bzr, remote repos aren't stored in a way you can just browse to on the filesystem. Everything in remote git repos is a fucking hash. If git goes away, you can say goodbye to your data.

Hg

Hg is fucking clever. Hg, Mercury, get it? Get it?! Bah... Unfortunately, all the cleverness for Hg was spent coming up with the fucking name...

  • Incomprehensible commands: Hg tries to be better at commands than git, and it mostly succeeds. However, by mixing and matching option/verb/action combinations you'll wind up spending a lot of time typing your commands/options in the wrong order and getting errors. GAH! Pick a method and use it consistently!
  • Holy fuck! More hashes! What the hell is up with hashes these days?! Why can't you just fucking store the current state of the repo in a readable fashion so that I can get at it without using your fucking tool?! GAH!

svk

svk is essentially Subversion with DVCS stuff wedged on top. It uses the SVN filesystem, but then provides the more common DVCS functionalities on top of that. This has a huge benefit of being compatible with SVN clients, but loses some of the functionality you expect from other DVCSes.

So svk problems...

  • SVK and SVN can confuse each other: The idea of being backward-compatible with SVN is so spiffy it gives me wood just thinking about it. The problem? It doesn't work. Rather, it usually works but manages to fail when you really need it the most. I've had many cases where SVK confused SVN and vice versa, resulting in a hosed repo. Not cool.
  • No svn:externals: Last time I used SVK, it didn't support svn:externals. Considering how useful the svn:externals property is when doing real DVCS work, lacking it means jumping through more hoops just to get done what stock SVN can do simply.
  • When SVK breaks, it really breaks: SVK seems to suffer from the git wedge problem in that when things go south, they really go south.

Many choices, none good

I know there's other DVCSes out there, but the ones I looked at all had enough problems that they didn't even make it onto this list of things for me to bitch at. In the end, I really think it will wind up being one of the above for me.

Right now, I'm leaning strongly towards SVN, bzr, and hg. The wedge problems of git and svk are simply too onerous for me to consider them seriously.

I will say that Hg looks to have the fewest problems, but the fact that it has less-than-stellar support in Debian worries me.

Anyway... stay tuned.

Response from a git user and hg/bzr advocate

Firstly, I still don't understand why people jump so much on the number of commands required to use git. It has a command line API, of course it has lots of commands. This means that you don't have to write code to write integration scripts with it. As for it being buggy, well I won't argue with that comment - that's your call to judge. Personally I find it a lot less buggy than SVN. Especially the SVN API, which always seems to be an exercise in tracking down segfaults.

Web interface? I presume that you're comparing to mod_dav_svn. That always pissed me off, because A) WebDAV was a backwards-incompatible extension to the HTTP protocol requiring Apache 2, and B) there is no way to see historical versions on an HTTP SVN path without using the svn command line or the API. SVN uses WebDAV "custom reports" for everything, which means that it doesn't work with standard WebDAV tools. By comparison, sharing a git repository via HTTP/WebDAV does not require any special server module. Let's face it: gitweb, which, as others have pointed out, has been bundled with git for ages, is astoundingly better than SVN's "native" HTTP interface, and much better than anything I've seen for SVN, including ViewSVN, trac, etc.

As for the "buggier than sin" bit, well, you can blame the tools for losing the history if you like, but git is really the only tool with the content-hashing filesystem idea which makes it really difficult to actually ever lose anything. So you managed to write buggy wrappers for git? So what.
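The "content-hashing filesystem" mentioned here is easy to see in miniature. The sketch below assumes a classic SHA-1 repository (newer git versions can also use SHA-256) and computes the ID git gives a file's contents: a SHA-1 over the header `blob <size>`, a NUL byte, and then the raw bytes. Because the name is derived from the content, identical content always gets the same name, and content that has been silently altered no longer matches its name. The function name `git_blob_id` is my own, not part of any git API.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID git assigns to a file's contents (SHA-1 repos).

    git hashes the ASCII header "blob <length in bytes>", a NUL byte,
    and then the file's raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Should match: echo 'hello world' | git hash-object --stdin
    print(git_blob_id(b"hello world\n"))
```

Running `git hash-object` on the same file should print the same ID, which makes this a cheap cross-check.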

You seem to also be unaware of the git-shell, a secure login shell that you can give to the accounts of people you want to give write access to. It re-uses existing authentication systems - SSH! And of course there is writable HTTP/WebDAV, which may be appropriate for some. But arguing that point misses the most vital point of DSCM which seems to have flown completely past you - that it is much easier to allow extra committers because you don't have to. For instance, anyone can go to the repo.or.cz site and set up a "fork" of a project, and upload to there. Their fork can be easily integrated because unlike SVN, git's design doesn't suck.
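The idea behind git-shell is a login shell that refuses everything except the few server-side commands that fetching and pushing need. What follows is a hypothetical Python sketch of that whitelist check, not git-shell's actual code. It assumes the client sends the dashed form, e.g. `git-receive-pack '/srv/git/project.git'`, and it ignores the optional extras the real tool supports.

```python
import shlex

# Server-side commands needed for fetch, push, and archive. git-shell allows
# these (plus a few optional extras that are not modeled here).
ALLOWED_COMMANDS = {"git-receive-pack", "git-upload-pack", "git-upload-archive"}


def is_allowed(command: str) -> bool:
    """Return True if `command` is one allowed git verb followed by one repo path.

    An accepted command should be exec'd directly, never handed to a shell.
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if len(argv) != 2:
        # Extra words (pipes, chained commands, additional options) are refused.
        return False
    verb, repo_path = argv
    return verb in ALLOWED_COMMANDS and not repo_path.startswith("-")


if __name__ == "__main__":
    for cmd in (
        "git-receive-pack '/srv/git/project.git'",
        "bash",
        "git-upload-pack '/x'; rm -rf /",
    ):
        print(f"{cmd!r}: {'allowed' if is_allowed(cmd) else 'refused'}")
```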

As for the "Everything is a fucking hash" - well, if you look at the git-rev-parse man page you'll see there are lots of ways to write revision numbers. In Mercurial and bzr, too. This is just a caveat of removing useless surrogate revision identifiers. And besides, a hash is much simpler than the trio of (URL, REVNUM, UUID).
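One of those ways of writing a revision is the abbreviated hash: you type only a leading chunk, and the tool expands it as long as exactly one object matches. Here is a minimal sketch of that rule, assuming a plain list of known IDs and a four-character minimum (my understanding of git's floor; treat it as an assumption). `resolve_prefix` is a hypothetical name.

```python
MIN_PREFIX = 4  # assumed minimum abbreviation length


def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Expand an abbreviated hash to the single full ID that starts with it."""
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX:
        raise ValueError(f"prefix {prefix!r} is shorter than {MIN_PREFIX} characters")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"unknown revision {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous prefix {prefix!r} matches {len(matches)} objects")
    return matches[0]


if __name__ == "__main__":
    ids = [
        "3b18e512dba79e4c8300dd08aeb37f8e728b8dad",
        "3b1f0c9a2d4e5f60718293a4b5c6d7e8f9a0b1c2",  # made-up ID for illustration
        "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
    ]
    print(resolve_prefix("e69d", ids))  # unique, so it expands to the full ID
    print(resolve_prefix("3b18", ids))  # the fourth character disambiguates
```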

Many of my points also apply to Mercurial and bzr. SVK I would call only quasi-distributed, as it does not support arbitrary graphs of connectedness like the other ones do and still uses surrogate revision identifiers.

Response to your response from an Hg convert

Heh, well in the time since I wrote this rant, I've actually become a pretty big Hg fan. My original points about Hg remain, but whatever.

I think the summary of my post is that all DVCSes suck... oh wait... that was the original title. Anyway, it's like the motto for mutt, "All mail readers suck, mutt just sucks less." The same can be applied to whatever DVCS someone is enamored with at the moment.


Firstly, I still don't understand why people jump so much on the number of commands required to use git. It has a command line API, of course it has lots of commands. This means that you don't have to write code to write integration scripts with it.

Command-line API is an admirable goal, but for the longest time this "API" was in a constant state of flux. It was nigh unusable for a very long time. And expecting the end-user to actually navigate this tree of mess is unacceptable. Add to this the fact that whenever anyone bitches about the bajillion commands, some git zealot rises up and calls everyone who's complaining an idiot, and it becomes very hard to love git unless you're already indoctrinated. (Oh, snap, did I just call Linus a zealot? A thousand Hail Marys :-)

Now, I will say this... git has improved... a lot. There is now a much more sane shell that provides a unified place for all this stuff, the API has firmed up, and there's lots of new stuff that make git more "friendly" to the unindoctrinated.

Web interface? I presume that you're comparing to mod_dav_svn. That always pissed me off, because A) WebDAV was a backwards-incompatible extension to the HTTP protocol requiring Apache 2, and B) there is no way to see historical versions on an HTTP SVN path without using the svn command line or the API.

Uh... I actually agree with you on point "B". But you're missing my original point about gitweb: you can't get at the files without using the git command-line API. The very same argument you make against SVN can be turned around on gitweb. The really nice thing about the classic SVN web-interface was that it could present itself as a directory of files, and anyone could get at the files without relying on some additional tool. Granted, they were only the most recent files and you couldn't get any history whatsoever, but being able to get at the files without having to jump through hoops is huge.

But arguing that point misses the most vital point of DSCM which seems to have flown completely past you - that it is much easier to allow extra committers because you don't have to.

And yet, by this very statement, you've missed my entire point. My original point was that git significantly raises the difficulty level over classic SVN for project collaboration. Not just the confusing commands, but because if someone doesn't have git, and doesn't know about git, they simply could not download files from the repo, make some changes, and then send a patch to someone. git (and really, most modern DVCSes) basically places the requirement that if you want to contribute code to a project, you have to go through it. That may work fine in a community like the Linux kernel, or in big language projects, but when you're dealing with something that is more cross-platform and involves contributors of all skillsets and backgrounds, that's a very bad thing. For example, back when I ran T4K, had I standardized on something like git, it would have easily eliminated half of our contributors.
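The tool-free workflow described here (grab the files, change them, mail a patch) needs nothing more than a diff. As a sketch under that assumption, Python's standard `difflib.unified_diff` can produce a patch in the familiar unified format. The `a/` and `b/` path prefixes are a convention I chose so the result can be applied with `patch -p1`, and `make_patch` is a hypothetical helper.

```python
import difflib


def make_patch(old_text: str, new_text: str, path: str) -> str:
    """Produce a unified diff a contributor could mail in, no VCS required."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


if __name__ == "__main__":
    before = "print('hello')\n"
    after = "print('hello, world')\n"
    print(make_patch(before, after, "hello.py"), end="")
```

If nothing changed, the function returns an empty string, so there is nothing to send.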

At the end of the day, the one thing that all DVCS/DSCM/whatever zealots need to learn is that their app isn't the end-all-and-be-all solution for everyone's problem. Furthermore, the fact that SVN is still in use so much today means that it is at least compatible enough with the way most developers work to satisfy their needs. Thus getting on your high horse and bad-mouthing SVN while trying to make people use your DVCS/DSCM/whatever flavor of the month just shows that you don't understand that the problem is subjective and the solutions relative.

Now, I will add this: Things have changed since I wrote the above. First of all, I gave up SVN a while ago... I was pissed off at SVN for most of the same reasons everyone else has been. Really, I had only been using it for as long as I had because every alternative had problems that made them not viable for me.

Second of all, I've now standardized my personal system on Hg. I love Hg, and it seems to give me just about everything I need. I'll admit I do pine for some of git's more clever and abstract features (*drool* rebase *drool*, and no, hg transplant is subtly different), but I'll also admit I'll probably never really need them anyway. I still get pissed off that everything is a hash, so I'm forced to use Hg in order to grep older revisions... but that's no better or worse than SVN's binary bullshit (for this, I'll probably just have to wait until enough people realize how important it is to be able to get at your data without using the tool).

Third of all, having used BZR at my job for nearly a year now, I have to say I'd probably kill to switch to git. Repositories of any decent size are unbearably slow to work with in BZR. Couple this with the fact that it's nearly impossible to get a working queue manager with BZR, and you can see how painful it is. My personal choice would be to rip BZR out and slap in Hg... but seeing as how my employer employs Linus as well, the more likely candidate would be git.

Every time the BZR repo takes 30 minutes for a 'bzr up' to run, I wish that I was using git or Hg instead :-)

Anyway... enough ranting.

Fellow person named Sam, I love your site (Drupal forever :-) ... especially this post. Funny stuff :-)

git

Insanely confusing user interface: true. It is being worked on.

No native web-interface: false. gitweb (git web interface in Perl) is bundled since git 1.4.0

Buggier than sin: unconfirmed, perhaps PEBCAK. Haven't seen your bug reports on the git mailing list. It is very hard to lose history in git. Buggy scripts?

No easy way to grant users write access: restricted shell (git-shell) or WebDAV, or enable pushing via git protocol (unauthenticated).

Remote repos aren't stored in a way you can just browse to on the filesystem. Live with it, unless you want your repo to take up more than 10 times the space. (And there is always gitfs.)
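For what it's worth, the hashed names are not an exotic format: a *loose* object is a zlib-compressed file named after its ID, containing a `type size` header, a NUL byte, and the content. The sketch below reads one with nothing but Python's standard library. It assumes the object is loose rather than inside a packfile (packed objects, which is where much of a repo's history ends up, need considerably more work), and `read_loose_object` is a hypothetical name.

```python
import sys
import zlib
from pathlib import Path


def read_loose_object(git_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Return (type, content) for a loose object stored under git_dir/objects."""
    # Loose objects live at objects/<first two hex chars>/<remaining chars>.
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\x00")  # header looks like b"blob 12"
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError(f"object {object_id} is corrupt: size mismatch")
    return obj_type, body


if __name__ == "__main__":
    if len(sys.argv) == 3:
        kind, content = read_loose_object(Path(sys.argv[1]), sys.argv[2])
        print(kind, len(content), "bytes")
    else:
        print("usage: read_object.py GIT_DIR OBJECT_ID")
```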

Re: git

Insanely confusing user interface: true. It is being worked on.

It is... now. But for the longest time the modus operandi whenever anyone bitched about it was to tell them to quit complaining.

Buggier than sin: unconfirmed, perhaps PEBCAK. Haven't seen your bug reports on the git mailing list. It is very hard to lose history in git. Buggy scripts?

They weren't my bug reports... I was a user of the tool, not one of the developers. Our developer at the time said he was active on the list and was submitting bug reports, but it wasn't my job to babysit him and make sure it was happening, so I just assumed it was.

But the truth is, yes, we did have tons of problems with git. I wouldn't say our version history was ever lost in its entirety, but I will say we had situations where an action (usually a commit or merge) would wedge the repository in such a way that we had to do some fancy git CLI API moves to rescue it. This usually resulted in one or two of the most recent commits being lost, and on more than one occasion it actually shut a paid project down for a week or more while the problem and its solution were being researched.

Oddly enough, I did find an image of one such problem we had. This image was actually used to try and illustrate how SVN wouldn't have been able to handle a sequence of complicated merges (which is why there's the crudely drawn text on the image), however the "Remove package selection which caused bomb at..." entry is exactly what I described above:

There, we had a number of commits to a branch that had no other contributors. I was the only person working on it at this point in the project, so there shouldn't have been any merges. But then, sometime around the "Add stock CentOS comps.xml" entry, something very bad happened. The repository got wedged, but git happily continued to accept commits from me. Finally, around the first merge you see on the list, it died entirely and every git command resulted in errors (don't ask me what errors; this was around two years ago, and the project was internal to a company that is now out of business :-)

I recall us spending nearly a week (I remember that vividly because I had some vacation coming up and was stressed that we wouldn't resolve this problem for our customer before I left) trying to find the problem. The complicated tree you see on what should have been a flat history is the only indication I have right now as to what our final solution was (like I said, it's been around two years, and I don't remember exactly what we had to do to fix this).

In all honesty, I actually suspect that all of our problems with git at the time were because it wasn't quite ready for Prime Time. It was in too much of a state of flux and was undergoing too many changes to rely on it working the same from release to release, and to rely on it being bug free. That being said, at the time, we were assured by various git people that it was ready for Prime Time, and that we should be using it.

From what I've seen lately, git has come a long way since then.

hg

Okay, so I think I may be going the Mercurial route. Unfortunately, converting from SVN to Hg has been more than difficult.

What I think finally worked was hgsvn, which thankfully was in Debian.

I had high hopes for Tailor, especially since it was in Debian, but the damned thing gave me nothing but errors.

Re: hg

Alright, well I think I've got it mostly set up, and I must say I like the results thus far.

What I wanted was something like SVN, in that I can easily push/pull changes from upstream, and it looked like, of all the ones I tested, Hg was the only one that could do that without significant customization.

That being said, there was some configuration that had to be done. And unfortunately, finding the information on how to do this was moderately difficult simply because the Hg documentation seems hard to navigate (and searching is less than stellar... likely a result of them using MoinMoin as their wiki...)

Anyway... I'm going to make a new post detailing how I did what I did simply because I'd like to help someone out in the future who's trying to do the same thing.

Conversion to Hg

Read it here.