As I said in my last entry, I've been evaluating the various modern DVCSes to try and figure out which of them would give me the most benefit while irritating me the least.
I've been using Subversion (SVN) for a few years now on my dev servers (formerly, svn.samhart.net and friends) and have mostly been pleased with it. In fact, the only reason I even considered replacing SVN was that certain aspects of DVCS could make my life easier, namely having a repo's entire history available locally and being able to work offline much more easily.
Additionally, I've been working with a lot of modern DVCSes lately (namely bzr, git and svk) and I've been very displeased by each of them. They all had at least one critical problem that, for me, made them impractical to even consider for use in my own repos. The end result is that I've spent a lot of time frustratingly researching and testing as many DVCSes as I could to try and figure out if I should switch or just stick with SVN.
But, after the smoke cleared and the fires died down, I discovered that one DVCS, Mercurial (Hg) was left standing on equal ground with SVN in the "has to not irritate me" department.
The problem? Conversion from SVN to Hg isn't as straightforward as one would like. Thus, I'm documenting the steps I had to do to try and help out anyone else who's attempting to go down this path.
For what it's worth, I don't plan on discussing the pros and cons of each of the DVCSes here. At the end of the day they each have comparable feature sets and functionality, and the choice of which DVCS a person uses will likely be a very personal one (or at least one dictated by someone in charge :-). Thus, I am not going to argue the benefits of Hg over any of the others, or even over SVN. I'm merely going to show how you can convert your existing SVN repos into Hg repos, as well as set up Hg to allow for easy SVN-like pushes/pulls on your server.
I should mention what I have been running, as well as what I will be running. I do this only because I know there's a myriad of ways to set up SVN and Hg, and unless you're doing what I'm doing, my notes won't help you much.
Traditionally, I ran SVN using WebDAV in Apache2.x. I wanted to continue to run Hg using Apache2.x (as this server has other needs for Apache2.x), but I no longer needed WebDAV for Hg. I'm also running Debian (with a mix of packages from stable, testing, and unstable).
Every tool that I mention in this guide can currently be found in Debian. Their package names are:
Naturally, you can get these things up and running in other *nixes, but I'll leave that up to you to figure out if you decide to follow my guide.
This was perhaps the trickiest part of the process for me. This was because there's a plethora of tools for doing SVN to Hg conversions, but most of them don't seem to work well. I first tried yasvn2hg, but I couldn't get the damned script to even run. Next I tried Tailor which promised to be the Swiss Army Knife of repo conversion utilities. However, I had hours of headache and no progress using it. Finally, I tried hgsvn, and it worked like a charm.
hgsvn is apt-gettable in Debian. However, hgsvn needs functionality from python-setuptools, and its package does not declare that dependency. This means that, unless you already have python-setuptools installed for something else, chances are you will see this error when you install hgsvn and try to run it:
$ hgimportsvn http://url.to.repo/repo
Traceback (most recent call last):
  File "/usr/bin/hgimportsvn", line 5, in
    from pkg_resources import load_entry_point
ImportError: No module named pkg_resources
If you get this error, simply install the python-setuptools package (or equivalent) and try again.
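If you'd rather check for the dependency before running hgimportsvn, you can ask Python directly whether it can find pkg_resources. This is only a minimal sketch: the helper name has_module is my own (it is not part of hgsvn), it is written for a current Python 3, and it only tells you about the interpreter you run it with, which needs to be the same one hgsvn uses.

```python
import importlib.util


def has_module(name):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


if has_module("pkg_resources"):
    print("pkg_resources found")
else:
    print("pkg_resources missing: install python-setuptools (or equivalent)")
```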
Once hgsvn and its needed libraries are installed on your system, the basic method to convert a repository is as follows:
$ hgimportsvn http://url.to.repo/repo
...^^^Sets up the import
$ cd repo
...^^^Changes to the freshly created subdir
$ hgpullsvn
...^^^Pulls down all the changes from svn and creates an hg history
$ hg update (optional)
Once you've done these steps, your repo will have been converted to Hg. This works well for single repositories, but what if you have something more complicated?
If you're like me, when you originally set up SVN you did so in the laziest way possible.
Setting up SVN repos is more work than it should be. It involves using commands that you normally never have to touch (svnadmin), setting up new entries for those repos in your http server's configuration files (if you're using Apache and WebDAV), and setting up user permissions to those repos. Thus, the lazy way to set them up is to make one central SVN repo under which you have multiple sub-repos. This has the advantage of making your repository very easy to maintain. However, it has a big disadvantage in that a user with write access to any sub-repo will have write access to the entire repo.
In Hg, on the other hand, setting up a new repository is much easier, and maintaining multiple repositories more manageable. So, if you're like me, you may be tempted to remedy past sins by splitting your single gargantuan SVN repo into smaller Hg repos. Thankfully, hgsvn makes this very easy.
Let's say that you have one core SVN repo, called "main" which has the following sub-directories which you are treating as sub-repos:
main/
  projecta/
  projectb/
  projectc/
hgsvn can actually handle sub-directories of SVN repos and generate histories of just those sub-directories, effectively splitting the directories into repos of their own. It will even keep track of changes that only affect the individual sub-repo (meaning parent or neighbor changes don't get entered, unless they were otherwise combined in the original SVN).
A method for splitting the above could be:
$ hgimportsvn http://url.to.repo/main/projecta/
...^^^Start with "projecta/"
$ cd projecta
$ hgpullsvn
...^^^Pull the history for "projecta/"
$ hg update
...
$ cd ..
$ hgimportsvn http://url.to.repo/main/projectb/
...^^^Move on to "projectb/"
$ cd projectb
$ hgpullsvn
...^^^Pull the history for "projectb/"
$ hg update
...
etc.
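If you have more than a handful of sub-directories, typing that out by hand gets old fast. The following is a rough sketch of how the same sequence could be generated per sub-directory. The function name plan_split and the (working directory, command) layout are my own invention for illustration; the only real commands in it are the hgimportsvn, hgpullsvn and hg update calls already shown above.

```python
def plan_split(base_url, subdirs):
    """Build the hgimportsvn/hgpullsvn steps for each sub-directory.

    Returns a list of (working_directory, argv) pairs. working_directory is
    None for commands that run from the starting directory.
    """
    steps = []
    for name in subdirs:
        url = base_url.rstrip("/") + "/" + name + "/"
        steps.append((None, ["hgimportsvn", url]))  # sets up the import
        steps.append((name, ["hgpullsvn"]))         # pulls the SVN history
        steps.append((name, ["hg", "update"]))      # optional
    return steps


# Print the plan instead of running it, so nothing happens by accident.
for cwd, argv in plan_split("http://url.to.repo/main",
                            ["projecta", "projectb", "projectc"]):
    where = "(in %s)" % cwd if cwd else "(here)"
    print(where, " ".join(argv))
```

To actually execute the plan, each pair could be handed to subprocess.check_call(argv, cwd=cwd), which stops at the first command that fails.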
When you're done using the hgimportsvn and hgpullsvn tools, you will have repos in a strange half-SVN/half-Hg form. They will be legitimate Hg repos, but they will still have the .svn directories strewn throughout them, and have some .hgignore files telling Hg to ignore said .svn directories. So, if we're going to go 100% Hg, we may as well get rid of this stuff.
$ cd repo/ (whatever the path is to your hgsvn made repo)
$ find . -name .svn | xargs rm -fr
...^^^Get rid of the .svn/ directories
$ find . -name .hgignore | xargs rm -fr
...^^^Get rid of the .hgignore entries
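If you'd like something a little more cautious than piping find into rm, the same cleanup can be sketched in Python. This is my own illustration rather than anything hgsvn ships: the function name strip_svn_leftovers is made up, and unlike the find commands it deliberately skips Hg's own .hg directory and reports what it removed.

```python
import os
import shutil


def strip_svn_leftovers(repo_root):
    """Delete every .svn directory and .hgignore file under repo_root.

    Returns the list of paths that were removed.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        if ".hg" in dirnames:
            dirnames.remove(".hg")   # never descend into Hg's own metadata
        if ".svn" in dirnames:
            target = os.path.join(dirpath, ".svn")
            shutil.rmtree(target)
            dirnames.remove(".svn")  # already gone, so do not walk into it
            removed.append(target)
        if ".hgignore" in filenames:
            target = os.path.join(dirpath, ".hgignore")
            os.remove(target)
            removed.append(target)
    return removed
```

Calling strip_svn_leftovers("repo") from the parent directory would do the same job as the two find commands; try it on a copy of the repo first.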
As I said before, I used SVN via WebDAV and it was pretty painless once I got it going. Thus, I want Hg to behave exactly the same way when I do my pushes and pulls. Hg doesn't use WebDAV (at least, if it does, I didn't look deep enough into the documentation to figure out how to set it up), but it does come with a handy CGI script for giving you the same basic functionality.
If you're only running one repo, there is a script called hgweb.cgi which is easy to configure and will handle your needs. However, since I run multiple repos, I decided to use another script called hgwebdir.cgi that serves up multiple Hg repos in one web-interface.
hgwebdir.cgi takes an external configuration file that defines the repos it will monitor. There are two ways you can configure this file for the repos.
The first is to use the [collections] directive which auto-magickally determines all of your Hg repos based upon some common root directory. For example, let's say that all your repos are under /var/repos:
/var/repos
  projecta/
  projectb/
  projectc/foo
You would then place the following in your hgwebdir.cgi configuration file if you wanted to use the [collections] directive:
[collections]
/var/repos = /var/repos
This configuration file would make your repos available online as "projecta", "projectb" and "projectc/foo".
However, if your repos are not under some common directory, or if there are other items alongside your repos that aren't repos, then you can use the [paths] directive to itemize each one:
[paths]
projecta = /home/fred/hg/projecta/
projectb = /var/repo/
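If you go the [paths] route and don't feel like itemizing by hand, a small script can look for directories containing .hg and write the entries for you. This is a sketch of my own and only approximates what [collections] does (it is not Mercurial's actual code); find_hg_repos and paths_section are hypothetical helper names.

```python
import os


def find_hg_repos(root):
    """Yield (name, path) for each directory under root that contains .hg."""
    for dirpath, dirnames, _ in os.walk(root):
        if ".hg" in dirnames:
            name = os.path.relpath(dirpath, root).replace(os.sep, "/")
            yield name, dirpath
            dirnames[:] = []  # do not look for repos inside a repo


def paths_section(root):
    """Render a [paths] section listing every repository found under root."""
    lines = ["[paths]"]
    for name, path in sorted(find_hg_repos(root)):
        lines.append("%s = %s" % (name, path))
    return "\n".join(lines) + "\n"
```

Running print(paths_section("/var/repos")) against the layout above should list projecta, projectb and projectc/foo, each with its full path, ready to paste into the configuration file.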
Whichever you do, save the file (the name doesn't matter, I just used hgweb.config) and edit the line in hgwebdir.cgi to point to this newly created configuration file. For example:
def make_web_app():
    return hgwebdir("/etc/hgweb/hgweb.config")
Technically speaking, you're already set. Just stick the hgwebdir.cgi file someplace where CGI scripts can be executed and point your browser at it. However, at this point you can't push repository changes via this web interface. Additionally, I kind of wanted the URLs to look cleaner.
You may be fine handing out repository URLs like http://someurl.com/cgi-bin/hgwebdir.cgi?mf=b22511d1eb56;path=/, but I'm not. I want my repository to have URLs that are as clean as possible. So, I make sure mod_rewrite is enabled in my server (a2enmod rewrite, if you're running Apache2) and add the following to the Apache2 configuration entry for my hgwebdir.cgi:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^/(.*) /hgwebdir.cgi/$1
</IfModule>
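To see what that rule does to an incoming request path, here is the same pattern spelled out with Python's re module. It's purely an illustration of the mapping, not how Apache evaluates the rule internally, and the sample paths (in particular the third one) are made up.

```python
import re

# Python spelling of: RewriteRule ^/(.*) /hgwebdir.cgi/$1
PATTERN = re.compile(r"^/(.*)")


def rewrite(url_path):
    """Return the path the request is rewritten to."""
    return PATTERN.sub(r"/hgwebdir.cgi/\1", url_path, count=1)


for path in ["/", "/projecta/", "/projectc/foo/file/tip/README"]:
    print(path, "->", rewrite(path))
```

In other words, everything after the leading slash is handed to hgwebdir.cgi as extra path information, which is what lets the script name disappear from the URLs you hand out.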
Next up, I want users to have the ability to view and clone the repository anonymously, but need to be authenticated in order to push back to the server. Additionally, I want a central place for the htpasswd file (you could do this on a per-project basis, but I'll explain why you don't want to in a bit). So, I add the following to my Apache2 configuration entry on my hgwebdir.cgi:
AuthUserFile /etc/hg/htpasswd
AuthName "Dev Repo"
AuthType Basic
<Limit POST PUT>
Require valid-user
</Limit>
The <Limit> segment is the magic here: anyone can view and clone the repository anonymously, but you must be authenticated in order to push.
Note that in my example here we're using "AuthType Basic", which is probably not the best way to do it. However, it is the simplest way to show for this example. I leave it to the reader to figure out how to use another AuthType (or make pushes go across SSL).
The final thing we need to do is make it so that the hgwebdir.cgi script is the index when the server attempts to serve up the page and to make sure the server can handle CGI.
DirectoryIndex hgwebdir.cgi
AddHandler cgi-script .cgi
Options ExecCGI
Order allow,deny
Allow from all
If we put it all together and assign it to a virtual host, we get an entry like the following (which, if you're using Apache2 can just be placed as a file in sites-available):
<VirtualHost XXX.XXX.XXX.XXX:80>
ServerName hg.someplace.com
DocumentRoot /var/hg/hgweb
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^/(.*) /hgwebdir.cgi/$1
</IfModule>
<Directory /var/hg/hgweb>
DirectoryIndex hgwebdir.cgi
AddHandler cgi-script .cgi
Options ExecCGI
Order allow,deny
Allow from all
AuthUserFile /etc/hg/htpasswd
AuthName "Dev Repo"
AuthType Basic
<Limit POST PUT>
Require valid-user
</Limit>
</Directory>
</VirtualHost>
The hgrc file is the general configuration file for all things Mercurial. There are always at least two possible hgrc files for every repository:
Inside of these hgrcs, you can define a directive called [web] which controls the behavior of the web-interface used in hgweb.cgi and hgwebdir.cgi.
Hg's web-interface defaults to a style that I personally find ugly and confusing to use. I much prefer the "gitweb" style over the default Mercurial style. So I set the "style" parameter in the [web] section of the system-wide hgrc to "gitweb" to make it the default style.
Additionally, I want compressed archives to be made available, and I want to set a system-wide contact. Finally, if you're using the same setup I've detailed above, you aren't using SSL for your pushes, which means that the push over SSL requirement should be disabled.
[web]
style = gitweb
allow_archive = bz2 gz zip
contact = Myself, me@somewhere.com
push_ssl = false
For each repo, you can define a specific hgrc file that will override the system-wide settings from /etc/mercurial/hgrc.
Generally speaking, you want to at least define a description for the repository as well as who is allowed to push. Additionally, you can define new contact information if it differs from the system-wide setting.
[web]
description = An addressbook for keeping track of your "friends"
contact = Ted Haggard, tedh@ilikethemens.com
allow_push = tedh
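To make the override behaviour concrete, here is a small sketch that layers a per-repo [web] section over a system-wide one. Be warned that it uses Python's configparser purely as a stand-in (Mercurial has its own config reader), that effective_web_settings is a name I made up, and that the per-repo values below are hypothetical examples rather than anything from a real repo.

```python
import configparser

SYSTEM_HGRC = """
[web]
style = gitweb
allow_archive = bz2 gz zip
contact = Myself, me@somewhere.com
push_ssl = false
"""

# Hypothetical per-repo values, for illustration only.
REPO_HGRC = """
[web]
description = An example project
contact = Project Owner, owner@somewhere.com
allow_push = tedh
"""


def effective_web_settings(*layers):
    """Merge hgrc-style texts in order; later layers override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in layers:
        parser.read_string(text)
    return dict(parser["web"])


for key, value in sorted(effective_web_settings(SYSTEM_HGRC, REPO_HGRC).items()):
    print(key, "=", value)
```

The merged result keeps style, allow_archive and push_ssl from the system-wide file, while contact comes from the repo's own hgrc.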
Now, you can easily define per-repository htpasswd files. However, this can get unwieldy and is completely unnecessary. Instead, it makes more sense to define a global htpasswd file, and then define push rights per repository in the hgrc.
So I could have a global htpasswd file that defines all of my users like this:
tedh:HGand8176
fred:87JIkn7j1*9
joe:87/joiqKl91
jake:jasmn1%1tba
But then define the following project push rights via their hgrc's:
Project A
[web]
description = Project A
allow_push = tedh, joe
Project B
[web]
description = Project B
allow_push = jake, fred
Project C
[web]
description = Project C
allow_push = tedh, jake, joe
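The split of responsibilities here is that Apache's htpasswd decides who you are, and each repo's allow_push decides whether you may push there. The second half can be sketched like this. It's my own illustration using Python's configparser, not Mercurial's real implementation, and allowed_pushers and may_push are hypothetical names.

```python
import configparser

PROJECT_A_HGRC = """
[web]
description = Project A
allow_push = tedh, joe
"""


def allowed_pushers(hgrc_text):
    """Return the set of usernames listed in allow_push of the [web] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    raw = parser.get("web", "allow_push", fallback="")
    # Accept commas and/or whitespace between names.
    return set(raw.replace(",", " ").split())


def may_push(user, hgrc_text):
    """True if an already-authenticated user is listed in allow_push."""
    return user in allowed_pushers(hgrc_text)


for user in ["tedh", "joe", "jake"]:
    print(user, "may push to Project A:", may_push(user, PROJECT_A_HGRC))
```

So jake can authenticate against the global htpasswd file just fine, but since he isn't listed for Project A, his pushes there would be refused.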
When it's all said and done, you should have a working Hg repository server and should be able to pull/push from/to it.
However, there were some small issues that I ran into that I should note simply because they seemed to be a bit tricky.
The first problem I ran into was when I tried to get a friend of mine online to try out the new repository for our IRC bots code. When he tried to clone the repo, he got the following error:
[17:02] < schultmc> | $ hg clone http://dev.samhart.net/bots/
[17:02] < schultmc> | destination directory: bots
[17:02] < schultmc> | requesting all changes
[17:02] < schultmc> | adding changesets
[17:02] < schultmc> | adding manifests
[17:02] < schultmc> | adding file changes
[17:02] < schultmc> | abort: consistency error adding group!
[17:02] < schultmc> | transaction abort!
[17:02] < schultmc> | rollback completed
Additionally, when I tried to clone it, I would either get the same error he did, or get the following:
$ hg -v clone http://dev.samhart.net/bots
destination directory: bots
requesting all changes
adding changesets
adding manifests
adding file changes
abort: premature EOF reading chunk (got 6822 bytes, expected 34384)
transaction abort!
rollback completed
The strange thing was, other repos worked fine, and an "hg verify" on the server revealed them all to be in working order.
As it turns out, both of these problems have the same root cause: errors on the server. In my case the system was running out of resources, so adding a bit more swap solved the problem. However, it could also be things like permission problems or other misc. Apache errors.
At any rate, if you get errors that look like the above, chances are they are server errors and you should look very closely at what's going on during each attempted transaction.
This is just an ugly stacktrace, but doesn't seem to cause any problems. If you try to push and use the wrong username/password, you will get a stacktrace that looks a lot like the following:
** unknown exception encountered, details follow
** report bug details to http://www.selenic.com/mercurial/bts
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 0.9.3)
Traceback (most recent call last):
File "/usr/bin/hg", line 12, in ?
commands.run()
File "/var/lib/python-support/python2.4/mercurial/commands.py", line 3000, in run
sys.exit(dispatch(sys.argv[1:]))
File "/var/lib/python-support/python2.4/mercurial/commands.py", line 3223, in dispatch
return d()
File "/var/lib/python-support/python2.4/mercurial/commands.py", line 3182, in
d = lambda: func(u, repo, *args, **cmdoptions)
File "/var/lib/python-support/python2.4/mercurial/commands.py", line 1971, in push
r = repo.push(other, opts['force'], revs=revs)
File "/var/lib/python-support/python2.4/hgext/mq.py", line 2025, in push
return super(mqrepo, self).push(remote, force, revs)
File "/var/lib/python-support/python2.4/mercurial/localrepo.py", line 1360, in push
return self.push_unbundle(remote, force, revs)
File "/var/lib/python-support/python2.4/mercurial/localrepo.py", line 1438, in push_unbundle
return remote.unbundle(cg, remote_heads, 'push')
File "/var/lib/python-support/python2.4/mercurial/httprepo.py", line 352, in unbundle
heads=' '.join(map(hex, heads)))
File "/var/lib/python-support/python2.4/mercurial/httprepo.py", line 235, in do_cmd
resp = urllib2.urlopen(urllib2.Request(cu, data, headers))
File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
return _opener.open(url, data)
File "/usr/lib/python2.4/urllib2.py", line 364, in open
response = meth(req, response)
File "/usr/lib/python2.4/urllib2.py", line 471, in http_response
response = self.parent.error(
File "/usr/lib/python2.4/urllib2.py", line 396, in error
result = self._call_chain(*args)
File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
result = func(*args)
File "/usr/lib/python2.4/urllib2.py", line 741, in http_error_401
host, req, headers)
File "/usr/lib/python2.4/urllib2.py", line 720, in http_error_auth_reqed
return self.retry_http_basic_auth(host, req, realm)
File "/usr/lib/python2.4/urllib2.py", line 730, in retry_http_basic_auth
return self.parent.open(req)
File "/usr/lib/python2.4/urllib2.py", line 364, in open
response = meth(req, response)
File "/usr/lib/python2.4/urllib2.py", line 471, in http_response
response = self.parent.error(
File "/usr/lib/python2.4/urllib2.py", line 396, in error
result = self._call_chain(*args)
File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
result = func(*args)
File "/usr/lib/python2.4/urllib2.py", line 916, in http_error_401
host, req, headers)
File "/usr/lib/python2.4/urllib2.py", line 807, in http_error_auth_reqed
raise ValueError("AbstractDigestAuthHandler doesn't know "
ValueError: AbstractDigestAuthHandler doesn't know about Basic
I've been searching the various bug databases involved (Debian's and Mercurial's) but haven't found this particular problem yet. Will likely be filing a bug in a bit.
Smattering of links that I used to figure all this out. In no particular order.