From CVS to Subversion

Thomas Guest reflects on migrating his organisation's version control system from CVS to Subversion.

Introduction

The time had come to upgrade our source control system. As CVS users, the obvious choice was Subversion. This article describes how the upgrade went and provides some practical advice for anyone considering making a similar move.

The reason for change

CVS is an excellent source control system: fast, powerful and flexible. We had no concerns regarding its reliability and some effort had been put into integrating it into our automated build, test and release system. What's more, everyone in the team knew how to use CVS and how to work around its wrinkles. We all had our favourite clients. Why ever would we want to change?

There were a number of reasons:

The team had grown and so had the codebase. The CVS server no longer served high volumes of files as quickly as we'd have liked.
As the codebase grew, it had become apparent that some files were in the wrong places or had the wrong names. CVS supports versioning of files but not of file-systems, meaning that we couldn't fix these issues in a controlled way. Subversion fixes this CVS limitation.
CVS does not support atomic commits (see Sidebar) - another feature built in to Subversion.
Subversion sets out to be a 'compelling replacement for CVS' and, after a quick skim through the documentation, it looked as though the transition would be painless.

Evaluation

We did pause - albeit briefly - to consider whether an alternative version control system might better meet our needs. We couldn't think of any. The decision was somewhat political since we'd recently been acquired and the parent company had its own preferred version control system. The move to Subversion could be passed off as an upgrade of our current system rather than a truly subversive act.

Our next step, then, was to evaluate Subversion. The aims of the evaluation were:

measure the performance of Subversion
build some expertise in Subversion administration
confirm Subversion's core capabilities
consider how best to actually use Subversion
if all looked good, put a transition plan in place.

Atomic Commits

Suppose a single logical change to the codebase - a bugfix perhaps - requires six files to be changed. The programmer makes the change and commits the new versions of the files to the CVS repository. Although a single commit command is executed, as far as CVS is concerned six changes have been made, and each individual file moves to its own new revision. If another programmer wishes to patch the bugfix to another code branch, all six files will need to be patched - but it's tricky to find this out from CVS. Information has been lost.

Subversion solves this problem with a simplified change model: version numbers apply to the repository as a whole, and each change is a transaction resulting in a new repository version which logically corresponds to a whole new file tree.
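Because one revision number identifies the whole change, carrying a bugfix over to another branch needs only that number. The following is a minimal sketch rather than anything we actually ran: the helper name port_change, the URL and the working copy path are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply the change committed as revision $1 of the
# tree at URL $2 to the working copy in directory $3.
# "svn merge -r N-1:N" replays exactly the difference made by revision N.
port_change() {
    rev=$1
    source_url=$2
    target_wc=$3
    svn merge -r "$((rev - 1)):$rev" "$source_url" "$target_wc"
}

# Example (placeholder URL and path):
#   svn log -v -r 1234 svn://server/project/trunk   # list every path in the change
#   port_change 1234 svn://server/project/trunk ./release-branch-wc
```

With CVS the equivalent job means first working out which files, at which per-file revisions, made up the fix.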

Clearly, the first step was to set up a Subversion server and import a snapshot of our CVS repository to practise on.

Setting up the trial server

Setting up the trial server was straightforward. On (Mandriva) Linux, after the usual package selection and update process we had an svn user ready to serve the repository, and an /etc/xinetd.d/svnserve configuration file, the contents of which are shown in Figure 1.

Most of the contents of this configuration file should be easy to figure out. The actual program which will serve the repository is /usr/bin/svnserve (run as a daemon by xinetd) and it should be run by user svn with arguments -i (inetd mode) and -r /var/lib/svn/repositories (root of the directory to serve).

Once we had created the repository (see next section) in the configured location, we enabled the Subversion server as follows:


  su                      # root runs xinetd
  chkconfig svnserve on   # enable svnserve service
  service xinetd restart  # restart xinetd to pick up the change

  # default: off
  # description: svnserve is the server part
  # of Subversion.
  service svnserve
  {
    disable     = yes
    port        = 3690
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = svn
    server      = /usr/bin/svnserve
    server_args = -i -r /var/lib/svn/repositories
  }

Figure 1

Note here that although the root user starts the xinetd service, the svn user actually owns and serves the repository.

Server options

There are two main options for serving a Subversion repository:

using the custom svnserve server
using Apache httpd with mod_dav_svn

A full discussion of the pros and cons of these options can be found in the Subversion book [Subversion].

This discussion is summarised in a table [Subversion2].

We opted to use the custom svnserve server because, according to the documentation, it would be easier to set up and somewhat faster.

Whilst I have no experience of using Apache as a Subversion server, I can certainly confirm that svnserve is simple to set up.

Importing a copy of the CVS repository

Creating the trial repository was a little more time consuming. To perform realistic tests we needed something like a full import of our CVS repository.

Subversion provides a Python program, cvs2svn, to perform this import. One thing you really shouldn't do is try to import a live CVS repository, which is of course a moving target. One thing we equally didn't want to do was take down the CVS server for any period of time. Fortunately, we kept a mirror of the live repository; by taking a copy of this mirror (when it wasn't being mirrored!), we gave ourselves something to import.

The documentation for cvs2svn is rather thin. In fact, at the time of writing this article, there's not a huge amount over and above what the command line tells you:

    cvs2svn --help

Fortunately, the program works. Most of the options aren't even required if you're happy to go with the default repository layout, default database backend, default keyword expansion mode, default end-of-line style and so on. And, if cvs2svn does hit problems - which will almost certainly be caused by 'Garbage In' - it exits smartly and tells you what to do next.

In our case, we had to clear up a little tag ambiguity, and then we were off. The import took about four hours - this is for a CVS repository consisting of about 64,000 files, a few hundred tags and branches, and occupying around 12GB on disk.

Choice of database layer

As of version 1.3, Subversion provides a couple of database backends:

a Berkeley DB database
FSFS, where data is stored in ordinary flat files, using a custom format.

The pros and cons of the two choices are discussed in the Subversion book [Subversion3] and summarised in a table [Subversion4].

Although the FSFS backend is less mature, it looked more suitable in every other respect. The Subversion repository create tool, svnadmin create, treats FSFS as the default database backend - and so does cvs2svn. We decided to go with this default and have had no complaints.

More evaluation

To our surprise and disappointment, the speed of clean checkouts (by 'clean' I mean checking out the entire codebase into a new directory, rather than simply updating an existing working copy) was underwhelming. CVS sets a hard act to follow here since one of its strengths is its speed, but I simply couldn't imagine Subversion claiming to be a compelling replacement for CVS unless it was equally fast. In fact, head-to-head, on the same platform, our tests showed CVS to be measurably quicker for clean checkouts.

What the trials did indicate was that disk access rather than network bandwidth was the main source of pain. Every time it checks out a file, Subversion replicates the base version of that file and its properties into a hidden .svn directory, so for every 100 files you check out, at least 500 files will be created on disk. ('Properties' is the Subversion term for metadata associated with a file, such as whether it's executable.)

This replication is quite deliberate and is based on the principle that disk-space is cheaper than network bandwidth. Subversion makes full use of the cached file copies in your working area - so, for example, common operations such as viewing your modifications to a file, or reverting these modifications, do not require any interaction with the server.

What we found, then, on the performance side, was that the routine management of a working copy was much quicker. Clean checkouts took time, yes, but use of the svn update command keeps these to a minimum. In fact, the only user who frequently performed clean checkouts was our overnight automatic build.

Everything else went very well. Clearly, the authors of Subversion had done a great job in fixing the problems with CVS, and they'd done so - at least from a user's perspective - by simplifying it.

The transition plan

There never seems to be a good time to change tools. There will always be releases to make, builds to test, critical patches to issue, and it's understandably hard to justify even the smallest amount of downtime in such real, customer-facing activity. Indeed, when there's lots of this real work to be done it's equally hard to dedicate the time to take proper care when executing such a tool change.

On the other hand, by taking such an argument to the extreme, software developers end up stuck using Visual Studio 6.0 and grumbling (unfairly) about Microsoft's poor support for C++.

We were, then, keen to proceed. Our transition plan was simple. The timescale was short but manageable - and we knew that if we missed the slot, we wouldn't get another chance for a while.

As part of the evaluation, we'd created some backup scripts to mirror a Subversion repository to our backup machine. Next, we set up a migration script to:

disable scheduled jobs which might get in the way
take down the CVS server
copy the CVS repository
import this copy using cvs2svn
log any problems occurring in any of these steps.

So, to initiate the transition, all we had to do was schedule the migration script to run overnight. In the morning, if the log files were clean, we could kick off the Subversion backups, point the Subversion server at the newly imported repository and start it up, restart the CVS server in read-only mode, and we'd be done.
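The migration steps listed above could be sketched along the following lines. This is an illustration, not our actual script: every path is a placeholder, and disable_jobs and stop_cvs are hypothetical stand-ins for whatever site-specific commands suspend the scheduled jobs and stop the CVS server. The cvs2svn call uses its -s option to name the target Subversion repository.

```shell
#!/bin/sh
# Sketch of an overnight migration driver. Every path and the two
# site-specific helpers below are hypothetical placeholders.
LOG=${LOG:-/tmp/cvs-to-svn.log}
CVS_LIVE=${CVS_LIVE:-/var/lib/cvs}
CVS_COPY=${CVS_COPY:-/var/tmp/cvs-copy}
SVN_REPOS=${SVN_REPOS:-/var/lib/svn/repositories/main}

disable_jobs() { echo "site-specific: suspend scheduled jobs here"; }
stop_cvs()     { echo "site-specific: stop the CVS server here"; }

# Run one named step, sending its output to the log and recording
# whether it succeeded. Returns the step's own exit status.
run_step() {
    step=$1
    shift
    echo "BEGIN $step" >> "$LOG"
    if "$@" >> "$LOG" 2>&1; then
        echo "OK    $step" >> "$LOG"
    else
        rc=$?
        echo "FAIL  $step (exit $rc)" >> "$LOG"
        return "$rc"
    fi
}

# The steps in order; && stops the sequence at the first failure.
migrate() {
    run_step "disable scheduled jobs" disable_jobs &&
    run_step "take down CVS server"   stop_cvs &&
    run_step "copy CVS repository"    cp -Rp "$CVS_LIVE" "$CVS_COPY" &&
    run_step "import with cvs2svn"    cvs2svn -s "$SVN_REPOS" "$CVS_COPY"
}

# migrate    # uncomment once the placeholders have been filled in
```

Logging a BEGIN line and then an OK or FAIL line for each step means that a quick search of the log for FAIL is enough to decide, the next morning, whether to go ahead.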

What we failed to do

A number of items on the evaluation and transition plan never happened. We didn't create any local training materials - it didn't seem necessary, given the high quality of Subversion's built-in documentation, and the fact that we all knew CVS (see Sidebar: Subversion for CVS users). We ordered printed copies of Version Control with Subversion and Pragmatic Version Control Using Subversion, set up an FAQ page on the Wiki that did little more than collect together a few links to offsite URLs, and left it at that.

Subversion for CVS Users

If you're familiar with CVS then Subversion will also seem familiar. This is hardly surprising since Subversion's stated aim is to be a compelling replacement for CVS. So, the terminology is almost identical: you 'check out' files from a 'repository', you edit them, you 'diff' files to see what you - and others - have changed, and you 'check in' your changes. You 'update' your working copy to merge in changes made by other team members. You can 'log' what changes have been made. You can 'branch' a project by copying it from the 'trunk' into a new place in the repository; similarly you can 'tag' a fixed version of a project by copying it into a new place in the repository. You can 'merge' changes made on the branch back to the trunk, and vice versa.

If you use the command line client, svn, the command line arguments are often identical to the ones used with the cvs client (svn commit, svn checkout, svn status, svn annotate, svn diff, svn log, etc.).

The areas where CVS and Subversion differ are generally where Subversion fixes a CVS deficiency or where Subversion actually manages to simplify things. For example, as already mentioned, Subversion fixes a well-known CVS deficiency by allowing you to move files and directories; and Subversion's transactional model means that a version number (a revision number, in Subversion terminology) is an incrementing integer applied to the repository as a whole, which is easier to work with than the dot-separated version numbers which apply to each CVS controlled file.

More information for CVS users migrating to Subversion can be found at: http://svnbook.red-bean.com/en/1.2/svn.forcvs.html

Despite encouragement, no-one had bothered to use the trial repository as a sandbox for experimenting with Subversion (apart from the individual actually running the trial). So, the evaluation of the product's usability and basic functionality was down to just one person. Again, this turned out not to be a problem - and we weren't really being lax when you consider how many open source projects have switched, or are switching, to Subversion. We just _knew_ Subversion worked.

We quite deliberately didn't plan any reorganisation or pruning of the CVS repository before importing it: Subversion would allow us to make such changes in a better controlled way, once we got to the other side. For similar reasons, we didn't change keyword expansion properties on import. Again, Subversion allows you to manage such properties better than CVS does, and now was not the time to start arguing whether or not we really thought keyword expansion was a good idea (keyword expansion is discussed and assessed below - Alan).

We didn't fix any of our build scripts in advance. As part of the evaluation we'd grepped the source for all such scripts and it turned out you could count the number of scripted calls to cvs on the fingers of one hand. We were confident we could fix these pretty much as soon as our Subversion server went live.

We didn't even bother evaluating any advanced Subversion clients. I used the command line almost exclusively for experimentation: others were happy to defer setting up TortoiseSVN, Subclipse, psvn, etc., until they actually had to.

The one crucial item we neglected from our plan was to perform acceptance tests on the freshly imported repository. Fortunately we discovered the problem with our carelessness almost immediately and were able to recover swiftly.

The problem

The problem we had was with binary files which had (wrongly) been checked into CVS as text files. On import, by default, cvs2svn does a couple of things to text files which can seriously damage binary files:

keyword expansion is enabled, meaning that byte sequences which match patterns such as $Id: $ get changed when you check the file out.

(Strictly speaking, cvs2svn sets svn:keywords on CVS files to author id date if the mode of the RCS file in question is either kv, kvl or not kb.)

the end-of-line style property is set to native, meaning again that the binary file you check out may not be the one you checked in, since Subversion makes sure end-of-line sequences are the ones preferred by your client platform.

We'd messed up but fortunately we'd messed up in an immediately obvious way: a number of binaries were broken, to the point that they wouldn't even execute.

This is one of those mistakes you only make once (until you make it the next time and kick yourself even harder, that is). I guess we were lulled into a false sense of security: everything seemed to be working so smoothly ... Subversion is better than CVS at handling binary files ... everything had been working fine with CVS, so our CVS repository must be fine ... cvs2svn would spot any problems.

Of course, our CVS repository wasn't fine. We'd got away with binary files marked as text for the simple reason that most of these files had been used on Linux only.
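One way to catch such files before importing is to scan a checkout for files containing NUL bytes, which ordinary text files do not, and compare the result with what CVS believes to be text. The sketch below is a crude heuristic, not something cvs2svn does for you, and the function names are made up for illustration.

```shell
#!/bin/sh
# Succeeds if the file contains at least one NUL byte, a crude but
# cheap sign that it is binary rather than text.
has_nul_byte() {
    # Deleting NULs changes the content only if some were present.
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

# Print every regular file under directory $1 that looks binary,
# skipping the CVS bookkeeping directories.
list_binary_candidates() {
    find "$1" -type d -name CVS -prune -o -type f -print |
    while IFS= read -r f; do
        if has_nul_byte "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example (placeholder path):
#   list_binary_candidates ./fromcvs > binary_candidates
```

Any file on the resulting list that CVS does not already treat as binary deserves a closer look.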

Acceptance tests

What makes this mistake so chastening is the fact that a basic acceptance test of the new repository would have been both simple and scriptable:


  #!/bin/sh
  # Checkout from CVS, on the trunk (CVSARCHIVE stands for the module)
  cvs co -d fromcvs CVSARCHIVE
  # Checkout from SVN, on the trunk
  svn co SVNREPOS/trunk fromsvn
  # Spot the difference, ignoring each tool's own admin directories
  diff -q -r -x CVS -x .svn fromcvs fromsvn > all_diffs

If the all_diffs file is empty, the CVS and Subversion checkouts are byte-for-byte identical.

Unfortunately the all_diffs file wasn't empty. Remember those keyword expansions? Subversion is clever enough to replace CVS version numbers with its own revision numbers and as a result the files differ when checked out. Keyword expansion really is a bad idea!

Similarly, a number of text files were different because Subversion had tidied up inconsistent line endings.

So, there were plenty of false hits as well as a list of files we needed to run cvs admin -kb on.
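The false hits can be weeded out by comparing files after removing the two known sources of noise. The sketch below is an after-the-fact illustration with made-up function names: it strips carriage returns and collapses expanded keywords before comparing, so it should only be applied to files known to be text.

```shell
#!/bin/sh
# Print file $1 with carriage returns removed and expanded keywords
# such as "$Id: foo.c,v 1.4 ... $" collapsed back to "$Id$".
# Only meaningful for files known to be text.
normalise() {
    tr -d '\r' < "$1" |
    sed -e 's/\$Id:[^$]*\$/$Id$/g' \
        -e 's/\$Revision:[^$]*\$/$Revision$/g' \
        -e 's/\$Date:[^$]*\$/$Date$/g' \
        -e 's/\$Author:[^$]*\$/$Author$/g'
}

# Succeeds if files $1 and $2 differ only in line endings and
# keyword expansions.
same_modulo_noise() {
    t1=$(mktemp) || return 2
    t2=$(mktemp) || { rm -f "$t1"; return 2; }
    normalise "$1" > "$t1"
    normalise "$2" > "$t2"
    if cmp -s "$t1" "$t2"; then rc=0; else rc=1; fi
    rm -f "$t1" "$t2"
    return "$rc"
}

# Example (placeholder paths):
#   same_modulo_noise fromcvs/src/main.c fromsvn/src/main.c || echo "real difference"
```

Whatever still differs after this filtering is a genuine discrepancy.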

Incidentally, we could have chosen to clean up the files during import by passing some more parameters to cvs2svn: a suitable combination of --mime-types=FILE, --eol-from-mime-type and --no-default-eol options would have done the job. We decided, though, that the proper solution was to fix the root cause of the problem.

Recovery

So, we had to delay by a day to reinstate CVS, run the text-to-binary corrections, re-run the migration, and perform acceptance tests. This time we were more cautious and we also tested builds made from the clean Subversion checkout.
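The text-to-binary corrections amount to running cvs admin -kb over the list of affected files. A sketch, assuming the list has been saved one path per line in a file; the function name mark_binary and the list file are illustrative rather than taken from our actual scripts.

```shell
#!/bin/sh
# Read file names, one per line, from the list in $1 and mark each as
# binary in CVS. Run from the top of a CVS working copy; the names are
# taken to be relative to it. Blank lines are skipped.
mark_binary() {
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        # </dev/null keeps cvs from consuming the rest of the list
        cvs admin -kb "$f" < /dev/null || return 1
    done < "$1"
}

# Example (placeholder file name):
#   mark_binary files_to_fix.txt
```

Stopping at the first failure keeps the list and the repository in step, so the run can be repeated once the problem is sorted out.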

Scheduled backups

I won't go into detail here about the differences between CVS and Subversion. There's plenty of solid documentation already available.

One thing worth mentioning is the strategy we adopted for Subversion backups. Previously, our CVS repository had been mirrored to a backup machine using an rsync job scheduled to run every couple of hours. Tape backups of this mirror were kept offsite.

I had some reservations about this strategy, particularly since (thanks to our hyperactive and insomniac automatic build user) the CVS archive was rarely quiet. Simply treating the CVS archive as a bunch of files - which is what rsync does - seemed risky. Would the mirror be in good shape if rsync ran in parallel with a check-in?

Subversion provides the ability to make a hot backup of a live repository using the svnadmin hotcopy command. The copy is itself a complete, usable repository. (A repository dump, by contrast, is produced by svnadmin dump and can be loaded into a Subversion repository using svnadmin load.) So, something as simple as:


  svnadmin hotcopy /path/to/live/repository \
    /path/to/mirror/repository

creates a full mirror of the live repository - if you're prepared to wait a while, that is.

Once this mirror has been created, it can be maintained by merging in incremental changes using svnadmin dump --incremental to dump the changes and svnadmin load to load them into the repository mirror.
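The incremental step might look something like the sketch below. It is an illustration under stated assumptions rather than our backup script: the function names are invented, both repositories are assumed to be reachable as local paths, and svnlook youngest is used to find the latest revision in each.

```shell
#!/bin/sh
# Print the revision range present in the live repository but not yet
# in the mirror, as LOWER:UPPER, or nothing if the mirror is current.
missing_range() {
    mirror_rev=$1
    live_rev=$2
    if [ "$mirror_rev" -lt "$live_rev" ]; then
        echo "$((mirror_rev + 1)):$live_rev"
    fi
}

# Bring the mirror at path $2 up to date with the live repository at $1.
sync_mirror() {
    live=$1
    mirror=$2
    range=$(missing_range "$(svnlook youngest "$mirror")" \
                          "$(svnlook youngest "$live")")
    [ -n "$range" ] || return 0
    svnadmin dump --incremental -r "$range" "$live" | svnadmin load "$mirror"
}

# Example (placeholder paths):
#   sync_mirror /path/to/live/repository /path/to/mirror/repository
```

Because each run dumps only the revisions the mirror lacks, it can be scheduled as often as the old rsync job without repeating the cost of a full hotcopy.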

How much should you import?

We never really explored the idea of restricting what we imported into Subversion. Cvs2svn offers lots of choices:

you can topskim your repository, meaning you get no history, no branches - a fresh start.
you can import absolutely everything, meaning you get every single branch ever made, every bungled thirdparty import - everything!
you can import anything in between these two extremes by selecting which tags and branches to import.

It's hard to argue against the 'import everything' option: source control is all about tracking and managing changes, so why should any change ever be thrown away? And, as already mentioned, Subversion does provide a 'delete' option (of course, anything you delete can be recovered), so you can tidy up at any point.

Since the initial import we've exercised svn delete rather a lot, and every time we need to upgrade a vendor branch we end up moving things. I still wonder if something closer to a topskim wouldn't have been better. We'll never know. And I secretly wish we'd accidentally-on-purpose turned off keyword expansion!

Changing tools revisited

As already mentioned, tool changes can be hard to justify. However, despite the hiccup in the migration, CVS to Subversion required little effort and led to almost no downtime. Perhaps the stated reasons for change didn't seem that compelling - if we'd lived without atomic commits and version controlled file systems for so long, surely we didn't really need them? The paradox here is that you can't really appreciate how important these features are until you actually use them - and so, from the other side of the change, we wonder how we ever did without them!

What I like most about Subversion though is that, from both a user's and an administrator's perspective, it's simpler than CVS. All too often a software upgrade means buying in to more features and more complexity. Think of all those new people joining the team, some of whom may never have used source control. Consider explaining the Subversion model for repository revisions, branches, tags. Now consider explaining the same topics using the CVS model. Clearly less time will be needed getting people up to speed.

Conclusions

Everyone likes the new source control system, which is important - freedom of choice may be acceptable for editors, web browsers, and even operating systems, but a team really must agree to share a source control system. It's important to like the tools you use every day.

CVS was good but Subversion is better. As already mentioned, head-to-head, on the same hardware, CVS managed to beat Subversion on clean checkouts - but who said we had to use the same hardware? We invested in a powerful new computer to serve our powerful new source control system, so even clean checkouts are quicker. Routine operations on a working copy are much quicker.

Upgrading build scripts did indeed turn out to be simple.

Four clients are in active use (five if you count the command line client, svn, itself). I use the psvn Emacs integration, which is very similar to pcl-cvs. Subclipse, TortoiseSVN and kdesvn are also popular with Eclipse, Windows and KDE users respectively.

So, CVS to Subversion makes good sense, but do beware of pitfalls in the import procedure.

Other sources

CVS: http://www.nongnu.org/cvs/
Subversion: http://subversion.tigris.org/
cvs2svn, a Python script that converts a CVS repository to a Subversion repository: http://cvs2svn.tigris.org/
Subclipse, A Subversion Eclipse plugin: http://subclipse.tigris.org/
TortoiseSVN, a Subversion client implemented as a Windows shell extension: http://tortoisesvn.tigris.org/
psvn, Subversion interface for emacs: http://svn.collab.net/repos/svn/trunk/contrib/client-side/psvn/psvn.el

Credits

My thanks to the editorial team at Overload for their help with this article.

References

[Subversion] http://svnbook.red-bean.com/en/1.2/svn.serverconfig.html

[Subversion2] http://svnbook.red-bean.com/en/1.2/svn.serverconfig.html#svn.serverconfig.overview.tbl-1

[Subversion3] http://svnbook.red-bean.com/en/1.2/svn.reposadmin.html#svn.reposadmin.basics.backends

[Subversion4] http://svnbook.red-bean.com/en/1.2/svn.reposadmin.html#svn.reposadmin.basics.backends.tbl-1