I’m currently researching areas where Markus performance could be improved. This involves reading through major parts of the wiki, particularly those relating to the schema and design, as well as previous issues and parts of the code base on GitHub. The blog is also a great source of information, as prior contributors have written about performance issues. Severin even did a complete benchmark of Markus: http://blog.markusproject.org/?p=3383
His findings, spread over a 4-part analysis, include the cost of Subversion interactions. In part 3, he mentions that the primary contributor to heavy IO during peak traffic is likely svn: repository creation and file storage on existing repos. But can this be improved upon?
One of the future goals for the Markus Project is a transition from Subversion to Git. I love Git, and Grit is a well-documented Ruby gem for interacting with Git repositories, much better documented than the svn Ruby bindings. But given a switch from svn to git, could we expect improved performance?
Much of the git community claims that git is much faster than svn: https://git.wiki.kernel.org/index.php/GitSvnComparison The git website even presents some benchmarks for comparison: http://git-scm.com/about/small-and-fast The numbers are promising, but unfortunately the commit operations include pushing, which I don’t believe is part of Markus’ operations. They also don’t involve Grit or the svn Ruby bindings. So to find out whether Markus’ performance could improve from a transition to git/Grit, I figured I’d have to write a small benchmark.
The code can be found in a gist: https://gist.github.com/4658271
The code discussed completes 4 operations, and outputs their running time:
- Initializing 10,000 svn repos
- Initializing 10,000 git repos
- Initializing and committing a small test file to 10,000 svn repos
- Initializing and committing a small test file to 10,000 git repos
The test file that’s committed is called ‘test.txt’, and contains the string “hello world”.
A total of 40,000 repositories is a small enough sample that my laptop can complete the script in a bit over 10 minutes. I may come back to this later with a larger sample, leaving it to run overnight.
And note that the script keeps a reference to all those repository objects, so it uses a fairly large amount of memory. We aren’t setting things to nil to give the garbage collector time to go through while the process is running.
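The timing approach described above can be sketched roughly as follows. This is not the code from the gist: `time_operation` is a hypothetical helper name, and the operation being timed is a stand-in (creating empty directories) rather than a Grit or svn binding call, so that the sketch runs without either library installed.

```ruby
require "benchmark"
require "tmpdir"

# Runs the given block `count` times and returns the total
# wall-clock time in milliseconds. The block receives the index
# of the current iteration. (Hypothetical helper, not from the gist.)
def time_operation(count)
  seconds = Benchmark.realtime do
    count.times { |index| yield index }
  end
  seconds * 1000.0
end

# Stand-in operation: create one empty directory per "repository".
# In the real benchmark this is where the repository would be
# initialized (and, for the second pair of runs, committed to).
Dir.mktmpdir("bench") do |root|
  elapsed_ms = time_operation(100) do |index|
    Dir.mkdir(File.join(root, "repo_#{index}"))
  end
  puts format("Creating 100 directories: %.3f ms", elapsed_ms)
end
```

Unlike the script described above, this sketch keeps no reference to anything it creates, so memory use stays flat no matter how many iterations are run.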
Before getting to the results, here are some details on the benchmark environment:
- OS X 10.6 Snow Leopard
- 2.4 GHz Core 2 Duo
- svn version 1.7.8 (r1419691)
- git version 126.96.36.199
And now for the numbers:
- Creating 10,000 svn repos: 119184.502 ms
- Creating 10,000 git repos: 120582.207 ms
- Creating and committing to 10,000 svn repos: 285041.101 ms
- Creating and committing to 10,000 git repos: 223160.437 ms
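From this simple experiment, we can see that initialization was only marginally slower with git than with svn (about 1%), which comes as a surprise given the smaller size of git repositories. The contrast is stronger once committing files is involved: the svn run took ~28% longer than the git run (put the other way, git finished in ~22% less time). Those figures still include initialization. Subtracting the initialization-only times gives a rough estimate of about 166 s for svn versus 103 s for git for the commits themselves, so the difference in commit cost alone should be even greater.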
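For anyone who wants to check the percentages, here is a small sketch of the arithmetic using the four timings listed above. The helper names are mine, and the "commit only" figures rest on the assumption that initialization costs the same in the combined runs as it did in the initialization-only runs, which the benchmark does not actually verify.

```ruby
# Percentage by which `slow` takes longer than `fast`.
def percent_longer(slow, fast)
  (slow - fast) / fast * 100.0
end

# Percentage of time saved by `fast` relative to `slow`.
def percent_saved(slow, fast)
  (slow - fast) / slow * 100.0
end

# Timings in milliseconds, copied from the results above.
init_svn   = 119_184.502
init_git   = 120_582.207
commit_svn = 285_041.101 # init + commit
commit_git = 223_160.437 # init + commit

# Assumption: subtracting the init-only time isolates the commit cost.
commit_only_svn = commit_svn - init_svn
commit_only_git = commit_git - init_git

puts format("git init slower than svn init: %.1f%%",
            percent_longer(init_git, init_svn))
puts format("svn init+commit longer than git: %.1f%%",
            percent_longer(commit_svn, commit_git))
puts format("git init+commit time saved vs svn: %.1f%%",
            percent_saved(commit_svn, commit_git))
puts format("svn commit-only longer than git (estimate): %.1f%%",
            percent_longer(commit_only_svn, commit_only_git))
```

The two helpers differ only in the denominator, which is why "svn is X% slower" and "git is Y% faster" give different numbers for the same pair of timings.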
I should also mention the size of the generated directories. 20,000 repositories were generated with each of git and svn, making for 40,000 in total. The combined size of the 20,000 svn repositories, of which half had our test file committed, is 594 MB. And git? 278 MB. That’s less than half the size!
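One way to measure a directory tree like this from Ruby is sketched below. I am not claiming this is how the figures above were obtained; `directory_size_bytes` is a hypothetical helper, and it sums apparent file sizes, which can differ from the on-disk usage a tool like `du` reports, especially with many tiny files.

```ruby
require "find"
require "tmpdir"

# Sums the apparent sizes (in bytes) of all regular files under `root`.
# Symlinks are skipped so that their targets are not counted.
def directory_size_bytes(root)
  total = 0
  Find.find(root) do |path|
    total += File.size(path) if File.file?(path) && !File.symlink?(path)
  end
  total
end

# Small demonstration on a throwaway directory holding the same
# test file the benchmark commits.
Dir.mktmpdir("size_demo") do |root|
  File.write(File.join(root, "test.txt"), "hello world")
  bytes = directory_size_bytes(root)
  puts format("%d bytes (%.6f MB)", bytes, bytes / (1024.0 * 1024.0))
end
```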
And so, given the benchmarks previously available as well as the results from this simple benchmark, I agree with the majority: a switch from svn to git should help improve performance. It may not be a high priority at this time, but with svn interactions being such a bottleneck under heavy load, it should help alleviate some of the strain. As such, though I’ll continue researching areas where Markus may see performance benefits over the coming week, I’ll keep this transition in mind as a possible goal for this work term. It would certainly be a large task to complete.