I will be working on analyzing MarkUs’ performance this term. Here is what I have so far:
To look at MarkUs’ performance under various conditions, I have a lab setup comprising a server machine and eight clients. MarkUs will be hosted on the server machine, and its performance will be evaluated there. The eight clients will be used to “simulate” load on the server. Here is a picture illustrating the setup:
As illustrated, the clients are connected to the server via a network switch. All machines are physical machines (i.e. not virtual machines). The server is an Intel Core 2 machine with 2 cores at a 1.86 GHz clock frequency and 2 gigabytes of RAM. It runs Ubuntu 10.04 LTS, and the PostgreSQL database runs on the same machine. All filesystems are local and ext4-formatted. The specs of the client machines should be irrelevant for this analysis.
To get some data out of this setup, I intend to proceed with this analysis incrementally. One focus of this study is MarkUs’ performance when there are 500+ students in a course. In fact, I hope this analysis will help us come up with a recommendation for the maximum number of students per MarkUs instance.
As a first step, I will try to reproduce a potential performance problem of MarkUs when there are 500+ students in a course and some subset of them are submitting code simultaneously. To do so, I will use a mongrel cluster on the server behind an Apache httpd reverse proxy (which is used at the University of Toronto and University of Waterloo). To simulate submissions from 500+ students, I will use the benchmarking scripts in lib/tools of the MarkUs code base. Since MarkUs on Rails 3 is not quite ready yet, I will be using MarkUs 0.10.1 on Rails 2.
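To make the idea of simultaneous submissions concrete, here is a minimal sketch of a concurrent load generator. It is not one of the scripts in lib/tools; `simulate_load` and `fake_submission` are hypothetical names, and the placeholder just sleeps instead of sending a real request to the server.

```ruby
require 'benchmark'
require 'thread'

# Hypothetical helper: starts `clients` threads, each of which calls the
# given block `requests_per_client` times. Returns the wall-clock duration
# (in seconds) of every call.
def simulate_load(clients, requests_per_client)
  durations = Queue.new
  threads = (1..clients).map do |client_id|
    Thread.new do
      requests_per_client.times do |i|
        durations << Benchmark.realtime { yield(client_id, i) }
      end
    end
  end
  threads.each(&:join)
  Array.new(durations.size) { durations.pop }
end

# Placeholder for a real submission (an assumption for illustration);
# a real run would send a request to the MarkUs server here instead.
fake_submission = lambda { |_client_id, _i| sleep(0.01) }

timings = simulate_load(8, 5) { |client_id, i| fake_submission.call(client_id, i) }
puts "requests: #{timings.size}"
puts format('mean: %.3fs, max: %.3fs', timings.sum / timings.size, timings.max)
```

Recording a duration per request rather than a single total makes it possible to look at the distribution of response times later, not just the average.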
Once I am able to reproduce this performance problem, I will start taking a closer look at what might cause it. Questions I’d like to answer are:
- Is there a reproducible performance problem?
- When does performance start to degrade? At 500 students? 5000 students? Something else?
- How does a clustered mongrel setup compare to a setup using Phusion Passenger with respect to performance? Can we get some hard numbers on this?
- What is the cause of the performance breakdown (if any)? Is it one cause? A combination of many things?
- Is there a difference between setups where students work in groups and setups where students work alone?
- How can we alleviate potential performance problems?
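One way to put a number on “when does performance start to degrade” is to compare median response times across load levels against the lowest level. The sketch below is only an illustration: the helper names, the threshold of twice the baseline, and the sample numbers are all made up, not measurements.

```ruby
# Median of an array of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Hypothetical helper: returns the smallest load level whose median response
# time exceeds `factor` times the median at the lowest load level, or nil if
# no level does. `samples_by_load` maps a student count to response times.
def degradation_point(samples_by_load, factor = 2.0)
  levels = samples_by_load.keys.sort
  baseline = median(samples_by_load[levels.first])
  levels.find { |level| median(samples_by_load[level]) > factor * baseline }
end

# Made-up numbers for illustration only (response times in ms).
samples = {
  50   => [100, 110, 105],
  500  => [120, 130, 125],
  5000 => [400, 380, 450]
}
puts degradation_point(samples).inspect
```

The factor of 2.0 is an arbitrary choice; what counts as “degraded” would have to be decided once real numbers are available.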
Later in my analysis, I’d also like to see whether the results differ when a DB snapshot of a former production system is used.
Any other suggestions as to how to go about this? Am I missing something? What would you like to get out of this?
To figure out what causes potential performance problems in MarkUs, I plan to use the following tools:
- Analyze response times in the production logs with, say, Rails Analyzer to get some concrete numbers on how response time behaves as the number of students and repositories increases.
- Use a Ruby profiler, for example ruby-prof, to profile problematic code.
- Use a system profiler such as OProfile to get a bigger picture of where time is mostly spent system-wide. In the database? File IO? MarkUs code (i.e. Ruby code)? Something else?
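As a rough illustration of the log-analysis step, the sketch below pulls timings out of “Completed in” lines and summarizes them. The line format shown in the comment is an assumption based on Rails 2.3-style logs and should be checked against the actual production.log; the helper names and sample lines are made up.

```ruby
# Assumed line format (verify against your own production.log):
#   Completed in 120ms (View: 20, DB: 60) | 200 OK [http://...]
COMPLETED = /Completed in (\d+)ms \((?:View: (\d+), )?DB: (\d+)\) \| (\d{3})/

# Extracts total, view and DB times (ms) plus the HTTP status from each
# matching line; non-matching lines are ignored.
def parse_completed(lines)
  lines.map { |line| COMPLETED.match(line) }.compact.map do |m|
    { total_ms: m[1].to_i, view_ms: m[2].to_i, db_ms: m[3].to_i, status: m[4].to_i }
  end
end

# Summarizes parsed entries: request count, mean and 95th-percentile total
# time, and the share of total time spent in the database.
def summarize(entries)
  return nil if entries.empty?
  totals = entries.map { |e| e[:total_ms] }.sort
  {
    count: totals.size,
    mean_ms: totals.sum.to_f / totals.size,
    p95_ms: totals[((totals.size - 1) * 0.95).round],
    db_share: entries.sum { |e| e[:db_ms] }.to_f / totals.sum
  }
end

# Made-up sample lines for illustration only.
sample_lines = [
  'Processing some request',
  'Completed in 120ms (View: 20, DB: 60) | 200 OK [http://example.test/a]',
  'Completed in 80ms (DB: 20) | 302 Found [http://example.test/b]'
]
puts summarize(parse_completed(sample_lines)).inspect
```

On a real log, something like `summarize(parse_completed(File.foreach('log/production.log')))` should work, with the path adjusted to the deployment. The DB share gives a first hint at whether time goes to the database or elsewhere, before reaching for a profiler.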
What else should I be using? Thoughts? I anticipate that it’s going to be challenging to go from the big picture (“there is a performance problem”) down to the root cause (“what is causing it”), since there are a lot of components involved (Subversion, PostgreSQL, Ruby, Rails, MarkUs).
What do you think? Let’s hear it 🙂