This term I’ve been working on analyzing MarkUs’ performance under load (see ). The goal was to investigate if MarkUs’ performance decreases significantly under certain circumstances. If it does, a sub-goal of my project was to investigate what caused a potential performance problem. In particular, I was simulating a scenario where students would work alone on an assignment. Moreover, I was looking at the extreme case where no Subversion repositories existed for any student.
At this point I can say, yes, there can be a performance problem. When too many students try to submit files via MarkUs at the same time performance deteriorates. A user would notice this performance problem by very long response times. In extreme cases response times of 20 seconds for a single request (not hit) have been observed. The root cause as to why response times increase drastically seems to be IO related. Where are these IO requests coming from? I’m still not 100% certain, but it looks like this is related to creation of Subversion repositories. MarkUs stores student’s submissions in Subversion repositories. If no repository exists when a student logs in for the first time and students work alone on an assignment MarkUs creates a Subversion repository for that student when the student interface URL is first visited for this assignment. Also note that every submission results in calls to Subversion via the Subversion Ruby bindings from within MarkUs.
In general, the higher the ratio of simultaneous requests to the number of mongrels (or Passenger workers), the slower overall response times. Variations of response times when the number of students in a course are changed are fairly minor.
As mentioned earlier, these are results of simulating a scenario where students work alone on an assignment and have never logged in previously. Student requests have been simulated by running the post_submissions.sh script on client machines. Scripts which I’ve been using are available in this review and should be available in the official MarkUs repository at some point later. Only PostgreSQL has been used as the DB backend. I don’t believe changing this to MySQL will yield much different results. Apache httpd has been used as the front-end webserver reverse-proxying to individual Mongrel servers. For some experiments Phusion Passenger has been used instead of Mongrel. The difference in performance between the two deployment platforms was fairly insignificant for the experiments performed (considering that Passenger uses 6 Ruby workers by default and comparing it to a similar setup with 6 Mongrel servers). For this analysis MarkUs version 0.10.0 has been used (on Rails 2). I don’t anticipate huge differences between a Rails 2 and Rails 3 based MarkUs. Details about the lab setup I’ve been using were described in the first blog post of this series.
In order to get a more detailed view as to what was going on on the MarkUs server machine while each individual experiment was run the following tools have been used: top, iotop, iostat, oprofile, request-log-analyzer and some hand crafted scripts. OProfile data was inconclusive (or I was perhaps using it wrong). Example profiling output is available here. Top reported load averages of 3-18 and up to 50% (avg ~30%) IO waiting with 20-60% user CPU utilization. Less IO wait percentages have been observed towards the end of each experiment and when 12 mongrels have been used. iotop reported Linux’s Ext4 journaling daemon as the top IO consumer closely followed by Ruby’s logger for the production.log file of each Mongrel.
Here is the list of performed experiments:
|Exp. #||# Stud.||# Mon.||# Cli.||# R.p.Cli||Sim. cc. Subm.|
|P1||800||equiv of 6||8||4||32|
|P2||832||equiv of 6||8||8||64|
|P3||800||equiv of 6||8||4||32|
|P4||800||equiv of 6||8||4||32|
Exp. # is the experiment identifier, # Stud. is the number of students, #Mon. is the number of Mongrel servers, # Cli. is the number of client machines used (where the post_submission.sh script was executed #R.p.Cli times), # R.p.Cli. is the number of post_submission.sh calls per client machine (i.e. one client machine simulated up to 18 students) and # Sim. cc. Subm. is the number of simulated concurrent submissions (= #R.p.Cli x # Cli.).
All “P” experiments have been performed using Phusion Passenger, all “M” experiments were Mongrel based. M13-M15 varies only the number of students in a class. M12-M20 were basically a repeat of experiments M1-M9. M10-M20 had configuration in place so as to make Mongrels log to their individual copy of production.log. M1-M9 shared one production.log and logs were useless due to interleaved log output. P3 and P4 are interesting as for P3 almost no SVN interaction was achieved by running the same experiment twice without deleting repositories and dropping the database of the previous run. Due to this submissions were not accepted. One would have to explicitly replace files as opposed to resubmitting them in order to get them accepted by MarkUs. This is expected behaviour. Hence, this repeated submission resulted in so changed submissions being recorded (i.e. almost no SVN interaction). P4 is an experiment with student’s repositories created prior the actual run of the experiment. However repositories were empty so submissions as issued by request #7 were recorded.
The requests/URL mapping is shown in the following table:
These are the requests (in order) each call to post_submissions.sh performs. I.e. get the log-in page, log in (POST), follow the resulting 2 redirects to the students dashboard, go to the student interface of the first assignment, open the file manager (Submissions link), submit files (POST).
Raw logs and tables are available in a Git repository which I’ve created for this purpose. Logs have been analyzed by using the elapsed time (as reported by /usr/bin/time) per request on client machines. Sanity checks have been performed by also analyzing server-side logs (production.log) via request-log-analyzer. Server-side generated and client-side generated response time numbers matched with a small margin of error.
The above graph illustrates that with increasing number of students in a course the response time grows fairly slow (400 to 1400 students results in an increase of response time from 2.2 to 2.8 seconds per request)
The above two graphs try to show if there is a correlation between the ratio of simultaneous requests over the number of mongrels and the average response time. In general the more overloaded a single Mongrel the slower the overall response times. More mongrels may bring the response times down a little bit but not to an amount as one would have hoped for (see M19 and M9). There may be some performance gain if mongrels run on different machines than the Apache reverse proxy and the PostgreSQL server. In most experiments these ran on one machine. M20 had 6 mongrels running on the main server and 6 mongrels on a different machine. Perhaps more gain could be achieved if the database server is on one machine, the reverse proxy on another and the mongrels distributed among a set of other machines sharing the file system containing the Subversion repositories.
This graph shows the average response times per request. Please refer to the table above in order to see which request number maps to which URL. The numbers in the legend are the total numbers of simulated concurrent requests. We now take a closer look at experiment labelled with 32 in the above graph (experiment P4). Note that 32 should be compared to 14 as it’s not the amount of simultaneous requests alone which are of significance. The number of mongrels running on the server are a factor as well. Thus, the ratio between concurrent requests and the number of mongrels seems to be a good heuristic for comparing experiments. Note that said ratio is closer between 14 and 32 as compared to 32 and 35 (see graph below). Since experiment P4 had Subversion repositories already created prior the experiment it is not surprising to see the absence of the bump of request 5.
This is the exact same graph as the one preceding this one with the only difference in the choice of the labelling. Instead of the number of concurrent requests it shows the ratio of the number of concurrent requests over the number of mongrels.
Conclusion and Future Work
So what are the lessons learned?
- Under heavy load and in a poor setup response times of 20 seconds and more can happen
- Adding mongrels is fairly cheap and may bring some performance gain.
- Distributing mongrels among a set of application servers may improve performance even more.
- Subversion interactions are expensive. I recommend to get students to log-in and have a look at the assignment (if it’s a single student assignment and the SVN repository has not yet been created) at some off-peak time in order to reduce IO when the deadline of an assignment is approaching.
- Logging to production.log may be a source of IO on the system.
- 12 mongrels seem to perform better than 6. The performance gain from 3 to 6 mongrels is less significant. I’m not sure why…
- I recommend users to estimate the expected number of concurrent submission based on the number of students in the course and historical data. Based on the expected number of concurrent submissions at peak time try to keep the ratio of concurrent submissions over the number of mongrels at < 5
- It is a good idea to add configuration to the environment so that mongrel instances log to separate log files. This way production log files can be used for further analysis with regards to performance bottlenecks.
Where to go from here? It would be interesting to see if a different version control system as a back-end would change any of the above results. Moreover the assumption is that IO is coming from Subversion, but what if its just simple logging or logging plus Subversion plus IO from PostgreSQL? Perhaps there is some better way to inspect low level IO. This may help with reasoning as to where said IO is coming from. iotop and top should be a good start but may be too coarse grained. I’d also be interested to see if a more distributed production setup would be capable of processing more concurrent requests in less time.
It’s been fun working on this project. Please do let me know (in the comments) if there is something I’ve missed or if you have other thoughts on this topic. Thanks!