Archive for the ‘Performance Analysis’ Category
I’m currently researching areas where MarkUs performance may see improvement. This involves reading through major parts of the wiki, particularly those relating to the schema and design, as well as previous issues and parts of the code base on GitHub. The blog is also a great source of information, as prior contributors have mentioned performance issues. Severin even did a complete benchmark of MarkUs: http://blog.markusproject.org/?p=3383
His findings, throughout the 4-part analysis, include the cost of Subversion interactions. In part 3, he mentions that the primary contributors to heavy IO during peak traffic are likely svn-related: Subversion repository creation and file storage in existing repositories. But can this be improved upon?
One of the future goals for the MarkUs project includes a transition from Subversion to Git. I love Git, and Grit is a well-documented Ruby gem for interacting with Git repositories, far better documented than the svn Ruby bindings. But given a switch from svn to git, could we expect improved performance?
Much of the git community claims that git is much faster than svn (https://git.wiki.kernel.org/index.php/GitSvnComparison), and the git website even presents some benchmarks for comparison: http://git-scm.com/about/small-and-fast. The numbers are promising, but unfortunately the commit operations there include pushing, which I don’t believe is part of MarkUs’ operations. They also don’t involve Grit or the svn Ruby bindings. So to research whether MarkUs’ performance could improve from a transition to git/Grit, I figured I’d have to write a small benchmark.
The code can be found in a gist: https://gist.github.com/4658271
The code performs 4 operations and outputs their running times:
- Initializing 10,000 svn repos
- Initializing 10,000 git repos
- Initializing and committing a small test file to 10,000 svn repos
- Initializing and committing a small test file to 10,000 git repos
The test file that’s committed is called ‘test.txt’, and contains the string “hello world”.
40,000 repositories is a small enough sample that my laptop can complete the script in a bit over 10 minutes. I may come back to this later with a larger sample and leave it to execute overnight.
Note that the script keeps a reference to all of those repository objects, so it uses a fairly large amount of memory; nothing is set to nil to give the garbage collector a chance to reclaim objects while the process is running.
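To make the setup concrete, here is a minimal sketch of the timing harness in Ruby. This is my own illustration, not code from the gist: repository creation is shelled out to `git init` to keep it self-contained, whereas the real benchmark uses Grit and the svn Ruby bindings.

```ruby
require 'benchmark'
require 'fileutils'
require 'tmpdir'

# Minimal sketch of the benchmark's timing harness (not the actual gist code).
N = 3 # the real run used 10,000

def create_git_repos(base, n)
  n.times do |i|
    dir = File.join(base, "git_repo_#{i}")
    FileUtils.mkdir_p(dir)
    # shell out to git; the real script would call the Grit API here
    system('git', 'init', '-q', dir) or raise "git init failed for #{dir}"
  end
end

base = Dir.mktmpdir('bench')
git_ms = Benchmark.realtime { create_git_repos(base, N) } * 1000
puts format('Creating %d git repos: %.3f ms', N, git_ms)
```

The same wrapper, with the body swapped for svn repository creation and for the commit variants, yields the four timings reported below.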
Before getting to the results, here’s some details on the benchmark environment:
OSX 10.6 Snow Leopard
2.4Ghz Core 2 Duo
svn version 1.7.8 (r1419691)
git version 18.104.22.168
And now for the numbers:
- Creating 10,000 svn repos: 119184.502 ms
- Creating 10,000 git repos: 120582.207 ms
- Creating and committing to 10,000 svn repos: 285041.101 ms
- Creating and committing to 10,000 git repos: 223160.437 ms
From this simple experiment, we can see that initialization was only minutely slower with git than with svn, which comes as a surprise given the smaller size of git repositories. But a stronger contrast appears once committing files is involved: the git runs took ~22% less time than the svn runs. If initialization weren’t included in those timings, the difference would be even greater.
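For clarity on how that percentage is computed from the raw timings above, the difference is taken relative to the svn time; taken relative to the git time instead, svn comes out ~28% slower:

```ruby
# Worked example using the commit-run timings reported above.
svn_ms = 285_041.101
git_ms = 223_160.437

git_saving   = (svn_ms - git_ms) / svn_ms * 100 # git needs ~22% less time
svn_overhead = (svn_ms - git_ms) / git_ms * 100 # svn needs ~28% more time

puts format('git: %.1f%% less time; svn: %.1f%% more time', git_saving, svn_overhead)
```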
I should also mention the size of the generated directories: 20,000 repositories were created with each of git and svn, making 40,000 in total. The combined size of the 20,000 svn repositories, half of which had our test file committed, is 594 MB. And git? 278 MB. That’s less than half the size!
And so, given the benchmarks previously available as well as the results from this simple experiment, I agree with the majority: a switch from Subversion to Git should help improve performance. It may not be a high priority at this time, but with svn interactions being such a bottleneck under heavy load, it should help alleviate some of the strain. I’ll continue researching areas where MarkUs may see performance benefits over the coming week, but I’ll keep this transition in mind as a possible goal for this work term. It would certainly be a large task to complete.
Here is a screencast of a running MarkUs load test (ogv). Enjoy!
This term I’ve been working on analyzing MarkUs’ performance under load (see the earlier posts in this series). The goal was to investigate whether MarkUs’ performance decreases significantly under certain circumstances. If it does, a sub-goal of my project was to investigate what caused the performance problem. In particular, I was simulating a scenario where students work alone on an assignment. Moreover, I was looking at the extreme case where no Subversion repositories existed for any student.
At this point I can say: yes, there can be a performance problem. When too many students try to submit files via MarkUs at the same time, performance deteriorates. A user would notice this as very long response times; in extreme cases, response times of 20 seconds for a single request (not hit) have been observed. The root cause of the drastically increased response times seems to be IO related. Where are these IO requests coming from? I’m still not 100% certain, but it looks like it is related to the creation of Subversion repositories. MarkUs stores students’ submissions in Subversion repositories. If no repository exists when a student logs in for the first time, and students work alone on an assignment, MarkUs creates a Subversion repository for that student when the student interface URL is first visited for this assignment. Also note that every submission results in calls to Subversion via the Subversion Ruby bindings from within MarkUs.
In general, the higher the ratio of simultaneous requests to the number of mongrels (or Passenger workers), the slower the overall response times. Variations of response times when the number of students in a course changes are fairly minor.
As mentioned earlier, these are results of simulating a scenario where students work alone on an assignment and have never logged in previously. Student requests have been simulated by running the post_submissions.sh script on client machines. Scripts which I’ve been using are available in this review and should be available in the official MarkUs repository at some point later. Only PostgreSQL has been used as the DB backend. I don’t believe changing this to MySQL will yield much different results. Apache httpd has been used as the front-end webserver reverse-proxying to individual Mongrel servers. For some experiments Phusion Passenger has been used instead of Mongrel. The difference in performance between the two deployment platforms was fairly insignificant for the experiments performed (considering that Passenger uses 6 Ruby workers by default and comparing it to a similar setup with 6 Mongrel servers). For this analysis MarkUs version 0.10.0 has been used (on Rails 2). I don’t anticipate huge differences between a Rails 2 and Rails 3 based MarkUs. Details about the lab setup I’ve been using were described in the first blog post of this series.
In order to get a more detailed view of what was going on on the MarkUs server machine while each individual experiment was run, the following tools have been used: top, iotop, iostat, oprofile, request-log-analyzer and some hand-crafted scripts. OProfile data was inconclusive (or I was perhaps using it wrong); example profiling output is available here. Top reported load averages of 3-18 and up to 50% (avg ~30%) IO wait with 20-60% user CPU utilization. Lower IO wait percentages were observed towards the end of each experiment and when 12 mongrels were used. iotop reported Linux’s ext4 journaling daemon as the top IO consumer, closely followed by Ruby’s logger for the production.log file of each Mongrel.
Here is the list of performed experiments:
| Exp. # | # Stud. | # Mon. | # Cli. | # R.p.Cli. | Sim. cc. Subm. |
|--------|---------|--------|--------|------------|----------------|
| P1     | 800     | equiv. of 6 | 8 | 4 | 32 |
| P2     | 832     | equiv. of 6 | 8 | 8 | 64 |
| P3     | 800     | equiv. of 6 | 8 | 4 | 32 |
| P4     | 800     | equiv. of 6 | 8 | 4 | 32 |
Exp. # is the experiment identifier, # Stud. is the number of students, # Mon. is the number of Mongrel servers, # Cli. is the number of client machines used (where the post_submission.sh script was executed # R.p.Cli. times), # R.p.Cli. is the number of post_submission.sh calls per client machine (i.e. one client machine simulated up to 18 students), and Sim. cc. Subm. is the number of simulated concurrent submissions (= # R.p.Cli. x # Cli.).
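The last column can be sanity-checked directly from the formula in the legend (a quick check of the table, not part of the original analysis):

```ruby
# Sim. cc. Subm. = (# R.p.Cli.) x (# Cli.), per the legend above.
experiments = {
  'P1' => { clients: 8, runs_per_client: 4 },
  'P2' => { clients: 8, runs_per_client: 8 },
}

concurrent = experiments.transform_values { |e| e[:clients] * e[:runs_per_client] }
concurrent.each { |id, n| puts "#{id}: #{n} simulated concurrent submissions" }
```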
All “P” experiments have been performed using Phusion Passenger; all “M” experiments were Mongrel based. M13-M15 vary only the number of students in a class. M12-M20 were basically a repeat of experiments M1-M9. M10-M20 had configuration in place to make each Mongrel log to its own copy of production.log; M1-M9 shared one production.log, and those logs were useless due to interleaved output. P3 and P4 are interesting: for P3, almost no SVN interaction was achieved by running the same experiment twice without deleting repositories and dropping the database of the previous run. Because of this, submissions were not accepted; one would have to explicitly replace files, as opposed to resubmitting them, in order to get them accepted by MarkUs. This is expected behaviour. Hence, this repeated submission resulted in no changed submissions being recorded (i.e. almost no SVN interaction). P4 is an experiment with students’ repositories created prior to the actual run; however, the repositories were empty, so the submissions issued by request #7 were recorded.
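For reference, the per-Mongrel logging configuration mentioned above can be achieved with something along these lines. This is a sketch; the exact file naming and the mechanism actually used in M10-M20 are assumptions on my part:

```ruby
# config/environments/production.rb (Rails 2, hypothetical):
# give each server process its own log file, keyed on its pid, so that
# concurrent Mongrels don't interleave output in a shared production.log
config.logger = Logger.new("#{RAILS_ROOT}/log/production.#{Process.pid}.log")
```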
The requests/URL mapping is shown in the following table:
These are the requests (in order) each call to post_submissions.sh performs: get the log-in page, log in (POST), follow the resulting 2 redirects to the student’s dashboard, go to the student interface of the first assignment, open the file manager (Submissions link), submit files (POST).
Raw logs and tables are available in a Git repository which I’ve created for this purpose. Logs have been analyzed by using the elapsed time (as reported by /usr/bin/time) per request on client machines. Sanity checks have been performed by also analyzing server-side logs (production.log) via request-log-analyzer. Server-side generated and client-side generated response time numbers matched with a small margin of error.
The above graph illustrates that response time grows fairly slowly with an increasing number of students in a course (going from 400 to 1400 students increases response time from 2.2 to 2.8 seconds per request).
The above two graphs try to show whether there is a correlation between the ratio of simultaneous requests to the number of mongrels and the average response time. In general, the more overloaded a single Mongrel, the slower the overall response times. More mongrels may bring response times down a little, but not by as much as one would have hoped (see M19 and M9). There may be some performance gain if mongrels run on different machines than the Apache reverse proxy and the PostgreSQL server; in most experiments these all ran on one machine. M20 had 6 mongrels running on the main server and 6 mongrels on a different machine. Perhaps more could be gained if the database server were on one machine, the reverse proxy on another, and the mongrels distributed among a set of other machines sharing the file system containing the Subversion repositories.
This graph shows the average response times per request. Please refer to the table above in order to see which request number maps to which URL. The numbers in the legend are the total numbers of simulated concurrent requests. We now take a closer look at the experiment labelled 32 in the above graph (experiment P4). Note that 32 should be compared to 14, as it’s not the number of simultaneous requests alone that is significant; the number of mongrels running on the server is a factor as well. Thus, the ratio between concurrent requests and the number of mongrels seems to be a good heuristic for comparing experiments. Note that said ratio is closer between 14 and 32 than between 32 and 35 (see graph below). Since experiment P4 had Subversion repositories already created prior to the experiment, it is not surprising to see the absence of the bump at request 5.
This is the exact same graph as the preceding one, the only difference being the labelling: instead of the number of concurrent requests, it shows the ratio of the number of concurrent requests to the number of mongrels.
Conclusion and Future Work
So what are the lessons learned?
- Under heavy load and in a poor setup response times of 20 seconds and more can happen
- Adding mongrels is fairly cheap and may bring some performance gain.
- Distributing mongrels among a set of application servers may improve performance even more.
- Subversion interactions are expensive. I recommend getting students to log in and have a look at the assignment (if it’s a single-student assignment and the SVN repository has not yet been created) at some off-peak time, in order to reduce IO when the assignment deadline is approaching.
- Logging to production.log may be a source of IO on the system.
- 12 mongrels seem to perform better than 6. The performance gain from 3 to 6 mongrels is less significant. I’m not sure why…
- I recommend that users estimate the expected number of concurrent submissions based on the number of students in the course and historical data. Based on the expected number of concurrent submissions at peak time, try to keep the ratio of concurrent submissions to the number of mongrels below 5.
- It is a good idea to add configuration to the environment so that mongrel instances log to separate log files. This way production log files can be used for further analysis with regards to performance bottlenecks.
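The sizing rule in the recommendations above can be turned into a small helper (my formulation of the heuristic, not code from MarkUs):

```ruby
# Minimum number of mongrels so that the ratio of expected peak concurrent
# submissions to mongrels stays around the recommended threshold of 5.
def mongrels_needed(peak_concurrent_submissions, max_ratio = 5)
  (peak_concurrent_submissions.to_f / max_ratio).ceil
end

puts mongrels_needed(32) # the 32-submission experiments would call for 7 mongrels
puts mongrels_needed(64) # the 64-submission experiment would call for 13
```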
Where to go from here? It would be interesting to see if a different version control system as a back-end would change any of the above results. Moreover, the assumption is that the IO is coming from Subversion, but what if it’s just simple logging, or logging plus Subversion plus IO from PostgreSQL? Perhaps there is some better way to inspect low-level IO, which would help with reasoning about where said IO is coming from. iotop and top should be a good start but may be too coarse grained. I’d also be interested to see if a more distributed production setup would be capable of processing more concurrent requests in less time.
It’s been fun working on this project. Please do let me know (in the comments) if there is something I’ve missed or if you have other thoughts on this topic. Thanks!
This is a follow-up post to my previous two, MarkUs Performance Analysis (1) and MarkUs Performance Analysis (2). Please have a look at them, as they detail the set-up of the lab machines I’ve been using as well as which scripts have been used and how to use them. Also note that I’ve added a couple of additional scripts in order to make it easier to kick off load test runs. More on the load testing scripts is in this review request. At this point I am able to report first results. If you are interested in the gory details, please have a look at this Git repo, which contains all raw logs in addition to the spreadsheets and other things I’ve produced while conducting experiments.
Here is a table with first results of various experiments I’ve conducted so far. A “Runner” is one call to ./post_submissions.sh. Each call to ./post_submissions.sh in turn generates 7 requests to the MarkUs server. Unless otherwise noted, results are for a clean MarkUs installation (no Subversion repositories exist). Note that the timing info is the elapsed time as reported by /usr/bin/time for each curl call. Results listed are averages over number of students (one student is one sample).
Here are a few observations I’ve made during my experiments.
- Both Mongrel-based and Passenger-based MarkUs set-ups seem to be IO bound under load on a clean MarkUs instance (load averages > 2, up to ~17 on a dual-core server machine). A clean MarkUs instance means no Subversion repositories exist prior to each experiment.
- Memory does not seem to be an issue. 2 GB of RAM was no problem for 24 mongrels.
- Top reports around 40% IO waiting when experiments are run. The question is where are these IO requests coming from?
- When an experiment runs, Ruby processes show up at the top of the list in “top” (for both Mongrel and Passenger setups). If I add the WCHAN field in top, the sleep functions of these processes seem to be filesystem or scheduler related. Note the filesystem of the server is ext4. In particular, the most prominent “sleep functions” are jbd2_log, synch_page, get_write and blk_dev_is.
- I’ve used oprofile in order to profile the entire system when an experiment is run. Here is an example of opreport output of one run. Not surprisingly, about 60% of the time the CPU was not halted, a Ruby process was running (libruby.so.1.8.7). Note that Subversion- and PostgreSQL-related percentages are negligible. I’m not sure why. Does anybody have thoughts on this?
- A second run of ./run-load-tests.sh with Subversion repositories already existing and submissions resulting in conflicts (i.e. no SVN IO) is significantly faster (~3 times). See the last experiment in the table.
- Passenger-based setups seem to use 6 Ruby processes, similar to what could be achieved via an Apache reverse proxy with a cluster of 6 mongrels. Perhaps this is different for non-IO-heavy workloads.
So what does all of this mean? Good question. Here are my thoughts:
- I think the main causes of heavy IO and, hence, sources of slowdown are request number 5 (where Subversion repositories get created) and request number 7 (where file submissions are happening and files are stored in Subversion repositories).
- There does not seem to be a significant difference between Mongrel and Passenger based set-ups (at least for the IO heavy experiment).
- Something else?
Now I’d be interested in your input. How would you interpret the data? What additional experiments should I run? Did I make a mistake somewhere? Please leave your feedback in the comments. Thanks!
As mentioned earlier, I am currently working on analyzing MarkUs’ performance. After a few initial bumps I have a basic curl based script ready which can be used to simulate students submitting code through MarkUs. Here is the review where I introduce this new script. I should mention that this script is not very resilient to errors and is fairly sensitive with respect to the URLs you configure in that script. Have a look at this post as to why that’s the case. So here are the requirements for this to work:
- Use MarkUs 0.10.x (MarkUs 0.11.x should work as well, but a Rails 3 based MarkUs will likely not quite work; at the very least it is untested)
- Prepare the server as described below.
- Make sure that you have curl installed on the client machine.
The curl based script (lib/benchmarking/post_submissions.sh) comes with a buddy rake task called markus:benchmark:students_create. This rake task can be used to create many student users. It’ll destroy any existing student users and hand out user names of the form student_1, student_2, student_3 and so on. It is important that you use this rake task on the server to create the student users, since post_submissions.sh attempts to log in as student_<#> (<#> depends on the range you pass as a parameter to the script when you run it). OK, without further ado, here is the cookie-cutter recipe:
First make sure that CSRF protection is turned off for your MarkUs instance by having this line in your environment configuration:
config.action_controller.allow_forgery_protection = false
Then do the following:
$ bundle exec rake repos:drop RAILS_ENV=production
$ bundle exec rake db:reset RAILS_ENV=production
$ bundle exec rake db:populate RAILS_ENV=production
$ bundle exec rake markus:benchmark:students_create num=1000 RAILS_ENV=production
The last command will create 1000 student users. After this it’s good to do some sanity checking: make sure that logging in as student_1000 works. Now you are ready to run post_submissions.sh. Here is one way to do it. It will use the MarkUs instance specified by MARKUS_BASE_URL and submit all files in the directory lib/benchmarking/submission_files for students 1 through 10:
$ cd path/to/copy/of/lib/benchmarking
$ ./post_submissions.sh 1 10
Note that the user who runs post_submissions.sh needs write permissions in the current working directory (the script needs to store/update the MarkUs cookie). After running the script you should see a directory which holds one log file per student submission. At the moment this is the only way to figure out what happened when the script attempted to log in, visit the assignment page, and submit the files for assignment 1. Keep in mind that if you get any conflict while submitting, nothing will be submitted; this is how MarkUs works. Thus, make sure you are not submitting a file (same file name) twice.
If all went well, you should see the submitted files in a browser session as well.
So that’s the status for the time being. I’ll likely be writing another script which allows one to distribute and execute post_submissions.sh on many clients. Keep your eyes peeled for a follow-up post 🙂 Questions, concerns? Let me know in the comments. Thanks!
I am trying to get some data about how MarkUs behaves under load. One aspect would also be to get a better feel for how scalable MarkUs is. I’m aiming to simulate a production environment, something on the order of a couple of hundred students submitting code (and perhaps implicitly creating Subversion repositories) at the same time. So far I’ve been trying to create a script which logs on to MarkUs, visits relevant pages and submits code. However, I’ve been hitting a bit of a snag. Here is why.
Problem 1: Cookies
MarkUs uses cookie-based sessions: after logging in, the client receives a session cookie which has to be sent back with every subsequent request. A browser does this automatically, but a load-testing script has to store the cookie and attach it to each request itself.
Problem 2: HTTP Redirects
Another thing which makes script based load testing painful is HTTP redirects. Redirects are nothing other than mini responses sent back to the client telling it the actual location of the resource. When using a Web browser this is barely noticeable, since the browser usually follows redirects (i.e. automatically requests the new location). Again, this is different in a script based walk through a web application. You might get a redirect response when you’re not expecting one, and if you’re not careful you’ve hammered a certain page many times only to realize that you have been getting redirect responses all along. What’s more, MarkUs generates redirects if it thinks that the user is not logged in. Forget to send the cookie you’ve gotten earlier and MarkUs will redirect you to the log-in page instead of giving you the response for a particular page or servicing an HTTP POST request.
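To illustrate the bookkeeping a script has to do by hand, here is a small sketch: store any newly issued session cookie and chase Location headers until a non-redirect response arrives. The `fetch` stub and the URLs are hypothetical stand-ins for whatever HTTP client and routes the real script uses; none of these names come from MarkUs.

```ruby
# Follow redirects manually, carrying the session cookie along, the way a
# load-testing script must (a browser does both steps automatically).
def follow_redirects(url, cookie, fetch, limit = 5)
  raise 'too many redirects' if limit.zero?
  res = fetch.call(url, cookie)
  cookie = res[:set_cookie] || cookie # remember a newly issued cookie
  if (300..399).cover?(res[:status])
    follow_redirects(res[:location], cookie, fetch, limit - 1)
  else
    [res, cookie]
  end
end

# Stubbed "server": the dashboard redirects to the log-in page, which
# issues a session cookie with its 200 response.
fetch = lambda do |url, _cookie|
  case url
  when '/main'       then { status: 302, location: '/main/login' }
  when '/main/login' then { status: 200, set_cookie: 'session=abc123' }
  end
end

res, cookie = follow_redirects('/main', nil, fetch)
puts "status=#{res[:status]} cookie=#{cookie}"
```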
Why not turn off authentication and sessions entirely?
Some of you might think that I could just turn off sessions and authentication as a whole, but this won’t work well either. First, I’d have to change a fair bit of MarkUs code to get it working without any authentication. Second, with authentication turned off there is no notion of different users anymore; all of a sudden every request belongs to one single user. We don’t want this for script based load testing either: we want to simulate hundreds of users logging in simultaneously, not one single user over and over again.
For script based load testing I’ll have to make cookies work and deal with redirects. For the time being a curl based solution seems most promising. This won’t yield request timing data on the client side, though. I could manually time requests to get a feel for how long MarkUs takes to serve them under load, but I’m not sure that’s a high priority. Once I can simulate some load, I’ll use request times in the logs to approximate request/response times. If I didn’t have to deal with cookies and redirects, Apache Bench would be viable, but it seems it doesn’t handle cookies and redirects very well. I’ll keep you posted as to how it goes.
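As a sketch of the log-based approximation, here is one way to average response times out of a production log. The log-line format is an assumption based on typical Rails 2 production.log output ("Completed in NNNms (…)"), not verified against MarkUs' logs:

```ruby
# Average response time from Rails-2-style "Completed in NNNms (...)" lines.
COMPLETED = /Completed in (\d+)ms/

def average_response_ms(lines)
  times = lines.map { |l| l[COMPLETED, 1] }.compact.map(&:to_i)
  return nil if times.empty?
  times.sum.to_f / times.size
end

sample = [
  'Completed in 120ms (View: 80, DB: 20) | 200 OK [http://localhost/en/main]',
  'Completed in 240ms (View: 150, DB: 60) | 200 OK [http://localhost/en/main]',
]
puts average_response_ms(sample) # => 180.0
```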
As usual, please leave a comment if you have thoughts about this or I’m missing something very obvious and am just not doing it right 😉
I will be working on analyzing MarkUs’ performance this term. Here is what I have so far:
In order to look at MarkUs’ performance under various conditions I have the following lab setup, comprising a server machine and eight clients. MarkUs will be hosted on the server machine and its performance will be evaluated there. The eight clients will be used to “simulate” load on the server. Here is a picture illustrating the setup:
As illustrated, clients are connected via a network switch to the server. All machines are physical machines (i.e. not virtual machines). The server machine has 2 gigabytes of RAM and an Intel Core 2 CPU with 2 cores at 1.86 GHz. It runs Ubuntu 10.04 LTS, and the PostgreSQL database runs on the same machine as well. All filesystems are local and ext4 formatted. The specs of the client machines should be irrelevant for this analysis.
In order to get some data out of this setup I intend to proceed with this analysis incrementally. One focus of this study is to look at MarkUs’ performance when there are 500+ students in a course. In fact, I hope this analysis will help us come up with a recommended maximum number of students per MarkUs instance.
As a first step I will be trying to reproduce a potential performance problem of MarkUs when there are 500+ students in a course and some subset of them is submitting code simultaneously. In order to do so I will use a mongrel cluster on the server behind an Apache httpd reverse proxy (which is used at the University of Toronto and University of Waterloo). In order to simulate 500+ students submissions, I will be using benchmarking scripts in lib/tools of the MarkUs code base. Since MarkUs on Rails 3 is not quite ready yet, I will be using MarkUs 0.10.1 on Rails 2.
Once I am able to reproduce this performance problem, I will start to take a closer look at what might cause it. Questions I’d like to answer are:
- Is there a reproducible performance problem?
- When does performance start to degrade? At 500 students? 5000 students? Something else?
- How does a clustered mongrel setup compare to a setup using Phusion Passenger with respect to performance? Can we get some hard numbers on this?
- What is the cause of the performance breakdown (if any)? Is it one cause? A combination of many things?
- Is there a difference between setups where students work in groups and setups where students work alone?
- How can we alleviate potential performance problems?
Later throughout my analysis I’d also like to see if there is any difference in results when a DB snapshot of a former production system is being used.
Any other suggestions as to how to go about this? Am I missing something? What would you like to get out of this?
In order to figure out what the cause of potential performance problems of MarkUs is, I plan to use the following tools:
- Analyze response time in production logs with say Rails Analyzer in order to get some concrete numbers as to how response time behaves as the number of students and repositories increases.
- Use a Ruby profiler, for example Ruby Prof, in order to profile problematic code.
- Use a system profiler such as Oprofile in order to get a bigger picture as to where time is mostly spent system wide. In the database? File IO? MarkUs code (i.e. Ruby code)? Something else?
What else should I be using? Thoughts? I anticipate that it’s going to be challenging to go from the big picture (there is a performance problem) down to what is causing it, since there are a lot of components involved (Subversion, PostgreSQL, Ruby, Rails, MarkUs).
What do you think? Let’s hear it 🙂