This summer, one of my more interesting tasks was to speed up MarkUs, namely to hasten the loading of the dreaded submissions view. Going into this, I had a semi-decent understanding of rails, and very little knowledge of databases. I had started by looking at ticket #174 about indexing foreign keys. Although most of the indexing has already been done, I learned about a certain ‘N+1 query’ problem that is common in many applications using databases.
A short description of the problem:
Picture a ‘student’ object having N ‘lecture’ objects that they attend daily. Our application keeps track of all the students, and stores them in a database. Say we want to change the time of each ‘lecture’ for a given ‘student’. Our application logic would loop over student.lectures and perform an action on each ‘lecture’. The app performs 1 database query to select the ‘student’, and N more database queries, one to select each ‘lecture’. This is the essence of the problem, as database queries are expensive.
In the case of MarkUs, the submissions table had hundreds of rows. Each row represented a grouping, and in order to populate the table (each column – group name, final mark, grace credits, etc… – required more information about the grouping), each row needed about 7 extra queries. With database queries in the thousands, it could take over a minute for the page to load.
The solution for this is Rails’ eager loading. When you know that with each grouping you will need info about their submissions, results, members, etc… you simply tell Rails to include all that information in a single large query. Eager loading in rails is done by adding the :include option option to any ‘find’ method calls. Simply by preloading all the required information for the submissions view, I was able to speed up the loading of the page by several times. Of course redundant eager loading can actually slow down the application, but a discussion about eager loading is not the point of this post.
Finding N+1 Queries in Code:
In ticket #174 Mike Gunderloy suggests using a gem called bullet to find the queries. It is very simple to install, and gives feedback right in the console running the server. The gem notices when you have N+1 queries, gives you a specific :include statement and tells you where to place it in the code. I began using this gem and it did work a few times, however in many cases it simply led me in circles, first saying to add an :include somewhere to get rid of N+1 queries, and then saying I am using redundant eager loading in the same place.
Bullet was something I used at first to get me started, however I discontinued using it because the console already displayed to me each SQL query that MarkUs was doing, and I found this to be helpful enough. N+1 queries usually happen on pages which are iterative in nature (large tables for example), but are not limited to them. A good way to detect them is to look at the console, and see if there are a lot of very similar database queries that differ only by the id. For example, 200
Grouping Load (1.5ms) SELECT * FROM “groupings” WHERE (“groupings”.”id” = some number)
would suggest that it is possible to load all the groupings with one query (usually by adding an :include somewhere).
Apart from N+1 queries, there were some other issues which slowed down the application, but most of them had to do with excessive database queries. For example, in the submissions view, each grouping’s total mark would be re-calculated every time the page was loaded (the calculation involved several queries per grouping). This was a case where it was much better to simply add a ‘total_mark’ column in the groupings database table, which would be updated every time a mark is changed (thanks to Rails’ wonderful before_save callback).
I have discussed some of the more significant issues I have encountered, however these are not the only ones that will happen. When tweaking our code to optimize it, I was always aiming for as little database queries as possible. I found that in many cases, it is more desirable to have a bit of extra processing done if it involved reducing the number of database queries.