Archive for August, 2014
With the declarative power of Rails’ ActiveRecord, it’s very easy to write code that is prone to performance bottlenecks, such as the issue commonly known as the n+1 query problem.
Consider the following simplified MarkUs models:
TaMembership is a join model between Ta and Grouping that represents the assignment of a TA to a grouping so that the TA has the permission to mark students in the grouping.
A view may have a table of groupings and need to display all the TAs assigned to each grouping (in ERB), as in this (overly-simplified) example:
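A view along these lines might be (the markup and the user_name attribute are illustrative):

```erb
<table>
  <% Grouping.all.each do |grouping| %>
    <tr>
      <td><%= grouping.id %></td>
      <%# Accessing grouping.tas here triggers one query per grouping %>
      <td><%= grouping.tas.map(&:user_name).join(', ') %></td>
    </tr>
  <% end %>
</table>
```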
This all looks good until we inspect the query log and find that such a simple snippet of code generated a lot of SQL queries: one for the initial list of groupings, plus one per grouping.
This is because ActiveRecord lazily loads associations by default, i.e., the association model is only loaded (through a SQL query to the database) when the attribute is accessed the first time. In the above example, an initial query is issued to get a list of Grouping models. Then, for each Grouping instance, the
tas attribute is accessed, generating another SQL query to get the associated model instance. This results in a total of n+1 queries, where n is the number of groupings, hence the name of the problem. In a networked production environment, the round-trip cost of issuing a database query is a significant overhead due to network delays. Therefore, in general, having n queries performs poorly compared to having only one (or some constant number of) queries that achieve the same result.
In this case, the performance can be improved by avoiding the n+1 queries and using only a few instead. In Rails, this can be achieved using eager loading of associations.
By using includes, Rails takes care of the eager loading of the
tas association and issues only two queries:
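For the example above, the eager-loading version might look like:

```ruby
# Eagerly load the tas association along with the groupings.
groupings = Grouping.includes(:tas)

groupings.each do |grouping|
  grouping.tas  # already loaded; no additional query is issued here
end
```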
Just like the
joins method, the
includes method can handle nested associations as well:
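Nested associations are specified with a hash, as in the snippet discussed below:

```ruby
# Eagerly load the join model, each membership's TA,
# and in turn each of those TAs' other groupings.
groupings = Grouping.includes(ta_memberships: { ta: :groupings })
```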
Note that while the nested association can be loaded, sometimes it’s redundant and causes too much overhead when the nesting level gets too deep. In the above snippet,
grouping.ta_memberships is a collection of
TaMembership instances, where each instance has a
ta association, and each ta in turn has a collection of
Grouping instances. All these eagerly-loaded instances form a pretty large and complex structure, and large structures cause memory bottlenecks. Think about whether there is any redundancy in the eagerly loaded model instances and whether you can reorganize the view or controller to simplify the structure.
Normally the use cases of
includes are in the controllers, in which the model instances are eagerly loaded and passed to the view (the above examples where the
includes calls are in the view are only for demonstration purposes). However, when multiple controllers are using the same set of eagerly loaded associations, consider writing a scope for the model (in Rails 4):
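A sketch of such a scope (the scope name is illustrative):

```ruby
class Grouping < ActiveRecord::Base
  has_many :ta_memberships
  has_many :tas, through: :ta_memberships

  # Rails 4 style: the body is a lambda, evaluated lazily.
  scope :with_tas, -> { includes(:tas) }
end
```

A controller can then write Grouping.with_tas instead of repeating the includes call.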
Or the Rails 3 way:
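In Rails 3, scope accepts the relation directly instead of a lambda (again, the scope name is illustrative):

```ruby
class Grouping < ActiveRecord::Base
  has_many :ta_memberships
  has_many :tas, through: :ta_memberships

  scope :with_tas, includes(:tas)
end
```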
MarkUs still has quite a few instances of the n+1 query problem at the time of writing. With the help of bullet, we can track down the remaining evil n+1 queries in the system.
Bulk operation — “n query problem”
The n+1 query problem most often occurs when reading the database (i.e., issuing SQL
SELECT statements). A related problem, which I call “the n query problem”, may occur when doing bulk operations such as bulk creation, update, and deletion.
For example, the problem occurs when doing something like the following:
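An illustrative sketch, assigning a TA to each grouping one at a time (the per-grouping update is a stand-in for whatever bookkeeping the real code does):

```ruby
groupings.each do |grouping|
  grouping.ta_memberships.create(ta: ta)            # one INSERT per grouping
  grouping.update_attributes(updated_at: Time.now)  # one UPDATE per grouping
end
```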
This generates n
UPDATE statements and n
INSERT statements. These can normally be reduced to a single query each.
Use activerecord-import. This is by far the most DBMS-independent gem for ActiveRecord bulk creation. ActiveRecord’s
create method actually supports bulk creation, but it depends on the underlying database driver for Ruby to do the actual bulk creation. At the time of writing, the PostgreSQL driver pg still doesn’t support bulk creation.
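A sketch of bulk creation with activerecord-import (the variable names are assumptions):

```ruby
# Build all membership rows in memory, then insert them
# with a single multi-row INSERT statement.
memberships = ta_ids.map do |ta_id|
  TaMembership.new(ta_id: ta_id, grouping_id: grouping.id)
end
TaMembership.import(memberships)
```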
For a sample usage, refer to Grouping.assign_tas.
Bulk update and bulk deletion
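For these cases ActiveRecord itself is enough: update_all and delete_all each issue a single SQL statement over all matching rows. Note that both bypass validations and callbacks. A sketch with illustrative conditions:

```ruby
# Single UPDATE for all matching rows:
TaMembership.where(grouping_id: grouping_ids).update_all(updated_at: Time.now)

# Single DELETE for all matching rows:
TaMembership.where(grouping_id: grouping_ids, ta_id: ta_ids).delete_all
```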
For a sample usage, refer to Grouping.unassign_tas.
The Upgrading to Rails 4 guide is a good first step to understanding what needs to be changed. Rails’ own Upgrading Guide also has a list of things to pay attention to. Neither guide is a superset of the other, and I have had to look elsewhere to solve a few of the upgrading issues.
Here is a short list of the upgrading tasks that I could not backport and that caused varying amounts of trouble:
You will want to look more closely at the versions that the first guide suggests. In most cases a newer maintenance release of a gem is available and should be selected instead.
One exception to this is the sass-rails gem, which needs to be pinned to version 4.0.1. Why? Because MarkUs needs a newer version of Sass: Sass 3.3.x is required because Sass 3.2.x fails to parse MarkUs’s SCSS code. Counterintuitively, the older sass-rails gem had a looser version specification for sass, so by rolling sass-rails back to 4.0.1 we allow bundler to choose a newer version of sass itself.
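In the Gemfile this is just a pinned version:

```ruby
# Pin sass-rails so that bundler is free to pick sass >= 3.3
gem 'sass-rails', '4.0.1'
```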
When removing strong_parameters from the Gemfile, do not add the protected_attributes gem for backwards compatibility. MarkUs has already been upgraded to use strong parameters. Since strong_parameters is part of Rails 4, it should not be listed in the Gemfile.
Lastly, the minitest gem should be removed. Rails 4 depends on minitest itself; it is best to avoid conflicting version requirements.
The catch-all route should be changed to respond to all HTTP verbs using via: :all if possible:
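A sketch of such a route (the controller and action names are assumptions):

```ruby
# config/routes.rb
# Catch-all route responding to every HTTP verb:
match '*path', to: 'main#page_not_found', via: :all
```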
We are in the process of implementing a git backend for MarkUs, and authentication/authorization on the repos is a big question.
It seems obvious that gitolite is the right choice for authorization, and because I’m planning to use it for courses as well, I thought I would write down what I have learned. I still have to try it out, so I may have some of the details wrong.
The repos will be managed under a separate userid that I’ll call markus-git. I suspect it should be a separate userid from the one that the application runs under, for safety. Only the markus-git user has file permissions on the repos, and users do not need any shell access.
I should also point out that if students are only accessing MarkUs through the web interface, then none of the rest of this is relevant to them.
Users will have to register a public key with gitolite, and we’ll want to automate this process so that it doesn’t require human intervention. Then they can make a request like the following.
git clone email@example.com:c2reidka.git
How does it work?
Gitolite sets things up so that when the ssh command is run, it first forces an authorization check. It uses the public key to create a challenge that only the matching private key on the client can answer. If the client sends back the correct response, then the user is granted access to the appropriate repos.
The gitolite-admin repo is cloned by the administrators (and will include the markus process). It contains the public keys as well as a configuration file
conf/gitolite.conf that contains all of the access control information for the repos.
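A conf/gitolite.conf along these lines (all repo and user names here are made up):

```
@admins = instructor1

repo gitolite-admin
    RW+ = @admins

repo c2reidka
    RW+ = markus-git
    RW  = c2reidka
```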
The nice thing about using the ssh approach is that there is no additional server to worry about.
One of the headaches we will have to resolve is how to handle students who have problems with their key pairs, and need to update their keys.