MarkUs Blog

MarkUs Developers Blog About Their Project

Archive for August, 2014

Say Goodbye to the n+1 Query Problem

without comments

With the declarative power of Rails’ ActiveRecord, it’s very easy to write code that are prone to performance bottleneck, such as an issue commonly known as the n+1 query problem.

Consider the following simplified MarkUs models:

class Grouping < ActiveRecord::Base
  has_many :ta_memberships
  has_many :tas, through: :ta_memberships
end

class TaMembership < Membership
  belongs_to :ta
  belongs_to :grouping
end

class Ta < User
  has_many :ta_memberships
  has_many :groupings, through: :ta_memberships
end

TaMembership is a join model between Ta and Grouping that represents the assignment of a TA to a grouping so that the TA has the permission to mark students in the grouping.

A view may have a table of groupings and need to display all the TAs assigned to each grouping (in ERB), as in this (overly-simplified) example:

<% Grouping.find_each do |grouping| %>
  <span class='ta'><%= grouping.tas %></span>
<% end %>

This all looks good until we inspect the query log and find that such a simply snippet of code generated a lot of SQL queries:

SELECT * FROM groupings

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 1

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 2

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 3

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 4

...

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 42

This is because ActiveRecord lazily loads associations by default, i.e., the association model is only loaded (through a SQL query to the database) when the attribute is accessed the first time. In the above example, an initial query is issued to get a list of Grouping models. Then, for each Grouping instance, the tas attribute is accessed, generating another SQL query to get the associated model instance. This results in a total of n+1 queries, where n is the number of groupings, hence the name the problem. In a networked production environment, the round-trip cost of issuing a database query is a significant overhead due to network delays. Therefore, in general, having n queries perform poorly compared to having only one (or some constant number of) queries that achieve the same.

In this case, the performance can be improved by avoiding the n+1 queries and use only a few. In Rails, this can be achieved using eager loading of associations.

<% Grouping.includes(:tas).find_each do |grouping| %>
  <span class='ta'><%= grouping.tas %></span>
<% end %>

By using includes, Rails takes care of the eager loading of the tas association and issues only two queries:

SELECT * FROM groupings

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id in (1, 2, 3, 4, ..., 42)

Just like the where and joins method, The includes method can handle nested associations as well:

Grouping.includes(ta_memberships: [{ ta: :groupings }, :grouping])

Note that while the nested association can be loaded, sometimes it’s redundant and causes too much overhead when the nesting level gets too deep. In the above snippet, grouping.ta_memberships is a collection of TaMembership instances, where each instance has a ta association, and each one has a collection of Grouping instances. All these eagerly-loaded instances already form a pretty large and complex structure — Large structure causes memory bottlenecks. Think about whether there is any redundancy in the eagerly loaded model instances and whether you can re-organize the view or controller to simplify the structure.

Normally the use cases of includes are in the controllers, in which the model instances are eagerly loaded and passed to the view (the above examples where the includes calls are in the view are only for demonstration purposes). However, when multiple controllers are using the same set of eagerly loaded associations, consider writing a scope for the model (in Rails 4):

class Grouping
  scope :with_tas, -> { includes(:tas) }
end

Or the Rails 3 way:

class Grouping
  def self.with_tas
    Grouping.includes(:tas)
  end
end

MarkUs still has quite a few instances of the n+1 query problem at the time of writing. With the help of bullet, we can track the the remaining evil n+1 queries in the system.

Bulk operation — “n query problem”

The n+1 query problem occurs most likely when reading the database (i.e., doing SQL SELECT statements). A related problem, I call it “the n query problem” may occur when doing bulk operations such as INSERT, UPDATE and DELETE statements.

For example, the problem occurs when doing something like the following:

groupings.each do |grouping|
  grouping.some_attribute = foo
  grouping.tas.clear
  grouping.save!
end

This generates n UPDATE statements and n INSERT statements. These can normally be reduced to just one single query.

Bulk creation

Use activerecord-import. This is by far the most DBMS-independent gem for ActiveRecord bulk creation. ActiveRecord’s create method actually supports bulk creation, but it depends on the underlying database driver for Ruby to do the actual bulk creation. At the time of writing, the PostgreSQL driver pg still doesn’t support bulk creation.

For a sample usage, refer to Grouping.assign_tas.

Bulk update and bulk deletion

Use update_all and delete_all.

For a sample usage, refer to Grouping.unassign_tas.

Written by Su Zhang

August 20th, 2014 at 5:58 pm

Posted in Uncategorized

Rails 4 Upgrade Notes

without comments

The Upgrading to Rails 4 guide is a good first step to understanding what needs to be changed. Rails’ own Upgrading Guide also has a list of things to pay attention to. Neither guide is a superset of the other, and I have had to look elsewhere to solve a few of the upgrading issues.

Here is a short list of the upgrading tasks that I could not back port which caused varying amounts of trouble:

 


 

Gemfile

You will want to look more closely at the versions that the first guide suggests. In most cases a newer maintenance release of a gem are available and should be selected instead.

One exception to this is the sass-rails gem. It will need to be set to version 4.0.1. Why? Because MarkUs needs a newer version of Sass. Sass 3.3.x is required because Sass 3.2.x fails to parse MarkUs scss code. Though it does not appear to make sense, the older sass-rails gem had a looser version specification for sass, and so by rolling back sass-rails to 4.0.1 we allow bundler to choose a newer version of sass itself.

When removing strong_parameters from the Gemfile, do not add the protected_attributes gem for backwards compatibility. MarkUs has already been upgraded to use strong parameters. Since strong_parameters is part of Rails 4, it should not be listed in the Gemfile.

Lastly, the minitest gem should be removed. Ralis 4 depends on minitest itself; it is best to avoid conflicting version requirements.

 


 

Routes

The catch-all route should get changed to work via :all HTTP verbs if possible:

https://github.com/MarkUsProject/Markus/blob/d0d76012350901cf4059d012fc4da575881ca876/config/routes.rb#L338

 


 

Configuration

In config/application.rb:20 “config.action_view.javascript_expansions” should be deleted. And then you will need to update the the “javascript_include_tag :defaults” line in  app/views/layouts/no_menu_header.html.erb to use the ‘application’ (with quotes) instead of :defaults.

Written by Mark Rada

August 14th, 2014 at 6:58 am

Posted in Uncategorized

Using Gitolite

with one comment

We are in the process of implementing a git backend for MarkUs, and authentication/authorization on the repos is a big question.

It seems obvious that gitolite is the right choice for authorization, and because I’m planning to use it for courses as well, I thought I would write down what I have learned.  I still have to try it out, so I may have some of the details wrong.

The repos will be managed under a separate userid that I’ll call markus-git.  I suspect it should be a separate userid than the one that the application runs under, for safety. Only the markus-git user has file permissions on the repos, and users do not need any shell access.

I should also point out that if students are only accessing MarkUs through the web interface, then none of the rest of this is relevant to them.

Users will have to register a public key with gitolite, and we’ll want to automate this process so that it doesn’t require human intervention.  Then they can make a request like the following.

git clone markus-git@markus.cdf.toronto.edu:c2reidka.git

How does it work?

Gitolite sets things up so that when the ssh command is run, it first forces an authorization check. It uses the public key to create a message that only the private key on the client can interpret. If the client sends back the correct response, then the user will be granted access to the appropriate repos.

The gitolite-admin repo is cloned by the administrators (and will include the markus process). It contains the public keys as well as a configuration file conf/gitolite.conf that contains all of the access control information for the repos.

The nice thing about using the ssh approach is that there is no additional server to worry about.

One of the headaches we will have to resolve is how to handle students who have problems with their key pairs, and need to update their keys.

Written by Karen Reid

August 1st, 2014 at 12:42 pm

Posted in Uncategorized