MarkUs Blog

MarkUs Developers Blog About Their Project

Say Goodbye to the n+1 Query Problem

without comments

With the declarative power of Rails’ ActiveRecord, it’s very easy to write code that are prone to¬†performance bottleneck, such as an issue commonly known as the n+1 query problem.

Consider the following simplified MarkUs models:

class Grouping < ActiveRecord::Base
  has_many :ta_memberships
  has_many :tas, through: :ta_memberships
end

class TaMembership < Membership
  belongs_to :ta
  belongs_to :grouping
end

class Ta < User
  has_many :ta_memberships
  has_many :groupings, through: :ta_memberships
end

TaMembership is a join model between Ta and Grouping that represents the assignment of a TA to a grouping so that the TA has the permission to mark students in the grouping.

A view may have a table of groupings and need to display all the TAs assigned to each grouping (in ERB), as in this (overly-simplified) example:

<% Grouping.find_each do |grouping| %>
  <span class='ta'><%= grouping.tas %></span>
<% end %>

This all looks good until we inspect the query log and find that such a simply snippet of code generated a lot of SQL queries:

SELECT * FROM groupings

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 1

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 2

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 3

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 4

...

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id = 42

This is because ActiveRecord lazily loads associations by default, i.e., the association model is only loaded (through a SQL query to the database) when the attribute is accessed the first time. In the above example, an initial query is issued to get a list of Grouping models. Then, for each Grouping instance, the tas attribute is accessed, generating another SQL query to get the associated model instance. This results in a total of n+1 queries, where n is the number of groupings, hence the name the problem. In a networked production environment, the round-trip cost of issuing a database query is a significant overhead due to network delays. Therefore, in general, having n queries perform poorly compared to having only one (or some constant number of) queries that achieve the same.

In this case, the performance can be improved by avoiding the n+1 queries and use only a few. In Rails, this can be achieved using eager loading of associations.

<% Grouping.includes(:tas).find_each do |grouping| %>
  <span class='ta'><%= grouping.tas %></span>
<% end %>

By using includes, Rails takes care of the eager loading of the tas association and issues only two queries:

SELECT * FROM groupings

SELECT * FROM tas JOIN ta_memberships ON ta_memberships.ta_id = tas.id WHERE ta_memberships.grouping_id in (1, 2, 3, 4, ..., 42)

Just like the where and joins method, The includes method can handle nested associations as well:

Grouping.includes(ta_memberships: [{ ta: :groupings }, :grouping])

Note that while the nested association can be loaded, sometimes it’s redundant and causes too much overhead when the nesting level gets too deep. In the above snippet, grouping.ta_memberships is a collection of TaMembership instances, where each instance has a ta association, and each one has a collection of Grouping instances. All these eagerly-loaded instances already form a pretty large and complex structure — Large structure causes memory bottlenecks. Think about whether there is any redundancy in the eagerly loaded model instances and whether you can re-organize the view or controller to simplify the structure.

Normally the use cases of includes are in the controllers, in which the model instances are eagerly loaded and passed to the view (the above examples where the includes calls are in the view are only for demonstration purposes). However, when multiple controllers are using the same set of eagerly loaded associations, consider writing a scope for the model (in Rails 4):

class Grouping
  scope :with_tas, -> { includes(:tas) }
end

Or the Rails 3 way:

class Grouping
  def self.with_tas
    Grouping.includes(:tas)
  end
end

MarkUs still has quite a few instances of the n+1 query problem at the time of writing. With the help of bullet, we can track the the remaining evil n+1 queries in the system.

Bulk operation — “n query problem”

The n+1 query problem occurs most likely when reading the database (i.e., doing SQL SELECT statements). A related problem, I call it “the n query problem” may occur when doing bulk operations such as INSERT, UPDATE and DELETE statements.

For example, the problem occurs when doing something like the following:

groupings.each do |grouping|
  grouping.some_attribute = foo
  grouping.tas.clear
  grouping.save!
end

This generates n UPDATE statements and n INSERT statements. These can normally be reduced to just one single query.

Bulk creation

Use activerecord-import. This is by far the most DBMS-independent gem for ActiveRecord bulk creation. ActiveRecord’s create method actually supports bulk creation, but it depends on the underlying database driver for Ruby to do the actual bulk creation. At the time of writing, the PostgreSQL driver pg still doesn’t support bulk creation.

For a sample usage, refer to Grouping.assign_tas.

Bulk update and bulk deletion

Use update_all and delete_all.

For a sample usage, refer to Grouping.unassign_tas.

Written by Su Zhang

August 20th, 2014 at 5:58 pm

Leave a Reply