Rails: A New Scope

Rails has some magic. Sometimes that magic is very confusing. This is the story of one of those times where the magic is confusing.

ActiveRecord provides two different ways of defining scopes. First is by using the scope class method:

class Post < ActiveRecord::Base
  scope :published, -> {
    where('published_at < ?', Time.zone.now)
  }
end

The other way is by defining them as class methods. Class methods on models are available as scopes… even if you didn’t mean to use them that way.

class Post < ActiveRecord::Base
  def self.published
    where('published_at < ?', Time.zone.now)
  end
end

Either of these allows you to use published as a scope, chained along with all the other fun ActiveRecord methods: Post.where(author: jared).published.order(published_at: :desc).

The Phantom Magic

Rails uses some fun magic to accomplish this. It keeps track of the current query, and the query methods (like where and order) all use that current query. Rails actually just calls the class methods.

class Post < ActiveRecord::Base
  def self.published
    where('published_at < ?', Time.zone.now)
  end

  def self.three
    3
  end

  def self.what_is_self
    self
  end
end

Post.published.to_sql
#=> "SELECT \"posts\".* FROM \"posts\" WHERE (published_at < '2018-10-06 21:30:54.737566')"

Post.published.three
#=> 3

Post.published.what_is_self
#=> Post(id: integer, created_at: datetime, updated_at: datetime, published_at: datetime)

Rails isn’t doing anything magic to define these methods as something special; it’s just calling them. As is normal inside a class method, self is the class itself and not a query object. Rails keeps track of the current query being built and uses that as you chain. I believe it does this by storing the current query in a fiber-local variable, but I’ve not dug into the mechanics of it.
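To make that behaviour concrete, here is a toy model in plain Ruby. It is not ActiveRecord's implementation: ToyRelation, ToyModel, with_current and the thread-local key are all names invented for this sketch, and storing the current query in a thread-local slot is an assumption. It only shows how a query object could forward unknown methods to the class while remembering which query is being built.

```ruby
# Toy model only: ToyRelation and ToyModel are hypothetical, not ActiveRecord classes.
class ToyRelation
  attr_reader :klass, :conditions

  def initialize(klass, conditions = [])
    @klass = klass
    @conditions = conditions
  end

  # Each call returns a new relation with one more condition.
  def where(condition)
    ToyRelation.new(klass, conditions + [condition])
  end

  # Unknown methods are forwarded to the model class, with this relation
  # installed as the "current query" for the duration of the call.
  def method_missing(name, *args, &block)
    return super unless klass.respond_to?(name)

    klass.with_current(self) { klass.public_send(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    klass.respond_to?(name, include_private) || super
  end
end

class ToyModel
  # Assumption of this sketch: the current query lives in a thread-local slot.
  def self.current
    Thread.current[:toy_current_query] || ToyRelation.new(self)
  end

  def self.with_current(relation)
    previous = Thread.current[:toy_current_query]
    Thread.current[:toy_current_query] = relation
    yield
  ensure
    Thread.current[:toy_current_query] = previous
  end

  # Query methods on the class build on whatever the current query is.
  def self.where(condition)
    current.where(condition)
  end
end

class Post < ToyModel
  def self.published
    where("published_at < now")
  end

  def self.three
    3
  end

  def self.what_is_self
    self
  end
end

chain = Post.where("author = jared")
p chain.published.conditions  # both conditions, in chaining order
p chain.three                 # the plain return value of the class method
p chain.what_is_self          # the Post class, not a relation
```

Inside published, self is still Post, yet the where call picks up the author condition because the relation installed itself as the current query before forwarding the call.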

Revenge of the SQL

I admit that 99% of the time the underlying mechanics of scopes won’t matter to you in the slightest. None of this mattered to me until a Rails app I work on was getting hit with an unusual amount of traffic, and the most complex query in the app was pinning the database.

Let’s pretend the query looked something like this:

class Post < ActiveRecord::Base
  def self.really_complex_query(something)
    good_ids = Post.good_posts.pluck(:id)

    where(id: good_ids)
      .some_scope
      .another_scope
      .scope_that_takes_an_argument(something)
  end
end

Sure, this isn’t good code, but are you going to tell me all your production code is good?

The good_posts part is the problem. Here are some facts about it:

  • It’s complex enough that ActiveRecord breaks when trying to chain those other scopes off of it.
  • It’s the source of the performance issues.
  • The result of it doesn’t change often, and the consequences of stale values coming back from it are negligible.

Now a naive developer might look at the situation and say, “I should cache that.” And in an emergency, they might change the code to look like this:

class Post < ActiveRecord::Base
  def self.really_complex_query(something)
    good_ids = Rails.cache.fetch "good_post_ids", expires_in: 10.minutes do
      Post.good_posts.pluck(:id)
    end

    where(id: good_ids)
      .some_scope
      .another_scope
      .scope_that_takes_an_argument(something)
  end
end

This would be a disaster. There’s probably some code somewhere in the app that looks like this:

Post.where(author: jared).really_complex_query(something)

If that code hits an empty cache, then Post.good_posts.pluck(:id) will get called while we’re building a query that starts with Post.where(author: jared), and it will only cache the good posts that belong to me. Post.good_posts.pluck(:id) is no different from running good_posts.pluck(:id) and will build off the current query. The good_post_ids cache key will get populated with the result of running Post.where(author: jared).good_posts.pluck(:id).

If a developer deployed that code, suddenly only my posts would be shown on the site, and that would be pretty frustrating for all the people trying to find posts that aren’t mine. And any other code that used our really_complex_query in a chain would run the risk of caching other invalid results.
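The failure can be reproduced without Rails. The sketch below is a toy stand-in: Relation, the in-memory ROWS, the CACHE hash, leaky_good_ids and safe_good_ids are all invented for illustration and are not ActiveRecord or Rails.cache. It mirrors the cached version above, then shows one possible way around the problem, which is to compute the cached value from an explicitly unscoped starting point.

```ruby
# Toy stand-ins only (hypothetical): not ActiveRecord, not Rails.cache.
CACHE = {}

class Relation
  attr_reader :klass, :filters

  def initialize(klass, filters = [])
    @klass = klass
    @filters = filters
  end

  def where(&filter)
    Relation.new(klass, filters + [filter])
  end

  # Toy equivalent of pluck(:id): apply every filter, return the ids.
  def ids
    klass.rows.select { |row| filters.all? { |f| f.call(row) } }.map { |row| row[:id] }
  end

  # Forward class methods, with this relation as the current scope.
  def method_missing(name, *args, &block)
    return super unless klass.respond_to?(name)

    klass.scoping(self) { klass.public_send(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    klass.respond_to?(name, include_private) || super
  end
end

class Post
  ROWS = [
    { id: 1, author: "jared", good: true },
    { id: 2, author: "alice", good: true },
    { id: 3, author: "alice", good: false }
  ].freeze

  def self.rows
    ROWS
  end

  def self.unscoped
    Relation.new(self)
  end

  def self.current
    Thread.current[:toy_scope] || unscoped
  end

  def self.scoping(relation)
    previous = Thread.current[:toy_scope]
    Thread.current[:toy_scope] = relation
    yield
  ensure
    Thread.current[:toy_scope] = previous
  end

  def self.where(&filter)
    current.where(&filter)
  end

  def self.good_posts
    where { |row| row[:good] }
  end

  # Mirrors the cached version: Post.good_posts still builds on the caller's scope.
  def self.leaky_good_ids
    CACHE[:leaky] ||= Post.good_posts.ids
  end

  # One possible fix: start from a relation that ignores the caller's scope.
  def self.safe_good_ids
    CACHE[:safe] ||= unscoped.good_posts.ids
  end
end

by_jared = Post.where { |row| row[:author] == "jared" }
p by_jared.leaky_good_ids  # => [1], cached while scoped to jared's posts
p Post.leaky_good_ids      # => [1], every later caller gets the narrowed list
p by_jared.safe_good_ids   # => [1, 2], the caller's scope no longer leaks in
```

In a real app the closest analogue is probably ActiveRecord's unscoped, but it also discards any default_scope, so treat this as something to verify against your own models rather than a drop-in fix.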

Conclusion

I haven’t decided what the point of this article was.

by Jared Norman
25 September 2018
