Rails has a lot of magic, and sometimes that magic is very confusing. This is the story of one of those times.
ActiveRecord provides two different ways of defining scopes. The first is the `scope` class method:
```ruby
class Post < ActiveRecord::Base
  scope :published, -> {
    where('published_at < ?', Time.zone.now)
  }
end
```
The other way is to define them as class methods. Class methods on models are available as scopes… even if you didn’t mean to use them that way.
```ruby
class Post < ActiveRecord::Base
  def self.published
    where('published_at < ?', Time.zone.now)
  end
end
```
Either of these lets you use `published` as a scope, chained along with all the other fun ActiveRecord methods: `Post.where(author: jared).published.order(published_at: :desc)`.
Rails uses some fun magic to accomplish this. It keeps track of the current query, and the query methods (like `where` and `order`) all build on that current query. Rails actually just calls the class methods:
```ruby
class Post < ActiveRecord::Base
  def self.published
    where('published_at < ?', Time.zone.now)
  end

  def self.three
    3
  end

  def self.what_is_self
    self
  end
end

Post.published.to_sql
#=> "SELECT \"posts\".* FROM \"posts\" WHERE (published_at < '2018-10-06 21:30:54.737566')"
Post.published.three
#=> 3
Post.published.what_is_self
#=> Post(id: integer, created_at: datetime, updated_at: datetime, published_at: datetime)
```
Rails isn’t doing anything magic to define these methods as something special; it’s just calling them. As is normal inside a class method, `self` is the class itself, not a query object. Rails keeps track of the current query being built and uses it as you chain. I believe it does this by storing the current query in a fiber-local variable, but I’ve not dug into the mechanics of it.
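To make that idea concrete, here is a toy model in plain Ruby. It is a sketch of the behavior described above, not Rails’ actual implementation: `ToyRelation`, `ToyModel`, `with_current_scope`, and the `:toy_current_scope` key are all hypothetical names. The relation forwards unknown methods to the class after installing itself as the “current scope”, and class-level query methods start from that scope. It stores the scope in `Thread.current[]`, which Ruby makes fiber-local.

```ruby
# Toy model of "class methods as scopes". NOT Rails' real implementation;
# every name here is hypothetical and exists only for illustration.
class ToyRelation
  attr_reader :conditions

  def initialize(klass, conditions = [])
    @klass = klass
    @conditions = conditions
  end

  # Query methods return a new relation with one more condition appended.
  def where(condition)
    ToyRelation.new(@klass, conditions + [condition])
  end

  # Methods the relation doesn't define are forwarded to the model class,
  # with this relation installed as the "current scope" while they run.
  def method_missing(name, *args, &block)
    return super unless @klass.respond_to?(name)

    @klass.with_current_scope(self) { @klass.public_send(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    @klass.respond_to?(name, include_private) || super
  end
end

class ToyModel
  # Thread.current[] storage is fiber-local in Ruby.
  def self.current_scope
    Thread.current[:toy_current_scope]
  end

  # Install `relation` as the current scope for the duration of the block,
  # restoring the previous value afterwards (even if the block raises).
  def self.with_current_scope(relation)
    previous = current_scope
    Thread.current[:toy_current_scope] = relation
    yield
  ensure
    Thread.current[:toy_current_scope] = previous
  end

  # Class-level query methods start from the current scope when one is set.
  def self.all
    current_scope || ToyRelation.new(self)
  end

  def self.where(condition)
    all.where(condition)
  end
end

class Post < ToyModel
  def self.published
    where("published_at < now")
  end

  def self.what_is_self
    self
  end
end

by_jared = Post.where("author = jared")
p by_jared.published.conditions # => ["author = jared", "published_at < now"]
p by_jared.what_is_self         # => Post
p Post.published.conditions     # => ["published_at < now"]
```

The key line is `ToyModel.all`: because class-level query methods begin from whatever scope is current, a class method invoked through a chain quietly inherits the chain’s conditions, even though `self` is still plain `Post`.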
I admit that 99% of the time the underlying mechanics of scopes won’t matter to you in the slightest. None of this mattered to me until a Rails app I work on was getting hit with an unusual amount of traffic, and the most complex query in the app was pinning the database.
Let’s pretend the query looked something like this:
```ruby
class Post < ActiveRecord::Base
  def self.really_complex_query(something)
    good_ids = Post.good_posts.pluck(:id)
    where(id: good_ids)
      .some_scope
      .another_scope
      .scope_that_takes_an_argument(something)
  end
end
```
Sure, this isn’t good code, but are you going to tell me all your production code is good?
The `good_posts` part is the problem. Here are some facts about it:
Now a naive developer might look at the situation and say, “I should cache that.” And in an emergency, they might change the code to look like this:
```ruby
class Post < ActiveRecord::Base
  def self.really_complex_query(something)
    good_ids = Rails.cache.fetch "good_post_ids", expires_in: 10.minutes do
      Post.good_posts.pluck(:id)
    end

    where(id: good_ids)
      .some_scope
      .another_scope
      .scope_that_takes_an_argument(something)
  end
end
```
This would be a disaster. There’s probably some code somewhere in the app that looks like this:
```ruby
Post.where(author: jared).really_complex_query(something)
```
If that code hits an empty cache, then `Post.good_posts.pluck(:id)` will get called while we’re building a query that starts with `Post.where(author: jared)`, and it will cache only the good posts that belong to me. `Post.good_posts.pluck(:id)` is no different from running `good_posts.pluck(:id)`: it builds off the current query. The `good_post_ids` cache key will get populated with the result of running `Post.where(author: jared).good_posts.pluck(:id)`.
If a developer deployed that code, suddenly only my posts would be shown on the site, which would be pretty frustrating for everyone trying to find posts that aren’t mine. And any other code that used `really_complex_query` in a chain would risk caching other invalid results.
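The same toy approach can replay the cache disaster without a database. Again, this is a sketch, not ActiveRecord: the in-memory `ROWS`, the `CACHE` hash standing in for `Rails.cache` (with no expiry), `pluck_ids` standing in for `pluck(:id)`, and the block-based `where` are all hypothetical simplifications. The point it shows is that whichever caller fills the cache first decides what everyone else sees.

```ruby
# Toy replay of the cache bug. NOT ActiveRecord: ROWS, CACHE, pluck_ids and
# the block-based where are hypothetical stand-ins for illustration only.
Row = Struct.new(:id, :author, :good)

ROWS = [
  Row.new(1, "jared", true),
  Row.new(2, "jared", false),
  Row.new(3, "alice", true),
  Row.new(4, "bob", true)
].freeze

CACHE = {} # stands in for Rails.cache, minus the expiry

class ToyRelation
  def initialize(klass, filters = [])
    @klass = klass
    @filters = filters
  end

  # Each where adds a filter; filters are ANDed together.
  def where(&filter)
    ToyRelation.new(@klass, @filters + [filter])
  end

  # Stand-in for pluck(:id): ids of rows matching every filter.
  def pluck_ids
    ROWS.select { |row| @filters.all? { |f| f.call(row) } }.map(&:id)
  end

  # Unknown methods run on the class with this relation as the current scope.
  def method_missing(name, *args, &block)
    return super unless @klass.respond_to?(name)

    @klass.with_current_scope(self) { @klass.public_send(name, *args, &block) }
  end

  def respond_to_missing?(name, include_private = false)
    @klass.respond_to?(name, include_private) || super
  end
end

class Post
  def self.with_current_scope(relation)
    previous = Thread.current[:toy_current_scope]
    Thread.current[:toy_current_scope] = relation
    yield
  ensure
    Thread.current[:toy_current_scope] = previous
  end

  # Starts from the chain we're inside of, if any.
  def self.all
    Thread.current[:toy_current_scope] || ToyRelation.new(self)
  end

  def self.where(&filter)
    all.where(&filter)
  end

  def self.good_posts
    where(&:good)
  end

  def self.really_complex_query
    # Looks global, but Post.good_posts inherits the current scope.
    good_ids = CACHE["good_post_ids"] ||= Post.good_posts.pluck_ids
    where { |row| good_ids.include?(row.id) }
  end
end

jared = Post.where { |row| row.author == "jared" }
p jared.really_complex_query.pluck_ids # => [1]
p CACHE["good_post_ids"]               # => [1], only Jared's good posts
p Post.really_complex_query.pluck_ids  # => [1], but should be [1, 3, 4]

CACHE.clear
p Post.really_complex_query.pluck_ids  # => [1, 3, 4] when filled outside a chain
```

Nothing in `really_complex_query` mentions authors, yet the cached ids depend entirely on which chain happened to call it first after the cache expired.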
I haven’t decided what the point of this article was.