This is the first post in a series about common mistakes that lead to unreliable test suites in Ruby and how to fix them. Stay tuned for more.
Memoization is a helpful tool for optimization, but Rubyists use it for more than that. Take this class, for example:
class UserNotification
  def initialize(user)
    @user = user
  end

  def account_frozen
    return unless user.phone_number

    api_client.send(
      user.phone_number,
      "Your account has been frozen due to suspicious activity."
    )
  end

  private

  attr_reader :user

  def api_client
    @api_client ||= NotificationService::Client.new(
      key: ENV.fetch('NOTIFICATION_SERVICE_API_KEY')
    )
  end
end
In the class above, the #api_client private method is memoized so that it instantiates our notification service client the first time it’s called and always returns the same client on subsequent calls.
Memoizing the setup for the API client isn’t strictly necessary. It probably isn’t very expensive to perform this task multiple times, so we could just remove the memoization altogether and reinstantiate the client on every call.
Alternatively, we could just set up the API client in the constructor. We’d be setting it up even if we didn’t end up using it, but the penalty for doing so is insignificant.
I regularly see Rubyists memoize things when it isn’t strictly necessary, and I like it. In situations like this it keeps the constructor clean, isolating the boring implementation details in a private method at the very bottom of the class definition. A good class tells a story about how to use it, and this class tells you up front about what it does, and allows you to keep reading if you need to know how it does it.
There are very few situations where you’ll consider using memoization and be wrong to do so. Memoization is reasonably sensible as long as these two criteria are met:
If your instinct is to compute some value lazily in your class, it’s probably just fine to do that. Some languages lazily evaluate pretty much everything. Just remember that memoizing a falsy value with ||= won’t do anything; because the stored value is nil or false, the right-hand side will be re-evaluated every time. (You can get around this.)
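One common way around the falsy-value problem is to check whether the instance variable has been defined rather than whether it is truthy. The sketch below is illustrative only: FeatureCheck, its methods, and the lookups counter are hypothetical names that do not come from the class above.

```ruby
# Hypothetical class; the names and the counter exist only for illustration.
class FeatureCheck
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # ||= re-runs the lookup on every call because the stored value is false.
  def enabled_naive?
    @enabled_naive ||= expensive_lookup
  end

  # Checking whether the variable is defined memoizes nil and false too.
  def enabled?
    return @enabled if defined?(@enabled)

    @enabled = expensive_lookup
  end

  private

  # Stand-in for slow work that legitimately returns a falsy value.
  def expensive_lookup
    @lookups += 1
    false
  end
end
```

Calling enabled_naive? three times on one instance runs the lookup three times, while calling enabled? three times runs it once.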
Forgetting about “the life of the object” is how we introduce the first test suite issue. Like everything in Ruby, classes are objects too. Class methods are no different than methods on any other instance. When you define a class method it looks like this:
class Example
  def self.foo
    3
  end

  # or equivalently:
  class << self
    def foo
      3
    end
  end
end
When it comes to memoization the important difference between class and instance methods is how long the object in question lives. Your application will create and throw away thousands or millions of instances of most classes over the course of one HTTP request/response cycle. On the other hand, the classes themselves live for the length of the Ruby process. This means memoization in class methods saves the memoized value until your app is restarted.
Memoizing class methods is sometimes okay. Ignoring potential thread safety issues (a topic for another day), sometimes you do want to save a computed value for the length of your Ruby process.
For example, if you build a lookup table for tax rates from a CSV then there’s no reason not to keep it around. You probably don’t want to load it from disk and parse it every time you need to look up a tax rate, and if you’re loading this data from a CSV then it’s probably safe to assume that it doesn’t change very often.
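As a rough sketch of that kind of process-lifetime lookup table: the TaxRates class, its column names, and the rates below are all made up, and the CSV is inlined as a string so the example runs without a file on disk. The load_count reader exists only to show that parsing happens once.

```ruby
require "csv"

# Hypothetical lookup table; the class name, columns, and rates are made up.
class TaxRates
  # Stand-in for a CSV file on disk so the sketch is self-contained.
  CSV_DATA = <<~DATA
    region,rate
    CA,0.0725
    NY,0.04
  DATA

  class << self
    # Exposed only to show how many times the CSV was parsed.
    attr_reader :load_count

    def rate_for(region)
      table.fetch(region)
    end

    private

    # Parsed on first use, then kept for the life of the Ruby process.
    def table
      @table ||= begin
        @load_count = (@load_count || 0) + 1
        CSV.parse(CSV_DATA, headers: true).each_with_object({}) do |row, rates|
          rates[row["region"]] = Float(row["rate"])
        end
      end
    end
  end
end
```

Because the data only changes when the CSV changes, holding on to the parsed hash until the next restart is a reasonable trade here.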
Where you get into trouble is when you memoize something that does change. A common one I see is memoizing a value from the database that changes infrequently. Not only will this almost assuredly break your test suite, but it will probably cause bugs in your application. Take a look:
module UserQueries
  class << self
    def active_with_subscription
      @active_and_subscribed ||=
        User.active.joins(:subscription).distinct
    end
  end
end
It’s an innocent-looking piece of code, and in my experience people often end up with something like this by refactoring the query out of another class.
The issue is that once this query gets executed, the result gets saved forever (that is, until the Ruby process terminates). The first time this method gets called in your test suite or application process, the value will be recorded and returned for all subsequent calls, no matter how much the database changes. Consider this example RSpec spec:
require 'rails_helper'

RSpec.describe UserQueries do
  describe ".active_with_subscription" do
    subject { described_class.active_with_subscription }

    context "when there are active users with subscriptions" do
      let!(:user_one) { create :user, :active, :with_subscription }
      let!(:user_two) { create :user, :active, :with_subscription }

      it { is_expected.to contain_exactly(user_one, user_two) }
    end

    context "when there are no active users with subscriptions" do
      before do
        create :user, :active
        create :user, :inactive, :with_subscription
      end

      it { is_expected.to be_empty }
    end
  end
end
The example that runs second will always fail when run against the code above. The value returned by ActiveRecord will be saved and reused for the lifetime of the process. If the first example runs first, then the method will return the two active users we created for that test when it gets called in the second example, even though they’ve likely been scrubbed from the database. If the second example runs first, then the method will return an empty collection when the first example gets run.
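The same staleness can be reproduced without Rails or a database. In the sketch below, FakeUsersTable is a hypothetical in-memory stand-in for a table, and StaleQueries and FreshQueries are made-up modules that differ only in whether they memoize.

```ruby
# Hypothetical in-memory stand-in for a database table.
class FakeUsersTable
  @rows = []

  class << self
    attr_reader :rows
  end
end

module StaleQueries
  class << self
    # Memoized on the module itself, so the first result is kept
    # until the process exits.
    def active_names
      @active_names ||= FakeUsersTable.rows.dup
    end
  end
end

module FreshQueries
  class << self
    # No memoization: every call reads the current rows.
    def active_names
      FakeUsersTable.rows.dup
    end
  end
end

FakeUsersTable.rows << "ada"
first_stale = StaleQueries.active_names
first_fresh = FreshQueries.active_names

# Simulate the test suite cleaning the database between examples.
FakeUsersTable.rows.clear

second_stale = StaleQueries.active_names
second_fresh = FreshQueries.active_names
```

After the table is cleared, second_fresh is empty while second_stale still contains "ada", which mirrors what the second spec example sees.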
You’ll likely never want to memoize the result of a query for the life of a process. Databases contain application state, and the point of state is that it changes.
This is one of the easiest common mistakes to fix. Once you’ve determined that you’ve got some memoization where you shouldn’t, you simply remove the memoization.
# Before:
def active_with_subscription
  @active_and_subscribed ||=
    User.active.joins(:subscription).distinct
end

# After:
def active_with_subscription
  User.active.joins(:subscription).distinct
end
Be careful, though: you’ll need to performance test your code after you make a change like this. If you have some code that calls this a thousand times in one request, then you’re going to need to find a way to cache the result at another level.
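One possible "other level" is an object that lives only as long as a single request, so the cached value is thrown away along with it. This is a sketch under assumptions, not a prescription: RequestScopedUsers is a hypothetical class, and the query is passed in as a lambda with fake data so the example runs without Rails.

```ruby
# Hypothetical per-request object; the name and the injected query are
# assumptions made so the sketch runs without Rails.
class RequestScopedUsers
  def initialize(query)
    @query = query # any callable that runs the real query
  end

  # Memoized on the instance, so the cache dies with the request.
  def active_with_subscription
    @active_with_subscription ||= @query.call
  end
end

calls = 0
# Fake query that counts how many times it actually runs.
query = lambda do
  calls += 1
  ["user_one", "user_two"]
end

# First "request": a thousand calls, but only one real query.
request_one = RequestScopedUsers.new(query)
1_000.times { request_one.active_with_subscription }

# Second "request": a new object, so the query runs again.
request_two = RequestScopedUsers.new(query)
request_two.active_with_subscription
```

The query runs once per object rather than once per process, so each request sees current data while repeated calls within a request stay cheap.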
Memoization is a great technique for writing more readable and faster code. Don’t hesitate to use it, but pay attention while refactoring and don’t pull memoized logic out into the wrong context.