At Super Good, we work on Solidus stores of all types: from stores with custom third-party API integrations, to subscriptions, to custom PDF generation systems. One thing common across all client stores is the desire to provide customers with a stable, fast experience at an operational cost that fits the budget. Improving performance and stability is often a balance of carefully optimizing code and scaling up production infrastructure, both of which can be costly and time-consuming endeavours. This post explores whether we can improve performance without changing any code or adding extra costs to our production infrastructure. We will test the jemalloc project on a production Solidus store to see if we can improve some memory usage issues without negatively impacting response time.
As mentioned in the introduction, we’re trying to improve our store’s memory usage without incurring the costs of code changes or extra production infrastructure. We must also verify that jemalloc has not negatively impacted application performance as measured by HTTP response time. If our application frequently runs out of memory, it may fall back on much slower swap storage, which will hurt application performance.
Before looking at how jemalloc can help us reduce memory usage, we need some quick background on what a memory allocator actually does and what role it plays for Ruby apps. A memory allocator implements the malloc family of functions from the ISO C standard library (also specified by POSIX). More specifically, malloc provides dynamic memory to processes like Ruby at runtime: whenever we instantiate a new object, the running Ruby process calls malloc to allocate some memory for our shiny new object.
All Unix-based operating systems include some implementation of malloc. The GNU implementation is the most common one included with Linux distributions, but there are many others. (For example, Alpine Linux ships with the allocator from the musl C library.) jemalloc is the malloc implementation born inside the FreeBSD operating system. It’s been FreeBSD’s default memory allocator since 2005 and has spread to many other operating systems, including most Linux distributions and macOS.
Every Solidus store must evaluate this risk on a case-by-case basis. If your Solidus store has existing stability issues unrelated to memory usage, it may be worth diagnosing those first. It’s also worth verifying that we have some error reporting in place, in case anything goes wrong. At a high level, we can evaluate the risk by analyzing how much C code is present in our Ruby app. Most Solidus stores will include some C extensions from the mysql gems, but these gems are known to work well with jemalloc. If your Solidus app has no custom C extensions, it’s reasonably safe to try jemalloc locally or in a staging environment.
Let’s give this a try!
Start by installing jemalloc with your package manager of choice (e.g. brew install jemalloc on macOS). Next, pass the --with-jemalloc flag to the Ruby installer of your choice. We include snippets for ruby-install, ruby-build, and asdf below:
# ruby-install
ruby-install --install-dir ~/.rubies/ruby-3.1.3-jemalloc ruby-3.1.3 -- --with-jemalloc

# ruby-build
RUBY_CONFIGURE_OPTS="--with-jemalloc" ruby-build 3.1.3 ~/.rubies/ruby-3.1.3-jemalloc

# asdf (the path below is Homebrew's jemalloc prefix on Intel Macs)
RUBY_CONFIGURE_OPTS="--with-jemalloc=/usr/local/opt/jemalloc" asdf install ruby 3.1.3
Note that asdf users on Apple silicon Macs report that some additional steps are required.
If you’re deploying to Fly.io, check your app’s generated Dockerfile and make sure the option that enables jemalloc is set to true, then run fly deploy to deploy your changes.
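If rebuilding Ruby inside your image isn't practical, another common approach is to preload jemalloc at runtime with LD_PRELOAD, which swaps the allocator without recompiling anything. A sketch, assuming a Debian/Ubuntu-based image; the package name and library path below are the usual Debian locations and should be adjusted for your base image:

```shell
# Install the jemalloc shared library in the image (Debian/Ubuntu).
apt-get update && apt-get install -y libjemalloc2

# Preload it so every subsequent process uses jemalloc for malloc/free.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

# Any Ruby process started from here on picks it up, e.g.:
bundle exec unicorn -c config/unicorn.rb
```

This is how the Heroku buildpack approach below works as well, so the two paths should behave the same in production.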
Today we’re looking at a Solidus 3.1.6 app running on Unicorn, Ruby 2.7.8, and Heroku Professional dynos. We are trying to address the repeated memory usage errors (R14) being reported by Heroku metrics without negatively impacting response time or throughput. If we cannot reduce memory usage, we may need to consider paying more to upgrade our dynos’ memory quota, which is currently 2,560MB. This app runs a web dyno and a worker dyno, but we focus on the web dyno running the Unicorn server. To enable jemalloc on this app, we add the jemalloc buildpack, then set the environment variable as described above. This change will impact both the web and worker dynos, so be sure to check on both.
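For reference, the commands look roughly like this. The buildpack URL and the JEMALLOC_ENABLED variable name follow the widely used community jemalloc buildpack; treat them as assumptions and double-check them against the buildpack you actually choose:

```shell
# Add the jemalloc buildpack ahead of the Ruby buildpack (assumed community buildpack).
heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git

# Turn it on via the buildpack's environment variable (assumed name).
heroku config:set JEMALLOC_ENABLED=true

# Redeploy so the buildpack runs.
git commit --allow-empty -m "Enable jemalloc" && git push heroku main
```

The buildpack works by preloading the jemalloc shared library before your dyno's processes start, so no Ruby rebuild is needed on Heroku.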
We start by reviewing the Heroku Metrics for memory usage on our web dyno. For this analysis, we use a 7-day window on the metrics data to provide a meaningful average over the business week.
Our average memory usage is still under the quota, at 88.5% of it. However, our maximum memory usage is far over the quota at 118.1%. Each time our application hits one of these R14 errors, it spills over into much slower swap memory, so we want to eliminate these R14 errors.
After enabling jemalloc, we can see an instant improvement in our memory usage. We are still quite close to the quota at 99.1%, but staying under our quota is enough to eliminate the R14 errors and avoid relying on slower swap memory.
| Metric | Before jemalloc | After jemalloc |
|---|---|---|
| Average Memory Usage | 88.5% | under quota |
| Maximum Memory Usage | 118.1% | 99.1% |
This looks promising! Next, let’s make sure we’re moving in the right direction with response time.
As with the memory usage analysis, we use Heroku metrics over a 7-day period to analyze response time.
While the 50th percentile (median) response time and the minimum and maximum 95th percentile times look okay, we can see some large peaks on the graph; so much so that the y-axis runs all the way to 40 seconds! One large peak on the right-hand side clocks in at a 30-second response time. However, this result is deceptive, as we can easily correlate it with a Heroku incident.
jemalloc had a negligible impact on response time for this particular app. That being said, getting this app under the memory usage quota without negatively impacting response time is still a win!
The following table summarizes our findings:
| Metric | Before jemalloc | After jemalloc |
|---|---|---|
| Average Memory Usage | 88.5% | under quota |
| Maximum Memory Usage | 118.1% | 99.1% |
If your application frequently runs out of memory, give jemalloc a try. As Nate Berkopec said, “There are no free lunches. Though this one might be close to free. Like a ten cent lunch.” The improvements seen in this post are relatively minor, but overall we managed to stop our application from running out of memory and improved performance in the process.