How to Skip Bundle Install When Deploying a Rails App to Docker if the Gemfile Hasn’t Changed

With Docker, you can deploy a Rails app to a container that has all of the app’s dependencies (the right version of Ruby, your gems, etc.) embedded in it. You can fully test the app in the container, then ship the container to your production host(s) when you are ready. It’s like a VM, only much lighter weight: a container shares the host’s kernel instead of booting its own operating system, and it doesn’t have to reserve memory in advance.

I won’t go into the details of how to create a container in this post. But the short version is: you create a Dockerfile, which is a script that sets up the container, and then you run docker build to run that script.

Docker has an automatic caching mechanism to greatly speed things up after the first build of a Dockerfile. Each step (each line of the file) is cached separately. If you change line 6 of a 10-line Dockerfile and build it again, lines 1-5 will be skipped. Docker will just pull the results out of the cache. Nice. You can skip really slow steps like compiling Ruby.
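To make the "change line 6, skip lines 1-5" idea concrete, here is a tiny shell sketch. It is not how Docker implements its cache; it only compares two versions of a Dockerfile as plain text and reports the first line that differs. The function name first_rebuilt_line is made up for this illustration.

```shell
#!/bin/sh
# Illustration only: Docker compares instructions (and, for ADD, file
# contents), not raw text. first_rebuilt_line is a hypothetical helper.
#
# Prints the number of the first line at which two Dockerfiles differ.
# Every line before that number could come from the cache; that line and
# everything after it would be rebuilt. Prints 0 if the files are identical.
first_rebuilt_line() {
    awk '
        # While reading the first file, remember each line by number.
        NR == FNR { old[FNR] = $0; n_old = FNR; next }
        # While reading the second file, note the first mismatch.
        {
            n_new = FNR
            if (!found && old[FNR] != $0) found = FNR
        }
        END {
            # Same leading lines but different lengths: the first extra
            # line is where the two files start to differ.
            if (!found && n_old != n_new)
                found = (n_old < n_new ? n_old : n_new) + 1
            print found + 0
        }
    ' "$1" "$2"
}
```

If you keep the previous version of your Dockerfile around (say, as Dockerfile.old, also a made-up name), `first_rebuilt_line Dockerfile.old Dockerfile` tells you roughly where the cache stops helping.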

But if you want to start using Docker with a Rails app, you will quickly notice a problem: you can’t cache the bundle install step. Any time you rebuild your image—even when the gems haven’t changed—you will have to sit and wait for Bundler to finish.

It’s annoying because once you have become accustomed to the huge speed boost you get from Docker’s cache on other steps, you get pretty antsy waiting around for Bundler when you know perfectly well that you didn’t change the Gemfile.

If you have used Heroku, you know what I’m talking about. Every time you git push to Heroku it re-runs Bundler even when your Gemfile didn’t change. Other than asset compilation, it’s the slowest part of deploying to Heroku. (They don’t use Docker, but they do use the same underlying technology, Linux Containers. When I use Docker I notice a lot of behavior similar to Heroku’s, which makes it clearer why Heroku made the architectural choices they did.)

So: why does Docker cache the other steps but not bundle install? Because versions of Docker before 0.7.3 don’t cache an ADD instruction or any instruction after it. (ADD copies a file or directory into the image from the build machine at build time.) And the usual way to add a Rails app to an image is to git pull the latest code and then copy it in with ADD.

It makes sense that Docker doesn’t cache ADDs. It’s pretty likely that you want the latest version of the thing you’re copying into the container. But it also introduces this problem.

Bundler depends on the Gemfile. The Gemfile is part of an ADDed directory (the Rails app), and the directory tree contains other frequently-modified files (e.g., source files). So Bundler has to run after you ADD the app, which means the bundle install step can’t be cached.

Well, There’s Good News

I mean, of course there is. Why else would I write this post in a blog about stuff I like?

I gave you a hint when I said “before version 0.7.3” above. Docker 0.7.3 was released a few days ago and it has a killer feature for Rails developers (the same feature should benefit developers of Python apps with requirements.txt, and might be a good alternative to this approach, which Nick Stinemates of Docker proposed).

The ADD command can now be cached.

If you ADD a directory tree, Docker (remarkably quickly, using a tar algorithm) generates a hash from the contents of all the files in it. If no file has changed, it will use the cached version of the same ADD instruction from a previous run of docker build.
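If you want a feel for what "a hash from the contents of all the files" means, here is a rough shell sketch. It is not Docker's actual algorithm, and the function name tree_fingerprint is my own invention. It assumes GNU coreutils (sha256sum) is installed. It looks only at file paths and contents; Docker's own check may also consider file metadata such as timestamps.

```shell
#!/bin/sh
# Illustration only, not Docker's implementation. tree_fingerprint is a
# hypothetical helper; it assumes sha256sum from GNU coreutils.
#
# Prints one fingerprint for a whole directory tree: hash every regular
# file, sort the "hash  path" lines so the order is stable, then hash
# that list. Changing, adding, or removing any file changes the result.
tree_fingerprint() {
    (
        cd "$1" || exit 1
        find . -type f -exec sha256sum {} + \
            | LC_ALL=C sort \
            | sha256sum \
            | cut -d ' ' -f 1
    )
}
```

Run `tree_fingerprint railsapp` before and after editing a source file and the two fingerprints will differ, which is the same reason a single changed file makes Docker treat the whole ADDed directory as changed.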

This is a big deal. Now you can take advantage of the Docker cache to cache your bundle installs.

“But Brian,” you say, “that won’t help. It can’t use the cache if I deploy my app after changing source code in the directory tree.” Well…it can, with this one weird trick.

An Example

Let’s get started. First we’ll look at what a Dockerfile might look like for a Rails app. (Actually, you wouldn’t usually use SQLite in production and you wouldn’t usually put the database in the same container as the app. But that’s not important here.)

FROM ubuntu:12.10
MAINTAINER brian@morearty.org

# Install dependencies.
RUN apt-get update
RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
RUN gem install rubygems-update --no-ri --no-rdoc
RUN update_rubygems
RUN gem install bundler sinatra --no-ri --no-rdoc

# Copy the app into the image.
ADD railsapp /opt/railsapp

# Now that the app is here, we can bundle.
WORKDIR /opt/railsapp
RUN bundle install

# Set up a default runtime command
CMD rails server thin

Let’s run docker build for the first time. (The Ubuntu image is already on my machine, so there’s no wait to pull it.) For my timing, I used a plain-vanilla Rails 4 app with the default gems.

$ time docker build .
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Running in 3479b6010856
 ---> 838c7b6022ab
Step 3 : RUN apt-get update
 ---> Running in b60b17f4385c
Ign http://archive.ubuntu.com quantal InRelease
Hit http://archive.ubuntu.com quantal Release.gpg

... etc., etc. ...

Step 10 : RUN bundle install
 ---> Running in 7a57242449d7
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)
...
real    2m18.260s

Okay, two minutes 18 seconds for the initial build. Now we modify a source file (but not the Gemfile), then docker build again. I’m using Docker 0.7.3—the version that supports cached ADDs. But because one source file was changed, the entire app directory is considered to have been changed. So Docker will not use the cached version of the app. Since bundle install comes after the ADD and every step after an uncached step is also uncached, Docker will run it.

$ time docker build .
Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache
 ---> 5895ed9e78a4

etc., etc. ...

Step 10 : RUN bundle install
 ---> Running in 3f0ddbeea83e
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)
...
real    0m55.596s

55 seconds. Most of that time was spent in bundle install. That sucks. I didn’t change the Gemfile at all.

I Like Stuff that’s Cached

If I were using an older version of Docker, I would just have to put up with it. But watch me as I cleverly add a few lines to my Dockerfile to make it cache the bundle install. Pay special attention to lines 13-16 (from WORKDIR /tmp through RUN bundle install):

FROM ubuntu:12.10
MAINTAINER brian@morearty.org

# Install dependencies.
RUN apt-get update
RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
RUN gem install rubygems-update --no-ri --no-rdoc
RUN update_rubygems
RUN gem install bundler sinatra --no-ri --no-rdoc

# Copy the Gemfile and Gemfile.lock into the image. 
# Temporarily set the working directory to where they are. 
WORKDIR /tmp 
ADD railsapp/Gemfile Gemfile
ADD railsapp/Gemfile.lock Gemfile.lock
RUN bundle install 

# Everything up to here was cached. This includes
# the bundle install, unless the Gemfiles changed.
# Now copy the app into the image.
ADD railsapp /opt/railsapp

# Set the final working dir to the Rails app's location.
WORKDIR /opt/railsapp

# Set up a default runtime command
CMD rails server thin

See what I did there? Before copying the whole app, I copied just the Gemfile and Gemfile.lock into /tmp and ran bundle install from there. If neither file changed, both ADD instructions are cached. Because they are cached, subsequent commands, like the bundle install one, remain eligible for using the cache.

Only after bundling do I copy the rest of the app into the image. You want to do this as late as possible, since no step after a changed ADD can be cached. (I could have moved the CMD step farther up, too, but it’s so fast it didn’t matter.)
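The same ordering should carry over to a Python app with a requirements.txt. I haven't built this one, so treat it as a sketch: the pythonapp directory and the apt package names (python, python-pip) are assumptions, and the only point is the order of the ADD and RUN steps.

```dockerfile
# Untested sketch. Assumes a hypothetical ./pythonapp directory next to
# this Dockerfile, and that these apt package names exist on the base image.
FROM ubuntu:12.10

RUN apt-get update
RUN apt-get install -y python python-pip

# Copy only the dependency list first. While requirements.txt is
# unchanged, this ADD and the pip install after it can come from the cache.
WORKDIR /tmp
ADD pythonapp/requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy the rest of the app as late as possible, because a change to any
# file in it invalidates the cache from this step onward.
ADD pythonapp /opt/pythonapp
WORKDIR /opt/pythonapp
```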

Let’s build it and see the resulting time saved. This time I’m pasting the entire output of docker build, so you can see that everything is cached. Observe Step 11 (line 33 of the output), which says that the bundle install command was cached:

Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache
 ---> 5895ed9e78a4
Step 3 : RUN apt-get update
 ---> Using cache
 ---> d2898351463e
Step 4 : RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
 ---> Using cache
 ---> aa1dbf3e6452
Step 5 : RUN gem install rubygems-update --no-ri --no-rdoc
 ---> Using cache
 ---> 8f4ef4bcfd32
Step 6 : RUN update_rubygems
 ---> Using cache
 ---> 358ef92178c7
Step 7 : RUN gem install bundler sinatra --no-ri --no-rdoc
 ---> Using cache
 ---> 9e7d9c0fd7de
Step 8 : WORKDIR /tmp
 ---> Using cache
 ---> b10a5c9f12c0
Step 9 : ADD railsapp/Gemfile Gemfile
 ---> Using cache
 ---> 79deb268175e
Step 10 : ADD railsapp/Gemfile.lock Gemfile.lock
 ---> Using cache
 ---> 1315e65cb616
Step 11 : RUN bundle install
 ---> Using cache
 ---> 6f067cbf6c2f
Step 12 : ADD railsapp /opt/railsapp
 ---> 655d668c338d
Step 13 : WORKDIR /opt/railsapp
 ---> Running in 0272330053b5
 ---> 94dda8e65416
Step 14 : CMD rails server thin
 ---> Running in 9afb1cee2bcf
 ---> 1429538cbdfb
Successfully built 1429538cbdfb

real    0m17.974s
user    0m0.000s
sys     0m0.020s

18 seconds. Not bad, compared to 55.

As one last test, let’s make sure Docker will not use the cache if I change the Gemfile. I’m just going to touch it and then re-run docker build:

$ touch railsapp/Gemfile
$ time docker build .
Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache

etc. etc. ...

Step 10 : ADD railsapp/Gemfile.lock Gemfile.lock
 ---> f5a40ceac4ce
Step 11 : RUN bundle install
 ---> Running in 3095386f3f46
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)

etc. etc. ...

real    1m5.819s

It worked. Because I touched the Gemfile, Docker did what we wanted: it re-ran bundle install. The total time is 1 minute 5 seconds, which I guess is unavoidable since Bundler takes a while.

Conclusion

This is really good news for Rails developers who use Docker. It greatly reduces the frustration and removes a barrier. I definitely recommend you use this technique to speed up installing Rails apps into Docker images.

Want to Learn More Docker and do Hands-On Exercises?

Alvin Lai and I have created a four-hour, introductory Docker training video. The video is self-paced and includes hands-on exercises.

It will be time well-spent. You will learn as much from this video as you would in several weeks of learning and using Docker on your own and asking questions on the IRC channel.

Go to Hands on with Docker to learn more and to buy the video.