How to Skip Bundle Install When Deploying a Rails App to Docker if the Gemfile Hasn’t Changed

With Docker, you can deploy a Rails app to a container that has all of the app’s dependencies (the right version of Ruby, your gems, etc.) embedded in it. You can fully test the app in the container, then ship the container to your production host(s) when you are ready. It’s like a VM only much lighter weight because it doesn’t have to reserve memory in advance.

I won’t go into the details of how to create a container in this post. But the short version is: you create a Dockerfile, which is a script that sets up the container, and then you run docker build to run that script.

Docker has an automatic caching mechanism to greatly speed things up after the first build of a Dockerfile. Each step (each line of the file) is cached separately. If you change line 6 of a 10-line Dockerfile and build it again, lines 1-5 will be skipped. Docker will just pull the results out of the cache. Nice. You can skip really slow steps like compiling Ruby.
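
To make the "change line 6, skip lines 1-5" idea concrete, here is a toy model in shell. It is only a sketch and not Docker's real implementation: I am assuming each step's cache key is a hash of the previous step's key plus the text of the instruction. The function names (step_keys, first_rebuilt_step) are made up for illustration, and sha256sum is assumed to be installed.

```shell
#!/bin/sh
# Toy model of a chained build cache (an assumption, not Docker's actual code).
# Each step's key hashes the previous key plus the step's own text, so a
# change in one step changes the key of every step that follows it.

# step_keys FILE: print one cache key per line (step) of FILE.
step_keys() {
    parent="scratch"
    while IFS= read -r line; do
        parent=$(printf '%s\n%s\n' "$parent" "$line" | sha256sum | cut -c1-12)
        echo "$parent"
    done < "$1"
}

# first_rebuilt_step OLD NEW: print the number of the first step whose key
# differs between the two files, or "none" if every key matches.
first_rebuilt_step() {
    old_keys=$(mktemp)
    new_keys=$(mktemp)
    step_keys "$1" > "$old_keys"
    step_keys "$2" > "$new_keys"
    # paste puts the two key lists side by side; awk reports the first mismatch.
    paste "$old_keys" "$new_keys" |
        awk -F'\t' '$1 != $2 { print NR; found = 1; exit }
                    END { if (!found) print "none" }'
    rm -f "$old_keys" "$new_keys"
}

# A hypothetical 10-step build file, then a copy with step 6 edited.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "RUN echo step $i"
done > "$workdir/old"
sed '6s/.*/RUN echo changed/' "$workdir/old" > "$workdir/new"

first_rebuilt_step "$workdir/old" "$workdir/new"   # prints 6
```

Steps 1-5 keep their keys, so in this model they would come straight from the cache. Step 6 and everything after it get new keys and would run again.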

But if you want to start using Docker with a Rails app, you will quickly notice a problem: you can’t cache the bundle install step. Any time you rebuild your image—even when the gems haven’t changed—you will have to sit and wait for Bundler to finish.

It’s annoying because once you have become accustomed to the huge speed boost you get from Docker’s cache on other steps, you get pretty antsy waiting around for Bundler when you know perfectly well that you didn’t change the Gemfile.

If you have used Heroku, you know what I’m talking about. Every time you git push to Heroku it re-runs Bundler even when your Gemfile didn’t change. Other than asset compilation, it’s the slowest part of deploying to Heroku. (They don’t use Docker, but they do use the same underlying technology—Linux Containers—and when I use Docker I notice a lot of similar behavior to Heroku and it makes it more clear why Heroku made the architectural choices they did.)

So: why does Docker cache the other steps but not bundle install? Because before version 0.7.3, Docker didn’t cache an ADD instruction or any instruction after it. (ADD copies a file or directory into the image from the build machine at build time.) And the usual way to add a Rails app to an image is to git pull the latest code and then copy it in with ADD.

It made sense that Docker didn’t cache ADDs. It’s pretty likely that you want the latest version of the thing you’re copying into the container. But it also introduced this problem.

Bundler depends on the Gemfile. The Gemfile is part of an ADDed directory (the Rails app), and the directory tree contains other frequently-modified files (e.g., source files). So Bundler has to run after you ADD the app, which means the bundle install step can’t be cached.

Well, There’s Good News

I mean, of course there is. Why else would I write this post in a blog about stuff I like?

I gave you a hint when I said “before version 0.7.3” above. Docker 0.7.3 was released a few days ago and it has a killer feature for Rails developers (the same feature should benefit developers of Python apps with requirements.txt, and might be a good alternative to this approach, which Nick Stinemates of Docker proposed).

The ADD command can now be cached.

If you ADD a directory tree, Docker (remarkably quickly, using a tar algorithm) generates a hash from the contents of all the files in it. If no file has changed, it will use the cached version of the same ADD instruction from a previous run of docker build.
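
Here is a rough way to picture that hash. This is a sketch of the general idea only, not Docker's own code: I am assuming a fingerprint made by hashing a tar stream of the directory, and tree_hash is a name I made up. It also assumes tar and sha256sum are available.

```shell
#!/bin/sh
# tree_hash DIR: fingerprint a directory tree by hashing a tar stream of it.
# (Illustrative stand-in for the hash Docker computes over ADDed files.)
tree_hash() {
    tar -cf - -C "$1" . | sha256sum | cut -d' ' -f1
}

# A tiny stand-in for a Rails app directory.
app=$(mktemp -d)
mkdir -p "$app/app/models"
echo "source 'https://rubygems.org'" > "$app/Gemfile"
echo "class User; end" > "$app/app/models/user.rb"

before=$(tree_hash "$app")
again=$(tree_hash "$app")

# Edit one source file and fingerprint the tree again.
echo "# a comment" >> "$app/app/models/user.rb"
after=$(tree_hash "$app")

[ "$before" = "$again" ] && echo "unchanged tree: same hash (cache hit)"
[ "$before" != "$after" ] && echo "edited source file: new hash (cache miss)"
```

The point to notice is that the fingerprint covers the whole tree, so editing any single file under it is enough to change the result.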

This is a big deal. Now you can take advantage of the Docker cache to cache your bundle installs.

“But Brian,” you say, “that won’t help. It can’t use the cache if I deploy my app after changing source code in the directory tree.” Well…it can, with this one weird trick.

An Example

Let’s get started. First we’ll look at what a Dockerfile might look like for a Rails app. (Actually, you wouldn’t usually use SQLite in production and you wouldn’t usually put the database in the same container as the app. But that’s not important here.)

FROM ubuntu:12.10
MAINTAINER brian@morearty.org

# Install dependencies.
RUN apt-get update
RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
RUN gem install rubygems-update --no-ri --no-rdoc
RUN update_rubygems
RUN gem install bundler sinatra --no-ri --no-rdoc

# Copy the app into the image.
ADD railsapp /opt/railsapp

# Now that the app is here, we can bundle.
WORKDIR /opt/railsapp
RUN bundle install

# Set up a default runtime command
CMD rails server thin

Let’s run docker build for the first time. (The Ubuntu image is already on my machine, so there’s no wait to pull it.) For my timing, I used a plain-vanilla Rails 4 app with the default gems.

$ time docker build .
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Running in 3479b6010856
 ---> 838c7b6022ab
Step 3 : RUN apt-get update
 ---> Running in b60b17f4385c
Ign http://archive.ubuntu.com quantal InRelease
Hit http://archive.ubuntu.com quantal Release.gpg

... etc., etc. ...

Step 10 : RUN bundle install
 ---> Running in 7a57242449d7
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)
...
real    2m18.260s

Okay, two minutes 18 seconds for the initial build. Now we modify a source file (but not the Gemfile), then docker build again. I’m using Docker 0.7.3—the version that supports cached ADDs. But because one source file was changed, the entire app directory is considered to have been changed. So Docker will not use the cached version of the app. Since bundle install comes after the ADD and every step after an uncached step is also uncached, Docker will run it.

$ time docker build .
Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache
 ---> 5895ed9e78a4

etc., etc. ...

Step 10 : RUN bundle install
 ---> Running in 3f0ddbeea83e
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)
...
real    0m55.596s

55 seconds. Most of that time was spent in bundle install. That sucks. I didn’t change the Gemfile at all.

I Like Stuff that’s Cached

If I were using an older version of Docker, I would just have to put up with it. But watch me as I cleverly add a few lines to my Dockerfile to make it cache the bundle install. Pay special attention to lines 13-16:

FROM ubuntu:12.10
MAINTAINER brian@morearty.org

# Install dependencies.
RUN apt-get update
RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
RUN gem install rubygems-update --no-ri --no-rdoc
RUN update_rubygems
RUN gem install bundler sinatra --no-ri --no-rdoc

# Copy the Gemfile and Gemfile.lock into the image. 
# Temporarily set the working directory to where they are. 
WORKDIR /tmp 
ADD railsapp/Gemfile Gemfile
ADD railsapp/Gemfile.lock Gemfile.lock
RUN bundle install 

# Everything up to here was cached. This includes
# the bundle install, unless the Gemfiles changed.
# Now copy the app into the image.
ADD railsapp /opt/railsapp

# Set the final working dir to the Rails app's location.
WORKDIR /opt/railsapp

# Set up a default runtime command
CMD rails server thin

See what I did there? Before copying the whole app, I copied just the Gemfile and Gemfile.lock into the tmp directory and ran bundle install from there. If neither file changed, both ADD instructions are cached. Because they are cached, subsequent commands (like the bundle install one) remain eligible for using the cache.

Only after bundling do I copy the rest of the app into the image. You want to do this as late as possible, since once any file in the app changes, no step after that ADD can come from the cache. (I could have moved the CMD step farther up, too, but it’s so fast it didn’t matter.)
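
If it helps, here is the same idea outside of Docker, as a small shell sketch. The gems_key function is hypothetical (my name, not a Bundler or Docker command). It builds a fingerprint from the Gemfile and Gemfile.lock alone, which is roughly what the two small ADD steps make the bundle install step depend on. It assumes sha256sum is installed.

```shell
#!/bin/sh
# gems_key APP_DIR: fingerprint that depends only on the two Bundler files.
# (Hypothetical helper for illustration.)
gems_key() {
    cat "$1/Gemfile" "$1/Gemfile.lock" | sha256sum | cut -d' ' -f1
}

# A tiny stand-in for a Rails app directory.
app=$(mktemp -d)
mkdir -p "$app/app/models"
printf "source 'https://rubygems.org'\ngem 'rails'\n" > "$app/Gemfile"
echo "GEM" > "$app/Gemfile.lock"
echo "class User; end" > "$app/app/models/user.rb"

key_before=$(gems_key "$app")

# Editing application code leaves the key alone, so a step keyed on it
# (like the bundle install step) could still be reused.
echo "# tweak" >> "$app/app/models/user.rb"
key_after_source_edit=$(gems_key "$app")

# Adding a gem changes the key, so that step would have to run again.
echo "gem 'thin'" >> "$app/Gemfile"
key_after_gem_edit=$(gems_key "$app")

[ "$key_before" = "$key_after_source_edit" ] && echo "source edit: gems key unchanged"
[ "$key_before" != "$key_after_gem_edit" ] && echo "Gemfile edit: gems key changed"
```

Keying the slow step on the smallest set of files it really depends on is the whole trick. The rest of the app can change as often as it likes.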

Let’s build it and see the resulting time saved. This time I’m pasting the entire output of docker build, so you can see that everything is cached. Observe line 33, which says that the bundle install command was cached:

Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache
 ---> 5895ed9e78a4
Step 3 : RUN apt-get update
 ---> Using cache
 ---> d2898351463e
Step 4 : RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
 ---> Using cache
 ---> aa1dbf3e6452
Step 5 : RUN gem install rubygems-update --no-ri --no-rdoc
 ---> Using cache
 ---> 8f4ef4bcfd32
Step 6 : RUN update_rubygems
 ---> Using cache
 ---> 358ef92178c7
Step 7 : RUN gem install bundler sinatra --no-ri --no-rdoc
 ---> Using cache
 ---> 9e7d9c0fd7de
Step 8 : WORKDIR /tmp
 ---> Using cache
 ---> b10a5c9f12c0
Step 9 : ADD railsapp/Gemfile Gemfile
 ---> Using cache
 ---> 79deb268175e
Step 10 : ADD railsapp/Gemfile.lock Gemfile.lock
 ---> Using cache
 ---> 1315e65cb616
Step 11 : RUN bundle install
 ---> Using cache
 ---> 6f067cbf6c2f
Step 12 : ADD railsapp /opt/railsapp
 ---> 655d668c338d
Step 13 : WORKDIR /opt/railsapp
 ---> Running in 0272330053b5
 ---> 94dda8e65416
Step 14 : CMD rails server thin
 ---> Running in 9afb1cee2bcf
 ---> 1429538cbdfb
Successfully built 1429538cbdfb

real    0m17.974s
user    0m0.000s
sys     0m0.020s

18 seconds. Not bad, compared to 55.

As one last test, let’s make sure Docker will not use the cache if I change the Gemfile. I’m just going to touch it and then re-run docker build:

$ touch railsapp/Gemfile
$ time docker build .
Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache

etc. etc. ...

Step 10 : ADD railsapp/Gemfile.lock Gemfile.lock
 ---> f5a40ceac4ce
Step 11 : RUN bundle install
 ---> Running in 3095386f3f46
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)

etc. etc. ...

real    1m5.819s

It worked. Because I touched the Gemfile, Docker did what we want: it re-ran bundle install. The total time is 1 minute 5 seconds, which I guess is unavoidable since Bundler takes a while.
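
One thing worth noticing: touch changes a file's modification time but not its contents, and the build above still missed the cache. That suggests the hash in this Docker version covered file metadata as well as contents. The sketch below is my own illustration of how that can happen, assuming a hash taken over a tar stream (tar headers record each file's modification time). It is not taken from Docker's source, and other Docker versions may treat timestamps differently, so it is worth checking on yours.

```shell
#!/bin/sh
# Compare a content-only hash with a tar-stream hash before and after
# changing only a file's modification time.
dir=$(mktemp -d)
echo "source 'https://rubygems.org'" > "$dir/Gemfile"

content_before=$(sha256sum < "$dir/Gemfile")
tar_before=$(tar -cf - -C "$dir" Gemfile | sha256sum)

# Set an explicit, different modification time; the contents stay the same.
touch -t 200101010000 "$dir/Gemfile"

content_after=$(sha256sum < "$dir/Gemfile")
tar_after=$(tar -cf - -C "$dir" Gemfile | sha256sum)

[ "$content_before" = "$content_after" ] && echo "contents: identical"
[ "$tar_before" != "$tar_after" ] && echo "tar stream: different (mtime is in the header)"
```

In practice this means that anything that rewrites the Gemfile's timestamp could trigger a fresh bundle install under a metadata-sensitive hash, even when the gems themselves are the same.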

Conclusion

This is really good news for Rails developers who use Docker. It greatly reduces the frustration and removes a barrier. I definitely recommend you use this technique to speed up installing Rails apps into Docker images.

Want to Learn More Docker and do Hands-On Exercises?

Alvin Lai and I have created a four-hour, introductory Docker training video. The video is self-paced and includes hands-on exercises.

It will be time well-spent. You will learn as much from this video as you would in several weeks of learning and using Docker on your own and asking questions on the IRC channel.

Go to Hands on with Docker to learn more and to buy the video.

13 thoughts on “How to Skip Bundle Install When Deploying a Rails App to Docker if the Gemfile Hasn’t Changed”

  1. This implies that to deploy, you push your local copy of Rails into the container, push the container to a repo or directly to a server, then what? I’m curious about people’s full deployment cycle and why it might be preferred over a git-based deployment.

  2. This doesn’t work for me. I can’t see how you got this working. I guess I’ll dig through the issues on GitHub and see what I’m doing wrong. Seems like a bug though.

    I get

    ...
    remote: Step 5 : WORKDIR /srv/www
    remote:  ---> Using cache
    remote:  ---> 93137f4c4d57
    remote: Step 6 : ADD Gemfile /srv/www/Gemfile
    remote:  ---> c34937950e3f
    remote: Removing intermediate container 82313061c1a1
    remote: Step 7 : RUN bundle install --without test development
    remote:  ---> Running in 59026f89cf40
    remote: Don't run Bundler as root. Bundler can ask for sudo if it is needed, and
    remote: installing your bundle as root will break this application for all non-root
    remote: users on this machine.
    remote: Fetching git://github.com/NigelThorne/eventmachine.git
    remote: Fetching gem metadata from https://rubygems.org/........
    remote: Fetching additional metadata from https://rubygems.org/..
    remote: Resolving dependencies...
    remote: Installing thread_safe 0.3.4
    remote: Installing descendants_tracker 0.0.4
    

    I’m using Docker version 1.3.1, build 4e9bbfa
    🙁

  3. Thanks for the article. I noticed that Docker 1.3.2 caches the ‘bundle install’ step without the ‘WORKDIR /tmp’ trick.

  4. There’s also this idiom:

    bundle check || bundle install
    

    Which prevents running the bundle install entirely if the Gemfile.lock’s dependencies are already satisfied.

    Not sure if that works for your situation but I’ve found it useful in vendored bundle environments.

  5. if you run

    bundle install --without development test
    

    inside the /tmp directory, then you need to do something like this:

    RUN echo 'BUNDLE_WITHOUT: development:test' > .bundle/config
    

    inside your application root, otherwise the webserver (at least Passenger over nginx) will try to load all gems, including test and dev gems.

  6. Thanks for this article, I definitely found this pattern to save a lot of time when building images over and over. But I encountered a small discrepancy that could potentially cause some unexpected behavior (it did for me). If I put this in the Dockerfile:

    WORKDIR /tmp
    ADD Gemfile Gemfile
    ADD Gemfile.lock Gemfile.lock

    I would have expected the Gemfiles to be placed in the /tmp directory. But I found that it didn’t, instead they were placed in the root / directory. After looking at the docs, I found that the WORKDIR command is not used as the context for ADD commands; instead ADD commands assume an absolute path as the destination.

    The build still technically worked, because even though RUN bundle install was run from the /tmp directory, bundler saw the Gemfile in root and used it. But this caused a problem on a subsequent build when an older Gemfile was still in root and was used for a bundle install step. To ensure that the files were added reliably, I just did this:

    WORKDIR /tmp
    ADD Gemfile /tmp/Gemfile
    ADD Gemfile.lock /tmp/Gemfile.lock

    And that way there are no stray Gemfiles lying around.

    1. Thanks for the tip, Taylor. I’m not sure why I didn’t catch that in the first place. I can only hope it’s because maybe the behavior of ADD changed between last year and now. 🙂

  7. On Shelly Cloud (https://shellycloud.com) we are using a local cache of the bundle (gems) so `bundle install` command is invoked only on the first virtual server. That package is shared between other vservers. `bundle install` will be skipped if there was no change in gems.

  8. Thanks for this — I’m just getting on the Docker bandwagon, and this is super helpful.

    Question: on the docker-compose Rails quickstart page, (https://docs.docker.com/compose/rails) , I think they get the same effect by adding the Gemfile to an empty /myapp directory, running bundle install, and then adding the rest of the app into the same /myapp directory. Does this deliver the same caching benefit as your /tmp folder strategy, above? Any thoughts on which you prefer? (Or maybe the difference is negligible…)
