How to Eliminate Unnecessary Database Queries

Argh! Your Rails app is taking far too long to process each request. What’s going on!?

On the server side, one of the things that can slow down a Rails app the most is the number of round-trips to the database. And one of the most common causes of too many round-trips: the infamous N+1 query.
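
To make the N+1 pattern concrete, here is a minimal sketch in plain Ruby. It is not taken from the course and does not use Rails: FakeDB and its methods are hypothetical stand-ins that only count round-trips, so the difference between one query per record and one batched query is visible.

```ruby
# Toy in-memory "database" that counts round-trips.
# FakeDB and its methods are hypothetical; they are not part of Rails.
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
    @posts = [
      { id: 1, author_id: 10 },
      { id: 2, author_id: 11 },
      { id: 3, author_id: 10 }
    ]
    @authors = { 10 => "Ann", 11 => "Bob" }
  end

  # One round-trip that returns every post.
  def all_posts
    @query_count += 1
    @posts
  end

  # One round-trip per call: this is the "N" in N+1.
  def find_author(id)
    @query_count += 1
    @authors.fetch(id)
  end

  # One round-trip that loads many authors at once (like eager loading).
  def find_authors(ids)
    @query_count += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

# N+1: one query for the posts, then one more query per post.
def names_n_plus_one(db)
  db.all_posts.map { |post| db.find_author(post[:author_id]) }
end

# Eager loading: one query for the posts, one for all of their authors.
def names_eager(db)
  posts = db.all_posts
  authors = db.find_authors(posts.map { |post| post[:author_id] }.uniq)
  posts.map { |post| authors.fetch(post[:author_id]) }
end

slow_db = FakeDB.new
fast_db = FakeDB.new
slow_names = names_n_plus_one(slow_db)
fast_names = names_eager(fast_db)
puts "N+1:   #{slow_db.query_count} queries"
puts "eager: #{fast_db.query_count} queries"
```

With three posts, the first approach makes four round-trips (one for the posts plus one per post) and the second makes two. In ActiveRecord, the batched version corresponds roughly to eager loading with includes, which is the kind of fix Bullet points you toward.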

In the following video snippet from my Rails 4.1 Performance Fundamentals video course, you will learn how to install and use the Bullet gem. This gem notifies you, the developer, whenever any page has an N+1 query. And it tells you exactly how to fix it.

I’ve got a bunch of Pluralsight unlimited access cards to give away

Pluralsight is a subscription-based tech training site for developers. After I finished making this course, they gave me hundreds of Unlimited Access Passes to give away. The cards give you unlimited viewing hours for one month. Without a card, their normal free trial is only good for 200 minutes over 10 days.

If you’d like one, just add a comment and I’ll email you a private message with a code. (While supplies last.)

In addition to being able to view all the old PeepCode content, these cards give you access to watch a bunch of newer tech training videos. Many of them are really good. They’ve got some super-popular, in-depth courses on topics like Angular and jQuery. And since they also bought Digital Tutors, they’ve got a bunch of content on digital creation. (Photoshop, Illustrator, game design, animation, and so on.)

 

Announcing a New Video Course: Rails 4.1 Performance Fundamentals

You’re a developer on a Rails team. Your boss comes to the team and says, “The app is too slow. Users are unhappy and they’re dropping off.” What are you going to do?

Do you:

  1. Blame the developers who architected the app, but no longer work at the company.
  2. Tell your boss that you’re late for a “very important Ping Pong game” and you’ll talk later.
  3. Tell your boss that you’ll have to rewrite the whole thing in Java.

On Friday, July 4th, Pluralsight published my 4-hour video course: Rails 4.1 Performance Fundamentals. (A US holiday is a bad time to release a new training video, which is why I didn’t blog about it until now. :-) )

Watch this course to learn how to profile a Rails app, identify performance problems, and make your app faster. In it I cover a huge variety of profiling tools, some general-purpose and some designed to look for specific problems such as N+1 queries or missing indexes. And I cover both server-side and client-side (but all Rails-specific) optimization techniques, all the way up to the latest improvements that were added in Rails 4.0: Turbolinks and Russian Doll Caching. You can see the full table of contents without getting a Pluralsight subscription.

Want to see a free sample? Your wish is my command. Here’s a snippet from the module on Turbolinks and pjax. That whole module is about 35 minutes long, and goes into some of the gotchas and how to work around them, how to use Turbolinks effectively with jQuery, and a bit about pjax—an alternative to Turbolinks that GitHub created. But this snippet gives you a taste of it. :-)

If you like what you see, sign up for a free trial on Pluralsight. You get 10 days to watch up to 200 minutes’ worth of technical training videos. You’ll be able to get through about 3/4 of my Rails Performance course with your 200 minutes. And for only $29/month you can watch unlimited videos, and can cancel at any time.

Introducing Whiny Validation. So You Can Figure Out Why Your Rails Specs Failed.

Sometimes when I write a spec in a Rails app, it fails silently but it isn’t clear why. For example, here’s a pretty standard (if minimal) controller spec:

  describe "POST 'create'" do
    it "makes a new customer" do
      post :create
      expect(response).to redirect_to Customer.last
    end
  end

Or perhaps you would use expect { post :create } to change { Customer.count }.by(1). Either way, I run this spec, and the error message doesn’t give me enough information about why it failed:

$ rspec spec/controllers/customers_controller_spec.rb
F

Failures:

  1) CustomersController POST 'create' makes a new customer
     Failure/Error: expect(response).to redirect_to Customer.last
       Expected response to be a <:redirect>, but was <200>
     # /Users/brian/.rvm/gems/ruby-1.9.3-p385/gems/rspec-expectations-2.13.0/lib/rspec/expectations/fail_with.rb:32:in `fail_with'
     # /Users/brian/.rvm/gems/ruby-1.9.3-p385/gems/rspec-expectations-2.13.0/lib/rspec/expectations/handler.rb:33:in `handle_matcher'
     # /Users/brian/.rvm/gems/ruby-1.9.3-p385/gems/rspec-expectations-2.13.0/lib/rspec/expectations/expectation_target.rb:34:in `to'
     # ./spec/controllers/customers_controller_spec.rb:8:in `block (3 levels) in <top (required)>'
...

How would you debug this to figure out what you did wrong? One thing you could do: look at the log. But it doesn’t help much:

Processing by CustomersController#create as HTML
   (0.1ms)  SAVEPOINT active_record_1
   (0.1ms)  ROLLBACK TO SAVEPOINT active_record_1
  Rendered customers/new.html.haml within layouts/application (0.2ms)
Completed 200 OK in 55ms (Views: 41.0ms | ActiveRecord: 1.7ms)
  Customer Load (0.6ms)  SELECT "customers".* FROM "customers" ORDER BY "customers"."id" DESC LIMIT 1
   (0.1ms)  ROLLBACK

Hmm, no clues there. At this point, if a code inspection doesn’t help, I would typically resort to inserting a debugger statement or various puts statements. It’s not a very efficient way to work.

But I’ve noticed that frequently, this type of spec failure is due to an input validation error. That’s why I created the Whiny Validation gem.

Whiny Validation watches for ActiveRecord validation errors on all models. Whenever one occurs, it logs the validation message and dumps the ActiveRecord object (with inspect) to the log. All you have to do is search the log for the word “validation,” or browse through the log and the yellow text should jump out at you.

To make it work, simply add this to your Gemfile:

gem 'whiny_validation'

Now rerun the spec and look at the log:

Processing by CustomersController#create as HTML
   (0.1ms)  SAVEPOINT active_record_1
  Validation failed  #<Customer id: nil, email: nil, created_at: nil, updated_at: nil>
    => Email can't be blank
   (0.1ms)  ROLLBACK TO SAVEPOINT active_record_1
  Rendered customers/new.html.haml within layouts/application (0.2ms)
Completed 200 OK in 28ms (Views: 11.7ms | ActiveRecord: 2.3ms)
  Customer Load (0.5ms)  SELECT "customers".* FROM "customers" ORDER BY "customers"."id" DESC LIMIT 1
   (0.1ms)  ROLLBACK

Let’s see that in full color:

Whiny Validation

That’s much better. It tells me that a validation failed, it shows the object that had the failure, and it shows the validation error message. Now it’s easy to see my mistake: I forgot to pass in an email address to the “create” action in my spec. When I change the spec to this, it passes:

post :create, customer: { email: "test@example.com" }

How Does it Work?

Here is the entire implementation, with the exception of the configuration code:

module WhinyValidation
  extend ActiveSupport::Concern

  included do
    after_validation :whiny_validation,
      :if => proc { |model| model.errors.present? }
  end

  def whiny_validation
    ActiveSupport::Notifications.instrument(
      "validation_failed.whiny_validation",
      :object => self,
      :error_messages => errors.full_messages)
  end

  class LogSubscriber < ActiveSupport::LogSubscriber
    def validation_failed(event)
      debug do
        name = color("Validation failed", YELLOW, true)
        object = event.payload[:object]
        error_messages = color(
          event.payload[:error_messages].map {|message|
            "    => #{message}"
          }.join("\n"), YELLOW
        )

        "  #{name}  #{object.inspect}\n#{error_messages}"
      end
    end
  end

  WhinyValidation::LogSubscriber.attach_to :whiny_validation
end

module ActiveRecord
  class Base
    include WhinyValidation
  end
end

Let’s break this down. First, let’s look at the beginning and the end of the file:

module WhinyValidation
  extend ActiveSupport::Concern

  included do
    after_validation :whiny_validation,
      :if => proc { |model| model.errors.present? }
  end
end

module ActiveRecord
  class Base 
    include WhinyValidation
  end
end

ActiveSupport::Concern lets me be notified when this module is included elsewhere, using the included statement. At the end of the file, I include WhinyValidation in ActiveRecord::Base. The included statement gets run, so it adds an after_validation callback to every subclass of ActiveRecord::Base. The callback calls my whiny_validation method if any errors were detected during validation.
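
For readers who have not seen this hook before, here is a rough plain-Ruby sketch of the idea. It uses Ruby's built-in Module#included hook rather than ActiveSupport::Concern, and WhinySketch, FakeModel, and register_callback are hypothetical names invented for the illustration, not part of the gem or of Rails.

```ruby
# Plain-Ruby sketch of "run some code when this module is included".
# WhinySketch, FakeModel and register_callback are hypothetical names.
module WhinySketch
  # Ruby calls this hook automatically whenever the module is included.
  def self.included(base)
    base.extend(ClassMethods)
    base.register_callback(:whiny_validation)
  end

  # Class-level helpers added to whatever class includes the module.
  module ClassMethods
    def callbacks
      @callbacks ||= []
    end

    def register_callback(name)
      callbacks << name
    end
  end

  # Instance method that the registered callback name refers to.
  def whiny_validation
    "validation failed on #{self.class.name}"
  end
end

class FakeModel
  include WhinySketch   # triggers WhinySketch.included(FakeModel)
end

p FakeModel.callbacks
puts FakeModel.new.whiny_validation
```

The included block in a Concern plays the same role as self.included here: it is the place to run class-level setup, such as registering a callback, at the moment the module is mixed in.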

Next, let’s take a look at the whiny_validation method itself:

  def whiny_validation
    ActiveSupport::Notifications.instrument(
      "validation_failed.whiny_validation",
      :object => self,
      :error_messages => errors.full_messages)
  end

What I want to do is write to the logfile. I could just have written logger.debug, but I want to use color in my message and ActiveSupport::LogSubscriber has helper methods for writing to the log with color. So I call ActiveSupport::Notifications.instrument (which means “broadcast a notification”) and I pass two things:

  1. A string with two parts, separated by a dot: first the name of an event, followed by a namespace. (I’m not sure why the namespace comes last. It feels backward to me.)
  2. The payload, which is a hash. What belongs in the payload? It’s up to you when you define a notification. I decided to include the ActiveRecord object whose validation failed, and the full list of error messages.

Next, someone has to listen to the notification and do the logging. I created a class named LogSubscriber in the WhinyValidation namespace, and made it a subclass of ActiveSupport::LogSubscriber so I can use that color method:

  class LogSubscriber < ActiveSupport::LogSubscriber
    def validation_failed(event)
      debug do
        name = color("Validation failed", YELLOW, true)
        object = event.payload[:object]
        error_messages = color(
          event.payload[:error_messages].map {|message|
            "    => #{message}"
          }.join("\n"), YELLOW
        )

        "  #{name}  #{object.inspect}\n#{error_messages}"
      end
    end
  end

The first line of validation_failed is debug do. Since we’re in a subclass of ActiveSupport::LogSubscriber, methods are defined for all the logger levels. You don’t have to call Rails.logger.debug, just call debug. (Actually, the real implementation calls send(WhinyValidation.configuration.log_level), because the log level is configurable. I simplified it for this blog post.)

By the way, most people don’t realize that when you call the logger, you are allowed to pass in a block. Whatever the block returns gets converted to a string with .to_s and then is output to the log. I used that trick here so that in any environment that doesn’t have the log_level set to debug, the code doesn’t have to waste time constructing a message.
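
The block trick can be tried outside Rails with Ruby's standard Logger. This is a small sketch under that assumption; the messages and the built counter are made up for the demonstration.

```ruby
require "logger"
require "stringio"

output = StringIO.new
logger = Logger.new(output)
logger.level = Logger::INFO   # debug messages fall below this threshold

built = 0   # counts how many log messages were actually constructed

# Block form at a disabled level: the block is never called.
logger.debug { built += 1; "expensive debug message" }

# Block form at an enabled level: the block runs and its result is logged.
logger.info { built += 1; "cheap info message" }

puts "messages built: #{built}"
puts output.string
```

Only the info block runs, so only one message is built. Passing a plain string argument instead would construct the string every time, whether or not it ends up in the log.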

Next, I call ActiveSupport::LogSubscriber’s helper method color and pass in the YELLOW constant (also defined in ActiveSupport::LogSubscriber), to colorize the “Validation failed” string. The true means “make it bold.”

Then I do the same thing with the error messages, making them yellow and concatenating them.

The last thing I do is return a string which includes the “Validation failed” message, an inspection of the object that was invalid, and the list of error messages.

Finally, we have to hook it up:

  WhinyValidation::LogSubscriber.attach_to :whiny_validation

The last line is calling a method whose implementation is in the base class, ActiveSupport::LogSubscriber. It tells my LogSubscriber class to watch for ActiveSupport notifications that use the “whiny_validation” namespace. Whenever one occurs, it extracts the event name (in this case, “validation_failed”) and looks for a method by that name. If it exists, it calls it and passes in an event object.

One of the methods on the event object is payload, which returns the hash that I passed in.
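
The dispatch described above can be imitated in a few lines of plain Ruby. This is a toy stand-in, not the ActiveSupport implementation: TinyNotifier and TinySubscriber are hypothetical names, and the subscriber receives the payload hash directly instead of an event object.

```ruby
# Toy imitation of "attach a subscriber to a namespace, then call the
# method named after the event". Not the ActiveSupport implementation.
class TinySubscriber
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Called for events named "validation_failed.<attached namespace>".
  def validation_failed(payload)
    @lines << "Validation failed: #{payload[:error_messages].join(', ')}"
  end
end

class TinyNotifier
  def initialize
    @subscribers = {}
  end

  # Register a subscriber for every event in the given namespace.
  def attach(subscriber, namespace)
    (@subscribers[namespace.to_s] ||= []) << subscriber
  end

  # name looks like "event.namespace"; the event part picks the method.
  def instrument(name, payload = {})
    event, namespace = name.split(".", 2)
    @subscribers.fetch(namespace, []).each do |subscriber|
      subscriber.public_send(event, payload) if subscriber.respond_to?(event)
    end
  end
end

notifier = TinyNotifier.new
subscriber = TinySubscriber.new
notifier.attach(subscriber, :whiny_validation)

notifier.instrument("validation_failed.whiny_validation",
                    { error_messages: ["Email can't be blank"] })
notifier.instrument("validation_failed.other_namespace",
                    { error_messages: ["ignored: wrong namespace"] })
notifier.instrument("unknown_event.whiny_validation",
                    { error_messages: ["ignored: no such method"] })

puts subscriber.lines
```

Only the first notification reaches the subscriber: the second uses a namespace nobody is attached to, and the third names an event for which the subscriber has no method.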

Conclusion

This turned out to be pretty easy because:

  1. Ruby lets you extend existing classes, so I was able to inject my own after_validation hook into ActiveRecord::Base.
  2. ActiveSupport::Notifications and ActiveSupport::LogSubscriber work together to make it really easy to log anything that broadcasts a notification, and to log with color.

Go ahead and use this in your projects. Please let me know if you liked it.

One last note: if you want to configure it to log at the :info level or any other level, take a look at the README. It tells you how.

How to Skip Bundle Install When Deploying a Rails App to Docker if the Gemfile Hasn’t Changed

With Docker, you can deploy a Rails app to a container that has all of the app’s dependencies (the right version of Ruby, your gems, etc.) embedded in it. You can fully test the app in the container, then ship the container to your production host(s) when you are ready. It’s like a VM only much lighter weight because it doesn’t have to reserve memory in advance.

I won’t go into the details of how to create a container in this post. But the short version is: you create a Dockerfile, which is a script that sets up the container, and then you run docker build to run that script.

Docker has an automatic caching mechanism to greatly speed things up after the first build of a Dockerfile. Each step (each line of the file) is cached separately. If you change line 6 of a 10-line Dockerfile and build it again, lines 1-5 will be skipped. Docker will just pull the results out of the cache. Nice. You can skip really slow steps like compiling Ruby.
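
The "change line 6 and lines 1-5 are skipped" behavior can be modeled with a short Ruby simulation. This is only a toy model, not Docker's actual algorithm: it assumes each step's cache key is derived from the step's text plus the key of the step before it, and the step names are invented.

```ruby
require "digest"

# Toy model of a step-by-step build cache; not Docker's actual algorithm.
# Each step's key depends on its own text and on the previous step's key,
# so a change to one step also changes the key of every later step.
def build(steps, cache)
  parent = "scratch"
  steps.map do |step|
    key = Digest::SHA256.hexdigest("#{parent}\n#{step}")
    status = cache.key?(key) ? :cached : :ran
    cache[key] = true
    parent = key
    status
  end
end

cache = {}
steps = (1..10).map { |n| "RUN step-#{n}" }
first = build(steps, cache)

steps[5] = "RUN step-6-modified"   # change line 6 of the 10-line file
second = build(steps, cache)

p first
p second
```

The first build runs all ten steps. In the second build, steps 1 through 5 come from the cache, and step 6 plus everything after it runs again, because each later key depends on the changed one.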

But if you want to start using Docker with a Rails app, you will quickly notice a problem: you can’t cache the bundle install step. Any time you rebuild your image—even when the gems haven’t changed—you will have to sit and wait for Bundler to finish.

It’s annoying because once you have become accustomed to the huge speed boost you get from Docker’s cache on other steps, you get pretty antsy waiting around for Bundler when you know perfectly well that you didn’t change the Gemfile.

If you have used Heroku, you know what I’m talking about. Every time you git push to Heroku it re-runs Bundler even when your Gemfile didn’t change. Other than asset compilation, it’s the slowest part of deploying to Heroku. (They don’t use Docker, but they do use the same underlying technology, Linux Containers. When I use Docker I notice a lot of behavior similar to Heroku’s, which makes it clearer why Heroku made the architectural choices they did.)

So: why does Docker cache the other steps but not bundle install? Because before version 0.7.3, Docker didn’t cache an ADD instruction or any instruction after it. (ADD copies a file or directory into the image from the build machine at build time.) And the usual way to add a Rails app to an image is to git pull the latest code and then copy it in with ADD.

It makes sense that Docker doesn’t cache ADDs. It’s pretty likely that you want the latest version of the thing you’re copying into the container. But it also introduces this problem.

Bundler depends on the Gemfile. The Gemfile is part of an ADDed directory (the Rails app), and the directory tree contains other frequently-modified files (e.g., source files). So Bundler has to run after you ADD the app, which means the bundle install step can’t be cached.

Well, There’s Good News

I mean, of course there is. Why else would I write this post in a blog about stuff I like?

I gave you a hint when I said “before version 0.7.3” above. Docker 0.7.3 was released a few days ago and it has a killer feature for Rails developers (the same feature should benefit developers of Python apps with requirements.txt, and might be a good alternative to this approach, which Nick Stinemates of Docker proposed).

The ADD command can now be cached.

If you ADD a directory tree, Docker (remarkably quickly, using a tar algorithm) generates a hash from the contents of all the files in it. If no file has changed, it will use the cached version of the same ADD instruction from a previous run of docker build.

This is a big deal. Now you can take advantage of the Docker cache to cache your bundle installs.

“But Brian,” you say, “that won’t help. It can’t use the cache if I deploy my app after changing source code in the directory tree.” Well…it can, with this one weird trick.

An Example

Let’s get started. First we’ll look at what a Dockerfile might look like for a Rails app. (Actually, you wouldn’t usually use SQLite in production and you wouldn’t usually put the database in the same container as the app. But that’s not important here.)

FROM ubuntu:12.10
MAINTAINER brian@morearty.org

# Install dependencies.
RUN apt-get update
RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
RUN gem install rubygems-update --no-ri --no-rdoc
RUN update_rubygems
RUN gem install bundler sinatra --no-ri --no-rdoc

# Copy the app into the image.
ADD railsapp /opt/railsapp

# Now that the app is here, we can bundle.
WORKDIR /opt/railsapp
RUN bundle install

# Set up a default runtime command
CMD rails server thin

Let’s run docker build for the first time. (The Ubuntu image is already on my machine, so there’s no wait to pull it.) For my timing, I used a plain-vanilla Rails 4 app with the default gems.

$ time docker build .
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Running in 3479b6010856
 ---> 838c7b6022ab
Step 3 : RUN apt-get update
 ---> Running in b60b17f4385c
Ign http://archive.ubuntu.com quantal InRelease
Hit http://archive.ubuntu.com quantal Release.gpg

... etc., etc. ...

Step 10 : RUN bundle install
 ---> Running in 7a57242449d7
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)
...
real    2m18.260s

Okay, two minutes 18 seconds for the initial build. Now we modify a source file (but not the Gemfile), then docker build again. I’m using Docker 0.7.3—the version that supports cached ADDs. But because one source file was changed, the entire app directory is considered to have been changed. So Docker will not use the cached version of the app. Since bundle install comes after the ADD and every step after an uncached step is also uncached, Docker will run it.

$ time docker build .
Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache
 ---> 5895ed9e78a4

etc., etc. ...

Step 10 : RUN bundle install
 ---> Running in 3f0ddbeea83e
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)
...
real    0m55.596s

55 seconds. Most of that time was spent in bundle install. That sucks. I didn’t change the Gemfile at all.

I Like Stuff that’s Cached

If I were using an older version of Docker, I would just have to put up with it. But watch me as I cleverly add a few lines to my Dockerfile to make it cache the bundle install. Pay special attention to lines 13-16:

FROM ubuntu:12.10
MAINTAINER brian@morearty.org

# Install dependencies.
RUN apt-get update
RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
RUN gem install rubygems-update --no-ri --no-rdoc
RUN update_rubygems
RUN gem install bundler sinatra --no-ri --no-rdoc

# Copy the Gemfile and Gemfile.lock into the image. 
# Temporarily set the working directory to where they are. 
WORKDIR /tmp 
ADD railsapp/Gemfile Gemfile
ADD railsapp/Gemfile.lock Gemfile.lock
RUN bundle install 

# Everything up to here was cached. This includes
# the bundle install, unless the Gemfiles changed.
# Now copy the app into the image.
ADD railsapp /opt/railsapp

# Set the final working dir to the Rails app's location.
WORKDIR /opt/railsapp

# Set up a default runtime command
CMD rails server thin

See what I did there? Before copying the whole app, I copied just the Gemfile and Gemfile.lock into the tmp directory and ran bundle install from there. If neither file changed, both ADD instructions are cached. Because they are cached, subsequent commands—like the bundle install one—remain eligible for using the cache.
Only after bundling do I copy the rest of the app into the image. You want to do this as late as possible since no later step can be cached. (I could have moved the CMD step farther up, too, but it’s so fast it didn’t matter.)
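
Why the split helps can be shown with a small Ruby sketch that fingerprints files by content. It is an illustration under simplifying assumptions, not a reproduction of Docker's hashing: the fingerprint helper and the file contents are invented, and only file contents are hashed (the touch experiment later in this post shows that Docker 0.7.3 reacts to more than content).

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Content fingerprint of a set of files under root. A hypothetical helper,
# standing in for whatever Docker computes for an ADD source.
def fingerprint(root, relative_paths)
  digest = Digest::SHA256.new
  relative_paths.sort.each do |rel|
    digest << rel << "\0" << File.binread(File.join(root, rel)) << "\0"
  end
  digest.hexdigest
end

# Every regular file under root, as paths relative to root.
def all_files(root)
  Dir.glob("**/*", base: root).select { |rel| File.file?(File.join(root, rel)) }
end

results = {}
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "app/models"))
  File.write(File.join(root, "Gemfile"), "gem 'rails'\n")
  File.write(File.join(root, "Gemfile.lock"), "GEM\n")
  File.write(File.join(root, "app/models/customer.rb"), "class Customer; end\n")

  gem_files = ["Gemfile", "Gemfile.lock"]
  results[:gems_before] = fingerprint(root, gem_files)
  results[:tree_before] = fingerprint(root, all_files(root))

  # Edit a source file, but leave the Gemfile and Gemfile.lock alone.
  File.write(File.join(root, "app/models/customer.rb"),
             "class Customer; def name; end; end\n")

  results[:gems_after] = fingerprint(root, gem_files)
  results[:tree_after] = fingerprint(root, all_files(root))
end

puts "Gemfile fingerprint unchanged: #{results[:gems_before] == results[:gems_after]}"
puts "Whole-tree fingerprint unchanged: #{results[:tree_before] == results[:tree_after]}"
```

The fingerprint of the two Gemfiles stays the same after the source edit, while the fingerprint of the whole tree changes. That is the property the reordered Dockerfile relies on: the ADD steps that feed bundle install see unchanged input, so the cache stays valid up to and including the bundle.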

Let’s build it and see the resulting time saved. This time I’m pasting the entire output of docker build, so you can see that everything is cached. Observe line 33, which says that the bundle install command was cached:

Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache
 ---> 5895ed9e78a4
Step 3 : RUN apt-get update
 ---> Using cache
 ---> d2898351463e
Step 4 : RUN apt-get install -y curl git build-essential ruby1.9.3 libsqlite3-dev
 ---> Using cache
 ---> aa1dbf3e6452
Step 5 : RUN gem install rubygems-update --no-ri --no-rdoc
 ---> Using cache
 ---> 8f4ef4bcfd32
Step 6 : RUN update_rubygems
 ---> Using cache
 ---> 358ef92178c7
Step 7 : RUN gem install bundler sinatra --no-ri --no-rdoc
 ---> Using cache
 ---> 9e7d9c0fd7de
Step 8 : WORKDIR /tmp
 ---> Using cache
 ---> b10a5c9f12c0
Step 9 : ADD railsapp/Gemfile Gemfile
 ---> Using cache
 ---> 79deb268175e
Step 10 : ADD railsapp/Gemfile.lock Gemfile.lock
 ---> Using cache
 ---> 1315e65cb616
Step 11 : RUN bundle install
 ---> Using cache
 ---> 6f067cbf6c2f
Step 12 : ADD railsapp /opt/railsapp
 ---> 655d668c338d
Step 13 : WORKDIR /opt/railsapp
 ---> Running in 0272330053b5
 ---> 94dda8e65416
Step 14 : CMD rails server thin
 ---> Running in 9afb1cee2bcf
 ---> 1429538cbdfb
Successfully built 1429538cbdfb

real    0m17.974s
user    0m0.000s
sys     0m0.020s

18 seconds. Not bad, compared to 55.

As one last test, let’s make sure docker will not use the cache if I change the Gemfile. I’m just going to touch it and then re-run docker build:

$ touch railsapp/Gemfile
$ time docker build .
Uploading context 337.9 kB
Uploading context
Step 1 : FROM ubuntu:12.10
 ---> b750fe79269d
Step 2 : MAINTAINER brian@morearty.org
 ---> Using cache

etc. etc. ...

Step 10 : ADD railsapp/Gemfile.lock Gemfile.lock
 ---> f5a40ceac4ce
Step 11 : RUN bundle install
 ---> Running in 3095386f3f46
Fetching gem metadata from https://rubygems.org/.........
Fetching additional metadata from https://rubygems.org/..
Installing rake (10.1.1)
Installing i18n (0.6.9)
Installing minitest (4.7.5)

etc. etc. ...

real    1m5.819s

It worked. Because I touched the Gemfile, Docker did what we want: it re-ran bundle install. The total time is 1 minute 5 seconds, which I guess is unavoidable since Bundler takes a while.

Conclusion

This is really good news for Rails developers who use Docker. It greatly reduces the frustration and removes a barrier. I definitely recommend you use this technique to speed up installing Rails apps into Docker images.

Want to Learn More Docker and do Hands-On Exercises?

Alvin Lai and I have created a four-hour, introductory Docker training video. The video is self-paced and includes hands-on exercises.

It will be time well spent. You will learn as much from this video as you would in several weeks of learning and using Docker on your own and asking questions on the IRC channel.

Go to Hands on with Docker to learn more and to buy the video.

Announcing: the first professional Docker training by a third party

You might have heard about Docker, the new open source project that lets you pack, ship and run any Linux application as a lightweight container. It has been getting a lot of attention since dotCloud (now renamed Docker, Inc.) open-sourced it in March.

If you’ve been curious about it and you learn best by doing, check out the new 3-hour Hands on with Docker introductory class that I will be teaching with Alvin Lai on Tuesday, November 12th, 2013 (that’s next week). We are excited to offer the first professional Docker training by any third party.

This is a beta class, so all we are charging is $50 to cover basic expenses (meeting room rental, pizza, etc.). We won’t make a profit on the beta class. When we start teaching the finalized class, it will cost several hundred dollars. So now is your only chance to save money and take the class for only $50.

I could write a lot more, but all the deets are on the Hands on with Docker page.

I hope to see you in class on Nov 12th. Any questions? My email is on that page.

Hey, Google. You Know that New Floating Facility You’re Building in the Bay? Please Name it Floating Point.

About four days ago, news broke that Google is building some large floating structure in the San Francisco Bay. According to CNET, “It is located on a barge just off Treasure Island, between San Francisco and Oakland.”

There has been a lot of speculation about what it is. A floating data center? A mobile retail store for selling Google Glass? A water-borne launch pad for giant Internet wifi blimps?

Whatever you’re building in the Bay, Google, I have only one request:

Please name it Floating Point.

Clearly that is the only name you could give this structure.

Apple has Infinite Loop. You can’t let them be the only Silicon Valley company with a clever name. You need something just as good. Now’s your chance.

Name it Floating Point. Please.

Readers, tweet this if you agree.

Google Structure

Do you Want Someone to Impersonate You to Your LinkedIn Contacts and Leave You Humiliated? Try FounderDating.

[Edit: FounderDating's CEO, Jessica Alter, politely asked me to remove the word "spam" from the title of this post because that word implies I didn't know a message would be sent out. I have changed the post's title at her request.]

A week ago, a friend of mine asked me to write a recommendation for him on FounderDating.com. I happily complied, and when I was done writing the recommendation I was shown pictures of a bunch of my LinkedIn contacts. FounderDating asked me to identify ten that I would vouch for. I didn’t have to do it but I thought oh well, I guess I’ll do it and see what happens. So all I did was click their photos. Nothing more. [Edit: after clicking their photos I clicked a button to submit my selections.]

Bad idea.

As far as I could tell, nothing happened. I figured behind the scenes maybe they would send notes to these people, telling them their contact Brian Morearty had used FounderDating, and suggesting that they try it too. I understand social networking and I understand that when you give an app permission to access a social networking account, your contacts will see that you used it. That would have been ok with me. But what they did was more insidious than that.

FounderDating used the LinkedIn API to send the following note to these ten contacts. I don’t remember being asked to approve the wording–because if I had been asked, I most certainly would not have approved the wording. [Edit: there actually was a link to "see/edit this message" at the bottom left of the form. I apparently didn't see it because it was in the smallest font size on the form and wasn't underlined, the way links often are. If I had clicked it I would have had a chance to change the text of the outgoing message.] I was humiliated this morning when one of my contacts sent me a heartfelt thank-you. I had to write back that while I most definitely do believe in him, I did not write this note.

Here’s what FounderDating wrote to the ten LinkedIn contacts I identified:

Hey [Name],

I was asked to vouch for a few people to join FounderDating – an invite-only network of entrepreneurs (50% engineers) all ready to start their next side-project or company. You’re on my short list. I highly recommend applying.

Apply here > [url]

(Note: copy and paste this link if it’s not clickable).

You can thank me later,
Brian

It seems factually correct, right? Let’s break it down:

  • “Hey [Name]:” personal greeting. Implies that I wrote it myself, not that it was written by someone else to my contacts.
  • “I was asked to vouch for a few people to join FounderDating:” this is accurate. However, again, it implies I wrote this note myself.
  • “You’re on my short list:” well, yes, they did ask me for ten of my contacts. But this gets embarrassingly personal at this point. I highly respect all ten of the people whose photos I clicked. But several of them are people I would not say to their face, “you’re on my short list.”
  • “I highly recommend applying:” EXCUSE ME? When did I recommend applying, much less HIGHLY recommend it?
  • “You can thank me later.” OH MY GOD. What kind of ass says that when he recommends signing up for some online service?

For fuck’s sake, at this point I most certainly DO NOT recommend people use FounderDating. Quite the contrary. I have shut them off from sending more messages on LinkedIn.

Sorry for the rant. I fell for a scam and I’m embarrassed. I hope you will learn from my mistake.

I have written to LinkedIn’s support team notifying them of this abuse of their API Terms of Service. See Section D: “Don’t Harm or Trick Members.”

Your Application must not:

  • Impersonate a LinkedIn user or misrepresent any user or other third party when requesting or publishing information.

Follow-up

On May 8, 2013, this post made the front page of Hacker News and got a lot of attention. FounderDating’s CEO, Jessica Alter, contacted me and we exchanged several emails. She was polite. She requested that I make some corrections to this post, which I have done. Specifically:

  • I added a comment saying that after selecting the pictures of my LinkedIn contacts, I clicked a button to submit the form.
  • I removed the word “spam” from the post’s title because that word “implies we don’t let you know a message goes out.” They did let me know a message would go out. (See screenshot below.) And even if they hadn’t let me know that, I mentioned in the post that I did expect a message to go out.
  • I clarified that there actually was a “see/edit this message” link in the bottom left hand side of the form. I didn’t notice this link because it was in small type and it looked exactly like the text above it, which was not a link.

Here is a screen shot of the form (I blurred the names):

FD form blurred

If you click “See/edit this message” in the bottom left corner, this is what you see:

FD wording

To reiterate, all FounderDating did was send a message I didn’t like to a few people I chose. In addition to requesting that I make clarifications to this post, Ms. Alter asked me what changes I would like to see made to the UI on FounderDating. I requested that they show me the flip-side of the lightbox before the messages get sent, without requiring me to choose to “see/edit” the message. Something like a 2-step wizard would be fine. Step 1: select contacts. Step 2: I am presented with the second screenshot.

I would also like the default message to be less annoying.

Establishing a Connection to a Non-Default Database in Rails 3.2

If you’ve ever built a Rails app in which some models don’t connect to the default database, you know that establish_connection is the method you would call to make the connection.

But before I get to that: the space shuttle Endeavour flew directly over my street today, low and slow, with a fighter jet escort. My neighbors and I had an amazing view of it. It was piggybacked on a 747 transport for a publicity tour across California before being retired at a Southern California museum. It was exciting!

Shuttle Flyover

So. Let’s say you needed to talk to an external NASA database. You would create new keys in database.yml called nasa_development, nasa_test, and nasa_production:

# database.yml
development:
  # development configuration goes here

nasa_development:
  # development configuration to external database goes here

# same for test and production...

Then define your models like this:

class Astronaut < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
end

In this example, Rails looks for an ‘astronauts’ table in the external database. (Automatic table naming still works. You can override that too, by calling self.table_name=.)
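To make the key naming concrete, here is a small plain-Ruby sketch of the lookup that "nasa_#{Rails.env}" implies. It is not what Rails does internally. The adapter and database paths are made-up values, and nasa_config is a hypothetical helper I wrote just for this illustration:

```ruby
require "yaml"

# Hypothetical database.yml contents; adapters and file paths are made up.
DATABASE_YML = <<~YAML
  development:
    adapter: sqlite3
    database: db/development.sqlite3

  nasa_development:
    adapter: sqlite3
    database: db/nasa_development.sqlite3
YAML

# Build the same key the models use ("nasa_" plus the environment name)
# and look it up in the parsed configuration.
def nasa_config(env, yaml_text = DATABASE_YML)
  configs = YAML.safe_load(yaml_text)
  key = "nasa_#{env}"
  configs.fetch(key) { raise KeyError, "no #{key} entry in database.yml" }
end

config = nasa_config("development")
puts config["database"]
```

If the key for the current environment is missing, the sketch raises a KeyError, which is a handy reminder to add the nasa_ entries for test and production too.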

In the past, when I’ve had multiple tables on a single connection, I always included a module that makes the connection. Let’s define models for Astronaut and Shuttle:

module NasaConnection
  extend ActiveSupport::Concern
  included do
    establish_connection "nasa_#{Rails.env}"
  end
end

class Astronaut < ActiveRecord::Base
  include NasaConnection
end

class Shuttle < ActiveRecord::Base
  include NasaConnection
end

That might have been overkill. I could just as well have done this:

class Astronaut < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
end

class Shuttle < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
end

In both of these techniques, establish_connection is called twice. This works in Rails 3.1.8 and earlier, but starting in 3.2.0 you will get runtime errors. An example:

class Astronaut < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
  has_many :missions
  has_many :shuttles, through: :missions
end

class Shuttle < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
  has_many :missions
  has_many :astronauts, through: :missions
end

class Mission < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
  belongs_to :astronaut
  belongs_to :shuttle
end

# test
shuttle = Shuttle.create name: "Endeavor"
shuttle.astronauts << Astronaut.create(name: "Mark Kelly")
assert_equal 1, shuttle.reload.astronauts.count # FAIL

Different database adapters will give you different errors. (I tried SQLite and PostgreSQL.) The problem is in ActiveRecord, not in the adapters.

Trying to Find a Fix

The fact that this doesn’t work any more seems to me like a bug in ActiveRecord. So I used a debugger to step through the establish_connection code in Rails 3.1.8 and 3.2.8, trying to identify the problem. I found a comment above the definition of the ConnectionHandler class that gave me a hint:

# suppose that you have 5 models, with the following hierarchy:
#
# |
# +-- Book
# | |
# | +-- ScaryBook
# | +-- GoodBook
# +-- Author
# +-- BankAccount
#
# Suppose that Book is to connect to a separate database (i.e.
# one other than the default database). Then Book, ScaryBook
# and GoodBook will all use the same connection pool. Likewise,
# Author and BankAccount will use the same connection pool.
# However, the connection pool used by Author/BankAccount
# is not the same as the one used by Book/ScaryBook/GoodBook.

Ah! So maybe ActiveRecord requires you to create a common base class for all models that will talk to the same external database. So I tried it. Notice that Astronaut, Shuttle, and Mission are all subclasses of NasaTable, which is a subclass of ActiveRecord::Base:

class NasaTable < ActiveRecord::Base
  establish_connection "nasa_#{Rails.env}"
end

class Astronaut < NasaTable
  has_many :missions
  has_many :shuttles, through: :missions
end

class Shuttle < NasaTable
  has_many :missions
  has_many :astronauts, through: :missions
end

class Mission < NasaTable
  belongs_to :astronaut
  belongs_to :shuttle
end

# test
shuttle = Shuttle.create name: "Endeavor"
shuttle.astronauts << Astronaut.create(name: "Mark Kelly")
assert_equal 1, shuttle.reload.astronauts.count

This code, like my earlier code, worked great in Rails 3.1. But it still failed in Rails 3.2, only now for a different reason: ActiveRecord tried to connect to a table called ‘nasa_tables’. I thought it must be associating all three subclasses with that table.

Hmm, easy but annoying to fix, right? Force Rails to use the right table names:

class Astronaut < NasaTable
  self.table_name = 'astronauts'
  ...
end

class Shuttle < NasaTable
  self.table_name = 'shuttles'
  ...
end

class Mission < NasaTable
  self.table_name = 'missions'
  ...
end

Nope, that still didn’t work. Rails was still looking for the nonexistent ‘nasa_tables’ table. Looks like self.table_name= is not going to solve this particular problem.

Eventually I found a discussion of a Rails issue where someone was having this problem. In their example, A was the base class and B was the subclass. @tenderlove said, “Out of curiosity, why would you do this? If there is no A table, you should set the class to be abstract.”

Abstract? I know how to define abstract classes in other languages like C++ but I’ve never heard of an abstract class in Ruby. Well, it turns out he’s talking about an ActiveRecord concept, not a Ruby concept. Any subclass of ActiveRecord::Base can declare itself abstract by setting self.abstract_class = true. This tells ActiveRecord that it shouldn’t look for a table to go with that class.
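Here is a toy model of why the abstract flag matters. This is plain Ruby, not ActiveRecord's real code: ToyBase, PlainNasaTable, and AbstractNasaTable are names I invented, and the pluralization is deliberately naive. It only sketches the single-table-inheritance rule that a subclass of a concrete model shares its parent's table, while a subclass of an abstract class gets a table of its own:

```ruby
# Toy stand-in for ActiveRecord::Base. It only models table-name inference.
class ToyBase
  class << self
    attr_writer :abstract_class

    # Per-class flag; subclasses do not inherit it.
    def abstract_class?
      @abstract_class == true
    end

    def table_name
      if superclass == ToyBase || superclass.abstract_class?
        # "NasaTable" -> "nasa_tables" (naive pluralization, fine for a demo)
        name.gsub(/([a-z])([A-Z])/, '\1_\2').downcase + "s"
      else
        # Concrete parent: share its table, as single-table inheritance does.
        superclass.table_name
      end
    end
  end
end

# Concrete base class: its subclasses inherit its table name.
class PlainNasaTable < ToyBase; end
class Astronaut < PlainNasaTable; end

# Abstract base class: its subclasses get their own table names.
class AbstractNasaTable < ToyBase
  self.abstract_class = true
end
class Shuttle < AbstractNasaTable; end

puts Astronaut.table_name
puts Shuttle.table_name
```

In the toy, Astronaut ends up pointing at its parent's table, while Shuttle gets a table named after itself.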

I Like Stuff that Finally Works

Armed with this information I finally landed on a solution that works in Rails 3.2. As you can see, 3.2 is much less flexible than 3.1 about how you define your class structure to talk to external databases. Whereas 3.1 had several techniques that would work, as far as I know the following is the only way to make it work in 3.2:

  1. Make a common base class for all models that need to talk to a non-default database.
  2. Tell ActiveRecord that this base class is abstract.
  3. Establish the connection in the base class.

So here’s our final code:

class NasaTable < ActiveRecord::Base
  self.abstract_class = true
  establish_connection "nasa_#{Rails.env}"
end

class Astronaut < NasaTable
  has_many :missions
  has_many :shuttles, through: :missions
end

class Shuttle < NasaTable
  has_many :missions
  has_many :astronauts, through: :missions
end

class Mission < NasaTable
  belongs_to :astronaut
  belongs_to :shuttle
end

# test
shuttle = Shuttle.create name: "Endeavor"
shuttle.astronauts << Astronaut.create(name: "Mark Kelly")
assert_equal 1, shuttle.reload.astronauts.count # IT WORKS

A Side Note About Connection Pools

It would be nice to have the flexibility that 3.1 had. But as long as there is a reasonable solution, I’m happy. I should mention, however, that if you call establish_connection multiple times with the same connection key (e.g. the old way where each table calls establish_connection), you end up creating multiple connection pools to the same database. This seems like a problem, as Sam Saffron has pointed out. If a single app talks to a single database through multiple connections, the best way to manage those connections is via a single pool, not multiple pools that don’t know about each other.

If you stick with the method I described above, where only one base class establishes a connection to each database, you can avoid this problem.
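To picture the difference, here is a toy registry in plain Ruby. It is only loosely modeled on the idea described in the ConnectionHandler comment quoted earlier (one pool per class that establishes a connection, found by walking up the class hierarchy). ToyPool and ToyHandler are invented names, and the real ActiveRecord internals are more involved:

```ruby
# A pool here is just an object that remembers its connection key.
class ToyPool
  attr_reader :spec

  def initialize(spec)
    @spec = spec
  end
end

class ToyHandler
  def initialize
    @pools = {}
  end

  # Every call creates a fresh pool, even if the same key was used before.
  def establish_connection(klass, spec)
    @pools[klass] = ToyPool.new(spec)
  end

  # Walk up the superclass chain until a class that owns a pool is found.
  def retrieve_pool(klass)
    while klass
      return @pools[klass] if @pools.key?(klass)
      klass = klass.superclass
    end
    nil
  end
end

class NasaTable; end
class Astronaut < NasaTable; end
class Shuttle < NasaTable; end

# Old way: each model establishes its own connection.
per_model = ToyHandler.new
per_model.establish_connection(Astronaut, "nasa_development")
per_model.establish_connection(Shuttle, "nasa_development")

# New way: only the common base class establishes a connection.
shared = ToyHandler.new
shared.establish_connection(NasaTable, "nasa_development")

puts per_model.retrieve_pool(Astronaut).equal?(per_model.retrieve_pool(Shuttle))
puts shared.retrieve_pool(Astronaut).equal?(shared.retrieve_pool(Shuttle))
```

With the per-model handler, the two models hold two separate pool objects for the same key. With the shared handler, both lookups land on the one pool owned by the base class.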

Astronaut

You Should Update One Gem at a Time with Bundler. Here’s How.

[Update: added example of updating the sextant gem, which causes Rails to be updated as well.]

Hey Ruby developers,

When you run bundle update to update your gems, it updates all of them at once. If your app stops working or your tests start failing, it can be pretty hard to figure out which gem update broke it.

There are a couple of solutions I’ve seen people use to solve this. Neither of them is that great:

  1. Lock the version numbers in your Gemfile. But hey, that’s a pain and it’s what Gemfile.lock is for. Locking the version numbers in the Gemfile should be the exception, not the rule.
  2. Run bundle update gemname.

You might think bundle update gemname would just update that gem. But no, it also updates the gem’s dependencies, whether they have to be updated or not. In fact, updating a third party gem can even upgrade you to a new version of Rails behind your back.

Here’s what the Bundler doc says about bundle update gemname:

UPDATING A LIST OF GEMS

Sometimes, you want to update a single gem in the Gemfile(5), and leave the rest of the gems that you specified locked to the versions in the Gemfile.lock.

For instance, in the scenario above, imagine that nokogiri releases version 1.4.4, and you want to update it without updating Rails and all of its dependencies. To do this, run bundle update nokogiri.

Bundler will update nokogiri and any of its dependencies, but leave alone Rails and its dependencies.

Yeah, sure, that sounds nice. But read that last sentence again: “Bundler will update nokogiri and any of its dependencies.” Well, heck. Some gems depend on a ton of other gems. And why not? Nothing wrong with that. And it makes sense to update the dependencies if something new is required for the update to work. But Bundler updates them no matter what. And now you’re back to the problem I stated at the beginning: even if you intend to update only a single gem, you still end up updating a whole boatload of gems. So when your app breaks or your tests fail, it takes a lot of time to figure out why.

Want an example of an unexpected side-effect of bundle update? I have a good one. Let’s say you’ve installed the sextant gem into your Rails app so you can see your Rails routes in development mode by navigating to /rails/routes. (It saves time compared to rake routes since the environment is already loaded.) In this example you are on Rails 3.2.2 and sextant 0.1.2. Now you run bundle update sextant to update to 0.1.3. Do you know what you just did? You upgraded Rails from 3.2.2 to 3.2.6. I don’t know about you, but I don’t like having my version of Rails updated just because I got the latest version of some little helpful gem.

When writing this blog post I tried a few scenarios. I found something else interesting:

With bundle update gemname, even if there is no newer version of that gem, it will still update everything the gem depends on.

Here’s an example. My app has version 0.3.4 of haml-rails, which at the moment is the newest version. I run:

bundle update haml-rails

After that, git diff informs me that my Gemfile.lock now has newer versions of the journey, json, multi_json, and sprockets gems. Even though it didn’t find an update to haml-rails.
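If you’d rather not eyeball the git diff, a few lines of Ruby can list exactly which locked versions moved. This is a rough sketch, not a Bundler feature: locked_versions and changed_gems are hypothetical helpers, the lockfile snippets are trimmed down, and every version number except haml-rails 0.3.4 is made up for illustration:

```ruby
# Returns { "gem name" => "version" } for the resolved specs in lockfile text.
# Resolved specs are indented four spaces; their dependencies use six.
def locked_versions(lockfile_text)
  lockfile_text.scan(/^ {4}(\S+) \(([^)]+)\)$/).to_h
end

# Returns { "gem name" => [old_version, new_version] } for gems that changed.
def changed_gems(before, after)
  old = locked_versions(before)
  locked_versions(after).each_with_object({}) do |(name, version), changes|
    changes[name] = [old[name], version] if old[name] != version
  end
end

# Trimmed, hypothetical Gemfile.lock contents before and after the update.
BEFORE = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      haml-rails (0.3.4)
        haml (~> 3.1)
      json (1.7.3)
      sprockets (2.1.2)
LOCK

AFTER = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      haml-rails (0.3.4)
        haml (~> 3.1)
      json (1.7.4)
      sprockets (2.1.3)
LOCK

changed_gems(BEFORE, AFTER).each do |name, (from, to)|
  puts "#{name}: #{from} -> #{to}"
end
```

In a real project you could feed it the output of git show HEAD:Gemfile.lock and the current file contents instead of the inline strings.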

The Solution: bundle update --source gemname

Bundler has a solution, but in my opinion it’s hard to understand the documentation.

Here’s what you do:

bundle update --source gemname

I started diving into Bundler’s code and specs to see exactly what this does but it was taking more time than I wanted to spend. But I’ve been using this for more than a year and it works great. I recommend this be your default way to update a gem. If it doesn’t work due to a dependency conflict with other gems then you can always fall back on bundle update gemname.

As far as I can tell, using --source is the equivalent of the following, but without all the work and headache:

  1. Specifying version numbers for everything in your Gemfile.
  2. When you want to update a gem, running gem list -r gemname to find out its latest version number.
  3. Changing the version number in your Gemfile for just that one gem.
  4. Running bundle install.

If you look at bundle update’s documentation for --source, it doesn’t help much. Here’s what it says:

OPTIONS

--source=<name>

The name of a :git or :path source used in the Gemfile(5). For instance, with a :git source of http://github.com/rails/rails.git, you would call bundle update --source rails

By the way, the bundle update documentation does mention conservatively updating dependencies. Here is what it has to say: “For more information, see the CONSERVATIVE UPDATING section of bundle install(1) bundle-install.1.html.” The bundle install section on conservative updating does indeed describe how to update dependencies conservatively. However, this section assumes you are specifying version numbers in your Gemfile. As I mentioned above, I prefer to do that only as an exception.

See ya.

How to Find a Record in ActiveRecord Using either an ID Or an Object

Have you ever written code in an ActiveRecord model where you wanted the caller to be able to pass in either an object or an id?

In this example you want to know if a user has read a post. Your join table, readings, tells you which users have read each post.

class Post < ActiveRecord::Base
  has_many :readings

  def read_by?(user_or_id)
    # readings is the join table, so match on its user_id column
    readings.where(:user_id =>
                     Integer === user_or_id ?
                     user_or_id :
                     user_or_id.id).exists?
  end
end

Good news: you don’t have to go through all that. ActiveRecord checks for you if the object is a number or a model. You can simplify the code quite a bit.

In this rewritten version the user parameter of the read_by? method can be either a user object or a user id:

class Post < ActiveRecord::Base
  has_many :readings

  def read_by?(user)
    # readings is the join table, so match on its user_id column
    readings.where(:user_id => user).exists?
  end
end
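If you ever need the same flexibility outside of a query, where ActiveRecord isn’t there to do the check for you, a duck-typed helper is one way to go. This is a plain-Ruby sketch: extract_id is a hypothetical helper name, and the User struct is just a stand-in for a model instance:

```ruby
# Stand-in for a model instance; a real app would pass an ActiveRecord object.
User = Struct.new(:id, :name)

# Accepts either an object that responds to #id or a bare id.
def extract_id(user_or_id)
  user_or_id.respond_to?(:id) ? user_or_id.id : user_or_id
end

puts extract_id(User.new(42, "Mark Kelly"))
puts extract_id(42)
```

Asking whether the argument responds to id, rather than checking its class, means the helper keeps working if you later pass in some other object that wraps a user.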