Export iPhoto to Folders is Getting Better All the Time

Mark Nottingham is on a tear. Last week he sent me a pull request for a bunch more changes he made to the exportiphoto app that lets you export iPhoto events or albums to folders.

The most notable changes:

  • It uses less memory. This matters if your iPhoto library is huge, because you will no longer overwhelm your Mac’s memory when exporting the photos to folders. It now uses a SAX parser so it no longer has to load the entire iPhoto XML file into memory.
  • It shows better progress than before, telling you how many events/albums it has processed so far and how many there are total.
  • If you want, it can copy photo metadata (image name, description, keywords, faces, and rating) into the exported photos.

Ruby on Rails Bedtime Stories

It’s time for some bedtime stories. These are children’s stories so there is a happy ending. These stories are about making your Rails code cleaner, faster, and less brittle, and about keeping your fragment caching code simple. The third story shows how removing code from a view can make your app’s controller query the DB a lot more often.

Story #1: If You Fragment Cache a View

If you want your Rails website to run faster and scale better, you might decide fragment caching will help.

And chances are, you will put memcached on the server and write cache statements in your views.

But if you are still on Rails 2.x then you’re not using Arel’s lazy execution. So you might notice your controllers are still querying the database even for views that are cached and won’t use the data. So you will probably add read_fragment calls to the controllers to prevent that.

If you used a custom key in the view’s cache statement you will want to use the same custom key in the read_fragment statement.

But if it’s a complex key, that might not feel DRY. So you might refactor the key out to a helper function you can call from the controller and the view, passing the result in to read_fragment and cache. You might even put the helper in application_controller.rb if the view is a partial that’s called from several controllers.

Then you might feel bad that your fragment cache logic, which started out nice and clean and small in the view, is now spread out among three source files. You might get a headache. You might think there must be a better way. And you might ask for a glass of milk.

Story #2: If You Remove Data from a View

If your product manager wants you to output some data in a view, you might ask for the data in your controller. You might use ActiveRecord’s find method since you’re still on Rails 2.x and can’t use Arel yet.

And chances are, if you query data in your controller, you’re going to store the result in an instance variable.

If your product manager decides three months later that the data doesn’t need to be shown after all, she may ask your coworker to remove it. Your coworker may do just that–remove it from the view.

Your coworker may not realize the query in the controller is no longer needed. He may not even think of removing it.

Your server may worker harder than it has to, continuing to query the database on every page view to get data that’s never shown on the page.

Your server might get a headache. It might ask for a glass of milk.

Story #3: If You Combine Stories 1 and 2

If your product manager asks you to show data in a view, chances are you will put a database query in your controller to pass the data to the view.

When traffic gets high and your server gets overloaded, you may decide to use a fragment cache. You might use :expires_in => 5.minutes so your server only has to query the database once every five minutes.

You might get the same headache you got in bedtime story #1 because you have to add a read_fragment call in your controller, a cache call in your view, and a reusable key function in your application controller. But the glass of milk might help and your headache may go away.

When your product manager says you no longer need to show the data in your view, chances are you will remove the fragment cache and its contents from the view. But you might forget about the query in the controller. You might leave it there.

But no big deal, right? The app still works. Your tests still pass. You’ll only be querying the database once every five minutes, after all. Wrong. Remember: calling cache from the view is what writes to the cache. By removing the view’s cache statement you have removed all cache writes. So there’s never anything in the cache. So read_fragment will always return nil.

And if read_fragment returns nil, your query will run.

On. Every. Request.

By removing the data from the view you have gone from querying the database once every five minutes to querying it on each and every page view, even though you never need to show the data.

You’re going to need a really big glass of milk to get rid of that headache. Hopefully you’re running New Relic and will see the huge increase in DB queries immediately after deploy so you can look into it quickly.

The Promised Happy Ending

There are several possible solutions to the problems in these three bedtime stories. The third one is my favorite by a long shot:

  1. Not the solution: perform the query in the view instead of in the controller. Obviously this is wrong. I won’t discuss this further.
  2. Not the solution: use Model-View-Presenter. I won’t discuss it either because I think it needlessly complicates matters. We Rails programmers tend to believe in the KISS principle, and there is a simpler solution.
  3. The solution: move your queries out of the action methods and into memoized helper methods in your controller.
    1. Give the helper methods the same name that the instance variables used to have. (@company becomes the company method).
    2. In each helper method, assign the result to the instance variable. Make sure the assignment uses ||=, not plain old =, so the helper runs the query only once no matter how many times it’s called.
    3. In your view, never rely on instance variables having been previously set by the controller. Remove all the @ signs from the views and almost all @ signs from the controllers. (Exceptions are in new and create. See below.) What you’re left with are function calls to the helpers. Each helper will perform the query the first time it is called within a single request.

This solution is pure beauty. Here’s why:

  • It solves bedtime story #1. You can remove all those sloppy read_fragment calls from your controller. The only reason they were there was to limit database calls to only when the view needs it. But by the time your helper function is called, you know the view needs the data and wasn’t able to find a fragment in the cache. So just go get the data. This means you can also remove the reusable key method from application.rb and put the cache key inline as a parameter to the cache statement. Your code will still be DRY because you only need the key in that one place.
  • It solves bedtime story #2. If you ever remove your data output from your view, you are removing calls to the helper functions. If they’re not called, they won’t query the database. So even if you forget to remove them from the controller, they’re doing no harm.
  • It solves bedtime story #3. If you ever remove a fragment cache and its contents from the view and thus never write to the cache any more, it does not increase the number of times your controller queries the database. Since you’ve removed the view code that calls the helper functions, you actually reduce the number of database calls to zero.
  • It’s simple to understand.
  • It’s easy to implement.
  • The resulting code is clean. Your controller functions are shorter and therefore sweeter. You don’t have to add new classes. You don’t have to write any new code.
  • It does not require putting database calls in the wrong layer (the view).
  • It does not require explaining a scary new pattern name to your team and to every new person who joins your team.

Does It Work in Rails 3?

Sure, it works in Rails 3. You will get the same benefits from using it in any version of Rails. Rails 3 has Arel with its lazy evaluation of queries, but the default generated controllers don’t take advantage of this feature.

Update: previously I said it won’t really be needed any more after you update all your find calls to Arel’s implicit query that runs when you start iterating over results. It turns out I was wrong: even in Rails 3 the default index function calls ModelClass.all and the other functions call ModelClass.find. Both of these run an immediate query, not a lazy one.

Putting performance improvements aside, this technique also DRYs up your controller code a bit even when you’re using Arel. Instead of putting @company = Company.find(params[:id]) in every controller method, you just write it once.

It’s Not Following the Rails Conventions. Will Other Gems Work?

Gems will still work. Let’s say, for example, you use Ryan Bates’ CanCan gem. There’s a code sample in the CanCan README that goes like this:

def show
  @article = Article.find(params[:id])
  authorize! :read, @article

It looks like you have to initialize @article, right? But you could rewrite it this way:

def show
  authorize! :read, article

def article
  @article ||= Article.find(params[:id])

Notice I took the @ off the “article” parameter. It is now a function call to the your helper method that calls Article.find. In this case you’re not avoiding a database call, but you’re not doing any harm either. And your show method is shorter. And sweeter.

Update: In the comments below, in response to Ben’s note about CanCan’s load_and_authorize_resource function, I posted a better way to write your helpers when using CanCan so they don’t run the query or do any authorization until we know the data is needed.

What About Non-Read Operations? Create, Update, Delete?

The new and create methods should still set the instance variable like they did before. update can just call the helper (company.update_attributes instead of @company.update_attributes) and destroy can do the same (company.destroy instead of assigning @company and then calling @company.destroy).

But all views—whether for any of the seven RESTful methods or any non-RESTful method—should just reference the helper method, never the instance variable. Since the helper method is memoized (@company ||= Company.find), it will just work because the instance variable was already set.

I Like Testing

Your tests will need to use @controller.post instead of assigns(:post). Understand, though, that the test might do a DB query by calling post if the view never called it.

Code Sample: the Old Way

Here’s an example CompaniesController that follows the Rails convention of doing the database query in the controller, assigning the result to an instance variable, and referencing it in the view:

# GET /companies/1
# GET /companies/1.xml
def show
  @company = Company.find(params[:id])

  respond_to do |format|
    format.xml  { render :xml => @company }

And the corresponding show.html.erb:

  <%=h @company.name %>

<%= link_to 'Edit', edit_company_path(@company) %> |
<%= link_to 'Back', companies_path %>

Code Sample: the Better Way

Here’s how you can rewrite it to avoid bedtime story problems:

helper_method :company

# GET /companies/1
# GET /companies/1.xml
def show
  respond_to do |format|
    format.xml  { render :xml => company }

def company
  @company ||= Company.find(params[:id])

And the corresponding show.html.erb:

  <%=h company.name %>

<%= link_to 'Edit', edit_company_path(company) %> |
<%= link_to 'Back', companies_path %>

For completeness, here is how you would implement the index method:

helper_method :company, :companies

def index

def companies
  @companies ||= Company.all
  # or with will_paginate:
  # @companies ||= Company.paginate :page => params[:page] || 1


Thanks and apologies to If You Give a Mouse a Cookie and other books in the series by Laura Joffe Numeroff and Felicia Bond. A wonderful series of books that I recommend reading to small children.

Thanks to Struts 2 for helping me discover this pattern. (Yeah. Whoa.)