Put HTML tags and apostrophes in fixtures and tests or a meanie will hack you.

Here’s a good way to protect against cross-site scripting attacks and SQL injection attacks. This will help catch mistakes where you (well actually your teammate, since you’re perfect) forgot to call “h” in a <%= %> block, or accidentally passed a SQL statement to the database without escaping the values:

Sprinkle unclosed HTML tags and apostrophes all over your fixture data and test code.

Then use assert_select liberally, which will barf on the console if it sees unclosed HTML tags–even if you were selecting some other part of the document.
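To make the failure mode concrete, here is a minimal sketch using Ruby's standard ERB library outside of Rails. The template strings and the `subject` variable are made up for illustration; `ERB::Util.h` is the stdlib escaping helper that plays the role of "h" here.

```ruby
require "erb"

# Hypothetical fixture value containing an unclosed tag
subject = "<script> attack!"

# Forgetting to escape: the raw tag lands in the page
unescaped = ERB.new("<h1><%= subject %></h1>").result(binding)

# Remembering to escape: the tag is rendered as harmless text
escaped = ERB.new("<h1><%= ERB::Util.h(subject) %></h1>").result(binding)

puts unescaped  # <h1><script> attack!</h1>
puts escaped    # <h1>&lt;script&gt; attack!</h1>
```

The first output contains an unclosed script tag, which is exactly the kind of malformed markup a test can trip over.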

I Like Stuff that’s Safe

Here is what a posts.yml file might look like:

test_post:
  id: 1
  subject: <script> attack!
  detail: "sql injection: '; drop table posts;"

(Quote the whole string in YAML when the value contains an apostrophe or a colon followed by a space, as this one does; otherwise the parser may choke on it.)
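If you want to check that a fixture like this parses the way you expect, here is a small sketch that loads the same YAML with Ruby's standard yaml library. Rails does more than this when it loads fixtures; the sketch only shows the parsing step.

```ruby
require "yaml"

# The same fixture as above, inlined as a heredoc for the sketch
fixture = YAML.load(<<~YML)
  test_post:
    id: 1
    subject: <script> attack!
    detail: "sql injection: '; drop table posts;"
YML

post = fixture["test_post"]
puts post["subject"]  # <script> attack!
puts post["detail"]   # sql injection: '; drop table posts;
```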

So assert_select has this handy side-effect I mentioned where it tells you about your malformed HTML. Since Rails tests don’t actually run in a browser, you need some other way to know that you’ve forgotten to escape data. Unclosed HTML tags in your fixtures, yeah, that’s the ticket.

And remember, you don’t need to call assert_select on the element that contains the bad data. Just call assert_select on anything and it will parse the output to make sure it’s well-formed.

  def test_show
    post = posts(:test_post)
    # Request parameters are passed as a hash
    get :show, :id => post.id
    assert_select "body"
  end

The idea is that by sprinkling XSS attacks through your fixtures and using assert_select whenever you’re testing other stuff, the XSS attacks will become apparent.

If you do need to assert that the output is correct, you can call CGI::escapeHTML:

  def test_show
    post = posts(:test_post)
    # Request parameters are passed as a hash
    get :show, :id => post.id
    assert_select "span", :count => 1,
      :text => CGI::escapeHTML(post.detail)
  end

I can’t haz SQL injection attacks

I admit that putting SQL injection attacks in the fixtures is a bit contrived and may not help. A better way to catch SQL injection attacks is to pass apostrophes into the app from your test code, so go ahead and sprinkle your test code with beauties like this:

  def test_update
    # Request parameters are passed as a hash
    post :update, :id => posts(:test_post).id,
      :detail => "sql injection: '; drop table posts;"
  end

The secret to making this work is:

  1. apostrophe
  2. semicolon
  3. SQL statement
  4. another semicolon

You want to use a SQL statement that will cause a test to fail. It would be coolio if there were some way to make the current test succeed and subsequent tests fail, but I’m not sure I know a way to do that consistently. But at least if you use a “drop table” statement, you’re going to cause subsequent tests to fail (if there are any subsequent tests that use that table) because a schema change does not happen in a transaction. So even if you’re using transactional fixtures, the next test will fail anyway cuz the dang table is gone.
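To see why the apostrophe-then-semicolon recipe bites, here is a sketch in plain Ruby with no database involved. The `quote_sql_string` helper is hypothetical and deliberately simplified (it only doubles apostrophes); in a real app, let your database adapter do the quoting, for example through placeholder-style conditions.

```ruby
detail = "sql injection: '; drop table posts;"

# Unsafe: the value is interpolated straight into the statement, so the
# apostrophe closes the string literal and the rest runs as SQL.
unsafe_sql = "UPDATE posts SET detail = '#{detail}' WHERE id = 1"

# Hypothetical, simplified helper: double each apostrophe so it stays
# inside the string literal.
def quote_sql_string(value)
  "'" + value.gsub("'", "''") + "'"
end

safe_sql = "UPDATE posts SET detail = #{quote_sql_string(detail)} WHERE id = 1"

puts unsafe_sql
# UPDATE posts SET detail = 'sql injection: '; drop table posts;' WHERE id = 1
puts safe_sql
# UPDATE posts SET detail = 'sql injection: ''; drop table posts;' WHERE id = 1
```

In the unsafe version the database sees a complete UPDATE, then a separate "drop table posts" statement.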

Fun with Ruby’s instance_eval and class_eval

In an attempt to better understand instance_eval and class_eval, I just read Khaled’s post on Ruby reflection. It helped, and I came up with a memory crutch I can use to remember when to use each of them:

Use ClassName.instance_eval to define class methods.

Use ClassName.class_eval to define instance methods.

That’s right. Not a typo. Here are some examples, shamelessly stolen from his post:

# Defining a class method with instance_eval
Fixnum.instance_eval { def ten; 10; end }
Fixnum.ten #=> 10

# Defining an instance method with class_eval
Fixnum.class_eval { def number; self; end }
7.number #=> 7

I Like Stuff that’s Backwards

Why is it the reverse of what you might expect? Because Fixnum.instance_eval treats Fixnum as an instance (an instance of the Class class), so any methods you define with “def” become methods on that one instance, which is to say class methods. So it’s equivalent to this:

class Fixnum
  def self.ten
    10
  end
end
Fixnum.ten #=> 10

Fixnum.class_eval treats Fixnum as a class and executes the code in the context of that class, thus any “def” statements are treated exactly as if they were in normal code without any reflection. It’s equivalent to this:

class Fixnum
  def number
    self
  end
end
7.number #=> 7
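Here is a small sketch for checking the rule yourself. Fixnum was folded into Integer in Ruby 2.4, so the sketch uses a made-up Widget class instead; the method names are hypothetical too.

```ruby
# Hypothetical class used only for this demonstration
class Widget; end

# instance_eval: "def" lands on Widget itself, so it's a class method
Widget.instance_eval { def ten; 10; end }

# class_eval: "def" lands in the class body, so it's an instance method
Widget.class_eval { def shout; "widget!"; end }

puts Widget.ten                        # 10
puts Widget.new.shout                  # widget!
p Widget.singleton_methods(false)      # [:ten]
p Widget.instance_methods(false)       # [:shout]
p Widget.new.respond_to?(:ten)         # false
```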

There are still some things about Ruby reflection that mystify me but at least I think I’ve got this one nailed.

Generate guid ids 2100x faster for ActiveRecord models (but only if you use MySQL)

The Rails project I’m working on (the Small Business Help Forums at the Intuit Community) has some tables that use GUIDs for their primary keys instead of autoincrement integers. To implement GUIDs we used the handy usesguid plugin. All you have to do is change your “id” column to a 22-character varchar (make sure it’s a binary varchar and uses binary collation, so upper and lower case are treated differently) and put this in your model:

class MyModel < ActiveRecord::Base
  usesguid
end

Pretty nice.

Just one problem.

It’s HECKA slow.

On my Windows machine it was taking a whopping 0.4 seconds to create a GUID with this plugin. On my Linux VM it was a lot faster, but still slower than it should be (0.0322 seconds–just 31 GUIDs per second).

Download the Faster Plugin

If you use MySQL for your database and you’d like to download my modified usesguid plugin which is way faster, type this from the main directory of your Rails app:

 script/plugin install git://github.com/BMorearty/usesguid.git

Or download it here and copy it into vendor/plugins/usesguid.

Then add the “usesguid” statement (see above) to any models that you want to have guid ids, migrate the id columns to binary varchar(22), and add this to your environment.rb file:

ActiveRecord::Base.guid_generator = :mysql

Here is a sample migration for creating a new table with guids, as opposed to changing an existing one to use them:

create_table :products, :id => false, :options => 'ENGINE=InnoDB' do |t|
  # This table uses guid ids
  t.binary :id,   :limit => 22, :null => false
  t.string :name, :limit => 50, :null => false
end
# Since the t.column syntax can't specify a character set and collation...
execute "ALTER TABLE `products` MODIFY COLUMN `id` VARCHAR(22) BINARY CHARACTER SET latin1 COLLATE latin1_bin NOT NULL;"
execute "ALTER TABLE `products` ADD PRIMARY KEY (id)"
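For intuition about where the 22 characters and the case-sensitive collation come from, here is a sketch of one encoding that produces that length: Base64 of the 16 raw bytes with the padding removed. This is an assumption for illustration; the plugin's actual encoding may differ.

```ruby
require "securerandom"

# 16 random bytes stand in for a 128-bit GUID (assumption for the sketch)
raw = SecureRandom.random_bytes(16)

# "m0" packs as Base64 without line breaks: 24 characters ending in "=="
padded = [raw].pack("m0")

# Dropping the padding leaves 22 characters that mix upper and lower case,
# which is why the column has to compare case-sensitively
encoded = padded.delete("=")

puts padded.length   # 24
puts encoded.length  # 22
```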

I Like Stuff that’s Fast

Read on to find out why the old code was so slow, and how the code got 2100 times faster.

I investigated to see why it takes so long, and found that every time it creates a GUID, it calls UUID.timestamp_create. This in turn calls UUID.get_mac_address, which spawns a new process (ipconfig on Windows; ifconfig on UNIX-based systems) and parses the output. The reason: to discover the network card’s MAC address. (Hey yeah, even Windows has a MAC address.)

But the MAC address never changes. It’s hard-wired into the network card. So why bother querying it every time you create a GUID? Launching a whole new process every time we need a GUID is overkill.

My first thought was to write a plugin on top of the plugin. My plugin would cache the result of UUID.get_mac_address. I tried it, but found a problem: there’s a bug in UUID.timestamp_create. If it executes too quickly on a system whose clock resolution is not high enough, it returns the same GUID multiple times in a row. Whoops! Kind of defeats the purpose of GUIDs.

So I decided to take advantage of the fact that MySQL has a “SELECT UUID()” syntax, and I wrote a new GUID creator in the UUID class that calls MySQL to generate GUIDs. (Obviously this only works if you have MySQL.) I called this new creator “UUID.mysql_create.” The first time it is called, it calls MySQL like this:

SELECT UUID(), UUID(), UUID(), UUID(), UUID(), ... ;

It selects 50 UUIDs in a single round-trip to the database and stores the results in memory. Each time a new GUID is required, it plucks one off the list. When the list is empty and another one is required, it goes and gets another 50.
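The batching idea can be sketched in a few lines of plain Ruby. This is not the plugin's code: BatchedGuidSource and its method names are hypothetical, and SecureRandom stands in for the MySQL round-trip so the sketch runs without a database.

```ruby
require "securerandom"

# Hypothetical class illustrating the fetch-50-then-pluck approach
class BatchedGuidSource
  BATCH_SIZE = 50
  # The kind of statement described above: 50 UUID() calls in one query
  BATCH_SQL = "SELECT " + Array.new(BATCH_SIZE, "UUID()").join(", ")

  attr_reader :round_trips

  # fetch_batch is a block that returns n fresh GUIDs in one call
  def initialize(&fetch_batch)
    @fetch_batch = fetch_batch
    @pool = []
    @round_trips = 0
  end

  # Hand out one GUID, refilling the pool only when it runs dry
  def next_guid
    if @pool.empty?
      @pool = @fetch_batch.call(BATCH_SIZE)
      @round_trips += 1
    end
    @pool.shift
  end
end

# Stand-in for the database call (assumption: no MySQL in this sketch)
source = BatchedGuidSource.new { |n| Array.new(n) { SecureRandom.uuid } }
guids = Array.new(120) { source.next_guid }

puts guids.uniq.size     # 120
puts source.round_trips  # 3
```

Asking for 120 GUIDs costs three fetches instead of 120, which is where the time savings come from.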

On my Windows machine, creating a GUID with UUID.mysql_create now takes 0.0001937 seconds, which is about 2100 times faster than the roughly 0.4 seconds it used to take. On my Linux VM it’s 0.0001671 seconds, or 193 times faster than the 0.0322 seconds it used to take.

All these changes were made in a new file, uuid_mysql.rb. But I also made a number of changes to the usesguid.rb file:

  1. Added a configuration option so you can specify which creator to use. The default is still timestamp_create, but to use mysql_create you just put “ActiveRecord::Base.guid_generator = :mysql” in your environment.rb file.
  2. Fixed the code so it respects the :column option, which lets you override the column that stores the primary key.
  3. Delayed the assignment of a guid until just before creation (before_create) rather than just after “new” (after_initialize). This has two benefits:
    1. It more closely mimics the default behavior of autoincrement columns, which don’t get an id until the row is created.
    2. It is faster. after_initialize gets called every time a model object is instantiated, including all objects returned by a call to find. (But don’t worry, it wasn’t generating GUIDs for all those objects; it was just being called and bailing out when it saw there was already an id.) before_create only gets called for newly created model objects.

I thought about making it even faster by calling CoCreateGuid() on Windows and calling a UNIX C function to create a GUID when on UNIX, but it’s so fast now that it hardly seemed worth the extra effort and the extra platform-specific code.

So that’s it. Enjoy it!