Caius Theory

Now with even more cowbell…

Find dependencies blocking rails upgrades

The initial pain point when upgrading a rails app is figuring out which of your dependencies are blocking you from upgrading the actual rails gem (and its immediate dependencies: actionpack, etc.). One way to start is to update the rails dependency in your Gemfile and run bundle update rails. Then check the error output (it never works first time…) to see which gems are blocking the upgrade. Rinse and repeat until it works.

I figured I'd cheat a little and eyeball the Gemfile.lock to see which gems had an explicit dependency pinning rails (or actionpack, activejob, etc) to a version lower than I want to upgrade to, so I could get an idea of what needs to be upgraded without having to do them all one-by-one.

Then instead of eyeballing Gemfile.lock, I wrote an awk script to pull out the interesting dependencies (i.e. anything that depends on rails gems), so I just have to check which versions they depend on by hand.

# Reads a Gemfile.lock and outputs all dependencies that depend on rails

BEGIN {
  parent = 0
  parent_printed = 0
  rails_gems = "^(rail(s|ties)|action(mailer|pack|view)|active(job|model|record|support))$"
}

# We only want the specs from the GEM section
NR == 1, $1 ~ /GEM/ { next }
$1 == "" { exit }

# Skip parent gems we don't care about (rails itself…)
$0 ~ /^ {4}[^ ]/ &&
$1 ~ rails_gems {
  parent = 0
  parent_printed = 0
  next
}

# Parent gems that aren't part of rails core
# Store the name to be printed if we match below
$0 ~ /^ {4}[^ ]/ {
  parent = $0
  parent_printed = 0
  next
}

# If the nested gem (6 space prefix) matches rails-names and we have a parent value
# set then we print them out - making sure to only print the parent once
$0 ~ /^ {6}[^ ]/ &&
$1 ~ rails_gems &&
parent != 0 {
  if (parent_printed == 0) {
    parent_printed = 1
    print parent
  }

  print $0
}

Run it against your Gemfile.lock for the app you're upgrading:

awk -f rails5.awk Gemfile.lock

And you'll get output like this, to run through and see if any of the dependencies are pinning to lower versions than you need.

    coffee-rails (4.0.1)
      railties (>= 4.0.0, < 5.0)
    factory_girl (4.4.0)
      activesupport (>= 3.0.0)
    factory_girl_rails (4.4.1)
      railties (>= 3.0.0)
    globalid (0.3.7)
      activesupport (>= 4.1.0)
    google-api-client (0.8.6)
      activesupport (>= 3.2)
    jquery-rails (3.1.4)
      railties (>= 3.0, < 5.0)
    jquery-ui-rails (5.0.5)
      railties (>= 3.2.16)
    rails-deprecated_sanitizer (1.0.3)
      activesupport (>= 4.2.0.alpha)
    rails-dom-testing (1.0.7)
      activesupport (>= 4.2.0.beta, < 5.0)
    rspec-rails (3.4.2)
      actionpack (>= 3.0, < 4.3)
      activesupport (>= 3.0, < 4.3)
      railties (>= 3.0, < 4.3)
    sass-rails (4.0.5)
      railties (>= 4.0.0, < 5.0)
    sprockets-rails (2.3.3)
      actionpack (>= 3.0)
      activesupport (>= 3.0)

In this case, I'm trying to take this app to rails 5.0, so all the ones specifying < 5 and < 4.3 need upgrading beforehand.
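If you'd rather not compare the version constraints by eye, rubygems can do the comparison for you. This is only a rough sketch: blocks_upgrade? is a helper name I've made up for illustration, and it assumes you paste in the constraint string exactly as it appears between the brackets in the output above.

```ruby
require "rubygems" # Gem::Requirement and Gem::Version ship with ruby

# Hypothetical helper (not part of the awk script): returns true when a
# constraint string from Gemfile.lock rules out the version we want.
def blocks_upgrade?(constraint, target)
  # ">= 4.0.0, < 5.0" becomes [">= 4.0.0", "< 5.0"]
  requirement = Gem::Requirement.new(*constraint.split(/,\s*/))
  !requirement.satisfied_by?(Gem::Version.new(target))
end

blocks_upgrade?(">= 4.0.0, < 5.0", "5.0.0") # => true  (coffee-rails)
blocks_upgrade?(">= 3.0, < 4.3", "5.0.0")   # => true  (rspec-rails)
blocks_upgrade?(">= 3.0.0", "5.0.0")        # => false (factory_girl)
```

Anything that comes back true is pinning a rails gem below the target version.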

#to_param and keyword slugs

Imagine you've got a blogging app and it's currently generating URL paths like posts/10 for individual posts. You decide the path should contain the post title (in some form) to make your URLs friendlier when someone reads them. I know I certainly prefer to read vs (That's a fun blog post if you're into (ab)using ruby occasionally!)

Now you know all about how to change the URL path that rails generates: just define to_param in your model. Something simple that generates a slug consisting of hyphens and lowercase alphanumeric characters will do. For example:

# 70-abusing-ruby-1-9-json-for-fun
def to_param
  "#{id}-#{title.gsub(/\W/, "-").squeeze("-")}".downcase
end

NB: You might want to go the route of storing the slug against the post record in the database and thus generating it before saving the record. In which case the rest of this post is sort of moot and you just need to search on that column. If not, then read on!

Now we're generating a nice human-readable URL we need to change the way we find the post in the controller's show action. Up until now it's been a simple @post = Post.find(params[:id]) to grab the record out of the database. Problem now is params[:id] is "70-abusing-ruby-1-9-json-for-fun", rather than just "70". A quick check in the String#to_i docs reveals it "Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36)." Basically it reads the digits at the very start of the string, ignores everything after them, and returns 0 if the string doesn't start with a number.

Knowing that, we can just lean on it to extract the id before using find to look for the post: @post = Post.find(params[:id].to_i). Fantastic! We've got nice human readable paths on our blog posts and they can be found in the database. All finished… or are we?
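A quick sanity check of that String#to_i behaviour, as you might try it in irb. The last string is one I've made up to show what happens when the number isn't at the start.

```ruby
slug = "70-abusing-ruby-1-9-json-for-fun"

slug.to_i              # => 70, the leading digits
"70".to_i              # => 70
"abusing-ruby-70".to_i # => 0, no leading digits to interpret
```

Note that last case: to_i only looks at the start of the string, so a slug with the id anywhere else comes back as 0.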

There's still a rather embarrassing bug in our code where we're not explicitly checking the slug in the URL against the slug of the Post we've extracted from the database. If we visited /posts/70-ruby-19-sucks-and-python-rules-4eva it would load the blog post and render it without batting an eyelid. This has caused quite a few embarrassing situations for some high profile media outlets who don't (or didn't) check their URLs and just output the content. Luckily there's a simple way for us to check this.

All we want to do is render the content if the id param matches the slug of the post exactly, and return a 404 page if it doesn't. We already know the id param (params[:id]) and have pulled the Post object out of the database and stored it in an instance variable (@post). The @post knows how to generate its own slug, using #to_param.

So we end up with something like the following in our posts controller, which does all the above and correctly returns a 404 if someone enters an invalid slug (even if it starts with a valid post id):

def show
  @post = Post.find(params[:id].to_i)
  render_404 && return unless params[:id] == @post.to_param
end

def render_404
  render :file => Rails.root + "public/404.html", :status => :not_found
end

And going to an invalid path like /posts/70-ruby-19-sucks-and-python-rules-4eva just renders the default rails 404 page with a 404 HTTP status. (If you want the id to appear at the end of the path, alter to_param accordingly and do something like params[:id].match(/\d+$/) to extract the Post's id to search on.)
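Here's a rough sketch of that id-at-the-end variant in plain ruby, so it runs without rails. The Post struct is only a stand-in for the real model, and extract_id and slug_matches? are names I've made up for illustration. I've anchored the regex with \z rather than $ so that only digits at the very end of the string count.

```ruby
# Stand-in for the ActiveRecord model (an assumption for this sketch)
Post = Struct.new(:id, :title) do
  # Same slug rules as before, but with the id moved to the end
  def to_param
    "#{title.gsub(/\W/, "-").squeeze("-")}-#{id}".downcase
  end
end

# Pull the trailing digits out of the param, or nil if there aren't any
def extract_id(param)
  match = param.match(/\d+\z/)
  match && match[0].to_i
end

# The tamper check: the param has to be exactly the slug we'd generate
def slug_matches?(param, post)
  param == post.to_param
end

post = Post.new(70, "Abusing Ruby 1.9 JSON for fun")

post.to_param                                    # => "abusing-ruby-1-9-json-for-fun-70"
extract_id(post.to_param)                        # => 70
extract_id("no-id-here")                         # => nil
slug_matches?("ruby-sucks-4eva-70", post)        # => false
slug_matches?(post.to_param, post)               # => true
```

In the controller you'd look the post up with the extracted id, then render the 404 when slug_matches? is false, just like the show action above.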

Hey presto, we've implemented human readable slugs that are tamper-proof (without storing them in the database.)

(And bonus points if in fact you spotted I used my blog as an example, but that it isn't a rails app. Nor does it contain the blog post ID in the pretty URL. It's actually powered by Habari at the time of posting!)

Adding XHTML output validation to Cucumber stories

At the 2009 Barcamp Leeds I attended a talk by Neil Crosby where he talked about automated testing, and about how he felt there was a gap in everything that people were testing. Everyone has unit tests, and people are doing full stack testing too, but no-one (so he feels) does XHTML/CSS/JS validation as part of their automated test suite. And certainly from what I've seen on the mainstream Ruby sites about testing, I agreed with him.

So after his talk I had a quick look at his frontend test suite, and started wondering where exactly I would fit frontend validation testing into my workflow. Would it be part of my unit tests (RSpec), or part of the full stack tests (Cucumber)? As you've probably guessed by the title of this post, it's ended up going into my cucumber tests. Since the initial play it's been something I've mused about occasionally, but not something I've actively looked into how to implement as part of my test workflow.

Fast-forward a few weeks from Barcamp Leeds and I see a news article in my feed reader entitled "Easy Markup Validation" which gets me hopeful someone's solved this frontend validation thing easily for Rubyists. A quick read through and I'm sold on it and installing the gem. Opened an existing project I'm working on which has a fairly extensive test suite (both unit tests & full stack tests) and tried to slot the validation into my controller unit tests.

Problem with doing this is by default RSpec-rails doesn't generate the views in your controller specs. At that point I realised I was already generating the full page when I was doing a full stack test using culerity and cucumber. So why not just add a cucumber step in my stories to validate the HTML on each page I visit? Mainly because it's not enough of a failure for this app to have invalid XHTML markup. Having valid markup would be nice, but I'd rather have it as a separate test to my stories in some way.

Currently I just do that by only validating if ENV["VALIDATION"] is set to anything, so a normal run of my cucumber stories will just test the app does what it's supposed to do. If I run them with VALIDATION=true then it will check my markup is valid as well.


require "markup_validity" if ENV["VALIDATION"]


Then %r/the page is valid XHTML/ do
  $browser.html.should be_xhtml_strict if ENV["VALIDATION"]
end


Feature: Logging in
  In order to do stuff
  As a registered user
  I want to login

  Scenario: Successful Login
    Given there is a user called "Caius"

    When I goto the homepage
    Then the page is valid XHTML

    When I click on the "Login" link
    Then I am redirected to the login page
    And the page is valid XHTML

    When I enter my login details
    And I click "Login"
    Then I am redirected to my dashboard
    And the page is valid XHTML

Now when I run cucumber features/logging_in.feature, it doesn't validate the HTML, it just makes sure that I can login as my user and that I am redirected to the right places. But if I run VALIDATION=true cucumber features/logging_in.feature, then it does validate my XHTML on the homepage, the login page and on the user's dashboard. If it fails validation then it gives you a fairly helpful error message as to what it was expecting and what it found instead.
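One thing worth knowing about a check like if ENV["VALIDATION"]: it only cares whether the variable is set, not what it's set to. Here's a small sketch of that behaviour. validation_enabled? is a helper name I've made up for illustration, and it takes a plain hash so you can try it without touching your real environment.

```ruby
# Hypothetical helper mirroring the `if ENV["VALIDATION"]` check.
# ENV[...] returns nil for an unset variable and a String otherwise,
# and every String is truthy in ruby.
def validation_enabled?(env = ENV)
  !env["VALIDATION"].nil?
end

validation_enabled?({})                        # => false
validation_enabled?("VALIDATION" => "true")    # => true
validation_enabled?("VALIDATION" => "false")   # => true, still set!
```

So VALIDATION=false would still switch the validation on. Leave the variable out entirely to skip it.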

From a quick run against a couple of stories in my app I discovered that I've not been wrapping form elements in an enclosing element, so they've been quickly fixed and now they validate. Now I realise this gem is only testing XHTML output, and doesn't include CSS or JS validation, but from a quick peek at the gem's source it should be fairly easy to add both of those in I think, although again they aren't major errors for me yet in this app.

Validating Data with Regular Expressions in Ruby

I happened to be sent a link to the OWASP paper on Rails Security recently and started reading it. Partway in there's a section on Regular Expressions, which opens with the following line:

A common pitfall in Ruby's regular expressions is to match the string's beginning and end by ^ and $, instead of \A and \z.

Now I've never used \A and \z in my regular expressions to validate data, I've only ever used ^ and $ assuming they matched the start and end of the string. This becomes an issue with validating data in rails, because %0A (\n URL encoded) is decoded by rails before passing the string to your model to validate.

Testing our expectations

Let's say we want to validate the string as a username for our app. A username is 5 characters long and consists only of lowercase letters.

regex = /^[a-z]{5}$/

First we make sure it matches the data we want it to:

"caius".validate(regex) # => true

Excellent, that validated. Now we'll try a shorter string, which we expect to fail.

"cai".validate(regex) # => false

Once more, it behaves how we expected it to. The shorter string was rejected as we wanted it to be. Now, what happens if we test a string with a newline character in it? We'll make sure the data before the \n is valid, and then add some more data after the newline.

"caius\nfoo".validate(regex) # => true

Uh oh! That validated and would've been saved as a username?!

Let's have a look at exactly what's happening there. The $ matches at the end of the line, just before the \n character, so the regex is only matching the first 5 characters of the string and just ignores anything after the \n. As it turns out, this is exactly what we've asked the regex to match, but we didn't want this behaviour.
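To see what the regex actually grabbed, you can ask the string for the matched portion. This is just an illustrative irb-style sketch; the second string is one I've made up to show that any line in the input can satisfy ^ and $, not only the first.

```ruby
regex = /^[a-z]{5}$/

"caius\nfoo"[regex]    # => "caius", everything after the newline is ignored
"foo\ncaius"[regex]    # => "caius", matched on the second line
"foo\ncaius" =~ regex  # => 4, the index where that second line starts
```

So the input only needs one valid line somewhere in it to pass.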

Now you might be thinking, "So what? Someone can have a username with a newline in it." For starters this will probably display weirdly anywhere you use their username, but more importantly it opens your application to an injection attack. Suppose they took advantage of this by setting their username to include some javascript on the page which stole your login cookie and sent it to them. You view their account in the admin section and oh no! They can log in as your admin account and do what they want.

A simple example of this is just having it output an alert dialog. (This is actually the code I'll use to test an application, as it's not malicious but makes it blindingly obvious whether the javascript is executed or not.)

"caius\n<script>alert('hello')</script>".validate(regex) # => true

Ok, so that was the result we were expecting this time, although it's still not the outcome we wanted. Anytime their username is viewed (providing you aren't escaping the data to HTML entities) you'll see the following:

(Image: javascript alert dialog)

The Solution

Having realised from our testing above that ^$ matches the beginning/end of a line in ruby not the beginning and end of a string, I hear you cry, "How do we make sure we're matching the entire string?!"

The answer is pretty simple. Just swap out ^$ for \A\z. Let's go ahead and try this with the same data as we have above, but with the modified regular expression.

new_regex = /\A[a-z]{5}\z/
"caius".validate(new_regex) # => true

That's a good start, the valid string still matches.

"cai".validate(new_regex) # => false

Looks like it's going well, invalid string is invalid.

"caius\nfoo".validate(new_regex) # => false

Oh Excellent! It's validating this one correctly now.

And just for consistency, let's test it with a more likely attack string.

"caius\n<script>alert('hello')</script>".validate(new_regex) # => false

Fantastic! We've fixed the security hole in our validation of the user's username.

If you want to actually run the code above you'll need the following at the start of the ruby script to patch the validate method into String.

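class String
  # Minimal body, assumed from the results above: true if the regex matches
  def validate regex
    !!(self =~ regex)
  end
end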

Update: I had \Z in the new_regex rather than the \z it should've been. Thanks Ciarán.
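For the curious, here's a small sketch of why that one-letter difference matters. \Z matches at the end of the string or just before a single trailing newline, whereas \z only matches at the very end. The variable names are just ones I've picked for this example.

```ruby
strict  = /\A[a-z]{5}\z/ # \z: absolute end of the string
lenient = /\A[a-z]{5}\Z/ # \Z: end of string, or before a final newline

"caius".match?(strict)       # => true
"caius".match?(lenient)      # => true
"caius\n".match?(strict)     # => false
"caius\n".match?(lenient)    # => true, the trailing newline slips through
"caius\nfoo".match?(lenient) # => false
```

So \Z would still have let a username with a trailing newline through validation.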

Install Mysql Gem on Leopard

So, I keep having to reinstall mysql5 and rubygems from time to time for various reasons. I always install mysql5 through MacPorts as a dependency for the php5 port (along with various other bits for the LA*P stack).

sudo port install php5 +mysql5 +pear +readline +sockets +apache2 +sqlite

Once this is installed then I have mysql and can setup my databases, etc.

Ignoring the rest of the LAMP stack, I then need to connect Ruby to the Mysql I just installed through MacPorts. It's quite simple to do, once you know the right argument to pass to it. The easiest way is to just tell it where the mysql_config5 script is and let it figure out the rest for itself.

sudo gem install mysql -- --with-mysql-config=/opt/local/bin/mysql_config5

Hopefully this will save me 10 minutes of googling next time I need to do this!

Update 2009-01-21

I'm an idiot and typed the gem install command by hand, and ended up with --with-mysql-conf instead of --with-mysql-config. Updated now.

Update 2009-10-19

On Snow Leopard I needed to tell rubygems to install the gem as a 64-bit binary. Hat tip to Philipp.

sudo env ARCHFLAGS="-arch x86_64" gem install mysql -- \

Setting up git with rails apps

When I create a new rails app, I'm constantly going back to another project and stealing the .gitignore file from it, to make sure that git doesn't know about files that rails either updates frequently or stores machine-specific data in. The latter is generally just config/database.yml. I develop alongside my colleagues at Brightbox and we deploy via capistrano, so we always put the database.yml file in the shared directory on the server and each keep our own version locally with our local credentials in it. Thus we don't want it to be tracked by git.

Here's what I've collated from various sources over the few weeks I've been using git + rails everyday.



# OS X only

Then to make sure log/ and tmp/ are tracked, convention is to add a blank .gitkeep file in them.

touch log/.gitkeep
touch tmp/.gitkeep

Create a blank rails app including plugins

When I create a rails app from scratch I like to include certain plugins to help me write the app, such as the Rspec testing framework instead of the built-in Test::Unit and jQuery instead of prototype.

And here are the commands in the order I run them to create the blank app.

# Create the rails app
cd ~/Sites/apps/
rails myapp
cd myapp

# Setup a git repo
git init
# Add all files and make the initial import
git add .
git commit -m "Initial Import"

# Add the plugins as git submodules
git submodule add git:// vendor/plugins/rspec
git submodule add git:// vendor/plugins/rspec-rails
git submodule add git:// vendor/plugins/cucumber
git submodule add git:// vendor/plugins/webrat
git submodule add git:// vendor/plugins/demeters_revenge

# Commit the changes (ci is a git alias for commit, it isn't built in)
git ci -am "Adding all needed submodules"

# Replace TestUnit with rspec
git rm -r test/
ruby script/generate rspec
# Replace stories with cucumber features
rm -rf stories/
ruby script/generate cucumber

# Add the changes to git
git add .
git ci -m "Committing initial rspec/cucumber files"

# Install jRails, we have to install it using script/plugin
# Remove existing javascript files
git rm public/javascripts/*
mkdir public/javascripts
# Add jrails
ruby script/plugin install
git add vendor/plugins/jrails/ public/javascripts
git ci -m "Adding jRails to replace Prototype"

And now you have a blank app waiting for you to write using features for full stack testing, and rspec for testing model and controller code.

Update 2008-11-04

Added demeters revenge and jRails plugins.

Update 2008-11-05

I've also blogged the .gitignore file I use with rails apps. I usually add it to my apps before running git init.