Adding XHTML output validation to Cucumber stories

- 2009-06-16 10:19:11

At the 2009 Barcamp Leeds I attended a talk by Neil Crosby about automated testing, and about how he felt there was a gap in what people were testing. Everyone has unit tests, and people are doing full stack testing too, but no-one (so he feels) does XHTML/CSS/JS validation as part of their automated test suite. And certainly from what I've seen on the mainstream Ruby sites about testing, I agree with him.

So after his talk I had a quick look at his frontend test suite, and started wondering where exactly I would fit frontend validation testing into my workflow. Would it be part of my unit tests (RSpec), or part of the full stack tests (Cucumber)? As you've probably guessed by the title of this post, it's ended up going into my Cucumber tests. Since the initial play it's been something I've mused about occasionally, but not something I've actively looked into implementing as part of my test workflow.

Fast-forward a few weeks from Barcamp Leeds and I see a news article in my feed reader entitled "Easy Markup Validation" which gets me hopeful someone's solved this frontend validation thing easily for Rubyists. A quick read through and I'm sold on it and installing the gem. Opened an existing project I'm working on which has a fairly extensive test suite (both unit tests & full stack tests) and tried to slot the validation into my controller unit tests.

The problem with doing this is that by default RSpec-rails doesn't generate the views in your controller specs. At that point I realised I was already generating the full page when I was doing a full stack test using culerity and cucumber. So why not just add a cucumber step in my stories to validate the HTML on each page I visit? Mainly because it's not enough of a failure for this app to have invalid XHTML markup. Having valid markup would be nice, but I'd rather have it as a separate test to my stories in some way.

Currently I just do that by only validating if ENV["VALIDATION"] is set to anything, so a normal run of my cucumber stories will just test the app does what it's supposed to do. If I run them with VALIDATION=true then it will check my markup is valid as well.

features/support/env.rb

require "markup_validity" if ENV["VALIDATION"]

features/step_definitions/general_steps.rb

Then %r/the page is valid XHTML/ do
  $browser.html.should be_xhtml_strict if ENV["VALIDATION"]
end

features/logging_in.feature

Feature: Logging in
  In order to do stuff
  As a registered user
  I want to login

  Scenario: Successful Login
    Given there is a user called "Caius"

    When I goto the homepage
    Then the page is valid XHTML

    When I click on the "Login" link
    Then I am redirected to the login page
    And the page is valid XHTML

    When I enter my login details
    And I click "Login"
    Then I am redirected to my dashboard
    And the page is valid XHTML

Now when I run cucumber features/logging_in.feature, it doesn't validate the HTML, it just makes sure that I can login as my user and that I am redirected to the right places. But if I run VALIDATION=true cucumber features/logging_in.feature, then it does validate my XHTML on the homepage, the login page and on the user's dashboard. If it fails validation then it gives you a fairly helpful error message as to what it was expecting and what it found instead.
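One thing worth spelling out about "set to anything": the check only looks at whether the variable exists, not at its value. Here's a minimal sketch of that behaviour in plain Ruby. The validation_enabled? helper is a made-up name for illustration, it isn't in my step definitions, and it takes the environment as an argument purely so it can be shown without touching the real ENV.

```ruby
# Hypothetical helper mirroring the `if ENV["VALIDATION"]` check.
# `env` is a parameter only so the behaviour can be demonstrated
# with plain hashes instead of the real environment.
def validation_enabled?(env = ENV)
  env["VALIDATION"] ? true : false
end

p validation_enabled?({})                           # unset: validation is skipped
p validation_enabled?({ "VALIDATION" => "true" })   # set: validation runs
p validation_enabled?({ "VALIDATION" => "false" })  # still runs, any string is truthy
p validation_enabled?({ "VALIDATION" => "" })       # still runs, "" is truthy in Ruby
```

So VALIDATION=false would still switch the validation on. To turn it off, leave the variable unset.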

From a quick run against a couple of stories in my app I discovered that I've not been wrapping form elements in an enclosing element, so they've been quickly fixed and now they validate. Now I realise this gem is only testing XHTML output, and doesn't include CSS or JS validation, but from a quick peek at the gem's source it should be fairly easy to add both of those in I think, although again they aren't major errors for me yet in this app.

Quantum Javascript Bug

- 2009-06-04 15:12:24

So I've got some js I've written to update a couple of <select> lists in a form, and it was all working fine for me (under Safari.) John happened to mention it wasn't working for him under Firefox, so I fired up Firefox and took a look. Could reproduce it perfectly, changing the first popup was populating the second one, but then wasn't selecting the right value from the list.

Having no idea what was happening I figured I'd enable firebug and watch it execute to figure out what was going on. Enabled firebug, reloaded the page, selected from the first popup… and voila! It updated the second one and selected the correct row! WTF!!!

Turned firebug off and it didn't work, turned it back on and it worked. Figured it might be something buggy in the Firefox 3.0.5 js runtime, so I grabbed a copy of the new beta 3.5 and tried it in there—still failed to update the page as it should.

Then we started poking around the javascript code. The function that was seemingly failing to run was being triggered by a setTimeout() call set to 1 second. We figured it might be the timing causing it, so started playing around with the time, tried anything from ½ a second up to 4 seconds, but still no joy in firefox with firebug turned off.

Then John went looking for the javascript errors in firefox (with firebug off) and discovered that it was throwing an error because window.console didn't exist. All of a sudden it made perfect sense! Safari has window.console.log() for writing to the console log, as does firebug. But of course firefox without firebug doesn't!

So the function was just exiting on that error. It was very weird initially to have it work perfectly as soon as the developer tools were enabled!

Automatically Deploying Website From Remote Git Repository

- 2009-05-30 02:30:40

Before I start, I'll just quickly run through where I put stuff on my server. Apache logs and config are in the Ubuntu default folders: /var/log/apache2 and /etc/apache2/ respectively.

Websites:  /home/caius/vhosts/<domain name>/htdocs
Git Repos: /home/caius/git/<domain name>.git

So I have a git repo locally, ~/projects/somesite.com/, and want to deploy it to my webserver. I'll keep the git repo in ~/git/ and set it up so that when I push to the repo (over ssh) it will automatically check out the new changes into the website's htdocs folder.

I'm assuming DNS is already set up (or I've used ghost to map it locally), and that I've set up the virtualhost in apache pointing at /home/caius/vhosts/somesite.com/htdocs and reloaded apache so the config is in place.

Remote Machine

We create a bare git repo, then point the working tree at the docroot of our website. This means all the git stuff is kept in the somesite.git folder, but the files themselves are checked out to the website's folder. Then we set up a post-receive hook to update the worktree folder after new changes have been pushed to the repo.

$ cd git
$ mkdir somesite.git
$ cd somesite.git/
$ git init --bare
Initialized empty Git repository in /home/caius/git/somesite.git/
$ git --bare update-server-info
$ git config core.worktree /home/caius/vhosts/somesite.com/htdocs
$ git config core.bare false
$ git config receive.denycurrentbranch ignore
$ cat > hooks/post-receive
#!/bin/sh
git checkout -f
^D
$ chmod +x hooks/post-receive

Local Machine

And now on the client machine we add the remote repo as a git remote, and then push to it.

$ git remote add web ssh://myserver/home/caius/git/somesite.git
$ git push web +master:refs/heads/master
Counting objects: 3, done.
Writing objects: 100% (3/3), 229 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://myserver/home/caius/git/somesite.git
 * [new branch]      master -> master

All Done

And now if you go to somesite.com you'll see the contents of your git repo there. (somesite.com is just an example url though, I don't actually own it!)


Find shell commands with which

- 2009-04-19 15:02:04

So I have this command in my $PATH, apachectl. Because I'm on a Mac and I've installed apache2 through MacPorts, the command that gets found first is my MacPorts install in /opt. Up until now I've always known that which apachectl will find that location, but to find any other locations of apachectl I'd usually use locate and egrep together.

Here's my original workflow. Let's find the location of the apachectl being called when I don't specify a path.

Julius:~ caius$ which apachectl
/opt/local/apache2/bin/apachectl

Simple enough. Now let's figure out what other locations there's an apachectl installed at.

Julius:~ caius$ locate apachectl | egrep "\/apachectl$"
/opt/local/apache2/bin/apachectl
/opt/local/var/macports/software/apache2/2.2.11_0+darwin_9/opt/local/apache2/bin/apachectl
/usr/sbin/apachectl

Right, so now I know where else a command exists in the filesystem called apachectl, but I don't know if any of those is in my $PATH, or what order they come in when searching through my $PATH. In this (old) workflow I'd have compared them to my $PATH manually as there's so few of them.

So I noticed Ali googling for the which man page on IRC, and (quite stupidly) poked fun at him for doing so. I then swallowed my ego and actually followed the link to the man page, and boy was I glad I did. Just shows with even a fairly simple command like which, you sure don't know everything!

What I discovered was that which has a single flag you can pass it, -a. From the man page:

-a     print all matching pathnames of each argument

Right. So that locate | grep command plus manually figuring out what is in my $PATH is really hard work then. which -a should give us the same results, but a lot faster and with a lot less manual thought.

Julius:~ caius$ which -a apachectl
/opt/local/apache2/bin/apachectl
/usr/sbin/apachectl

And hey presto, yet another useful bit of bash knowledge for me, thanks to Ali not being afraid to RTFM!
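For the curious, here's roughly what which -a is doing for me, sketched in Ruby. This is my own approximation of the idea (walk each directory in $PATH in order and report every executable with the right name), not how which is actually implemented. which_all is a made-up helper name, and for simplicity it ignores empty $PATH entries.

```ruby
# Hypothetical helper, not part of any library: list every executable
# called `command` found in the directories of `path`, in search order.
def which_all(command, path = ENV["PATH"])
  path.to_s.split(File::PATH_SEPARATOR)
      .reject(&:empty?)                        # simplification: skip empty entries
      .map { |dir| File.join(dir, command) }   # candidate location in each directory
      .select { |file| File.file?(file) && File.executable?(file) }
end

puts which_all("apachectl")
```

The first line printed is the one that runs when I don't specify a path, which is what plain which reports.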

Validating Data with Regular Expressions in Ruby

- 2009-04-11 12:41:48

I happened to be sent a link to the OWASP paper on Rails Security recently and started reading it. Partway in there's a section on Regular Expressions, which opens with the following line:

A common pitfall in Ruby's regular expressions is to match the string's beginning and end by ^ and $, instead of \A and \z.

Now I've never used \A and \z in my regular expressions to validate data, I've only ever used ^ and $ assuming they matched the start and end of the string. This becomes an issue with validating data in rails, because %0A (\n URL encoded) is decoded by rails before passing the string to your model to validate.
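To see what that decoding means in practice, here's a small sketch using Ruby's standard library. Rails has its own parameter handling, so treat URI.decode_www_form_component as a stand-in that shows the effect rather than the code path Rails actually takes. The submitted value is made up for the example.

```ruby
require "uri"

# Illustration only: a stand-in for the decoding Rails performs on params.
submitted = "caius%0Afoo"
decoded   = URI.decode_www_form_component(submitted)

p decoded        # the %0A has become a real newline character
p decoded.lines  # which means the value now spans two lines
```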

Testing our expectations

Let's say we want to validate the string as a username for our app. A username is 5 characters long and consists only of lowercase letters.

regex = /^[a-z]{5}$/

First we make sure it matches the data we want it to:

"caius".validate(regex) # => true

Excellent, that validated. Now we'll try a shorter string, which we expect to fail.

"cai".validate(regex) # => false

Once more, it behaves how we expected it to. The shorter string was rejected as we wanted it to be. Now, what happens if we test a string with a newline character in it? We'll make sure the data before the \n is valid, and then add some more data after the newline.

"caius\nfoo".validate(regex) # => true

Uh oh! That validated and would've been saved as a username?!

Let's have a look at exactly what's happening there. The $ matches at the end of a line, just before the \n character, so the regex is only matching the first 5 characters of the string and ignores anything after the \n. As it turns out, this is exactly what we've asked the regex to match, but we didn't want this behaviour.
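If you want to see that for yourself, here's a quick sketch that asks Ruby what the pattern actually matched and what it left alone. The last line uses a string I've made up to show that the valid line doesn't even need to come first.

```ruby
regex = /^[a-z]{5}$/
match = "caius\nfoo".match(regex)

p match[0]          # the only text the pattern matched
p match.post_match  # everything after the match, which was never checked

# ^ and $ work per line, so one valid line anywhere in the string is enough.
p "<script>\ncaius\n</script>" =~ regex
```

The final call returns a position rather than nil, so the string counts as a match even though its first and last lines are nothing like a username.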

Now you might be thinking, "So what? Someone can have a username with a newline in it." For starters this will probably display weirdly anywhere you use their username, but more importantly it opens your application to an injection attack. Suppose they took advantage of this by setting their username to include some javascript on the page which stole your login cookie and sent it to them. You view their account in the admin section and oh no! They can log in as your admin account and do what they want.

A simple example of this is just having it output an alert dialog. (This is actually the code I'll use to test an application as it's not malicious, but blindingly obvious if the javascript is executed or not.)

"caius\n<script>alert('hello')</script>".validate(regex) # => true

Ok, so that was the result we were expecting this time, although it's still not the outcome we wanted. Anytime their username is viewed (providing you aren't escaping the data to HTML entities) you'll see the following:

[Image: javascript alert dialog]
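As for "escaping the data to HTML entities", here's a minimal sketch of what that looks like using ERB::Util.html_escape from Ruby's standard library. It's a second line of defence rather than a fix for the validation, and the username variable is just the attack string from above.

```ruby
require "erb"

username = "caius\n<script>alert('hello')</script>"
escaped  = ERB::Util.html_escape(username)

puts escaped  # the angle brackets come out as &lt; and &gt;
```

Output like that is displayed by the browser as text instead of being run as a script.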

The Solution

Having realised from our testing above that ^$ matches the beginning/end of a line in ruby, not the beginning and end of a string, I hear you cry, "How do we make sure we're matching the entire string?!"

The answer is pretty simple. Just swap out ^$ for \A\z. Let's go ahead and try this with the same data as we have above, but with the modified regular expression.

new_regex = /\A[a-z]{5}\z/
"caius".validate(new_regex) # => true

That's a good start, the valid string still matches.

"cai".validate(new_regex) # => false

Looks like it's going well, invalid string is invalid.

"caius\nfoo".validate(new_regex) # => false

Oh Excellent! It's validating this one correctly now.

And just for consistency, let's test it with a more likely attack string.

"caius\n<script>alert('hello')</script>".validate(new_regex) # => false

Fantastic! We've fixed the security hole in our validation of the user's username.


If you want to actually run the code above you'll need the following at the start of the ruby script to patch the validate method into String.

# Helper for this post only: true if the regex matches anywhere in the string.
class String
  def validate(regex)
    !self[regex].nil?
  end
end

Update: I had \Z in the new_regex rather than the \z it should've been. Thanks Ciarán.
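In case you're wondering why the case of that z matters: \Z will still match if the string ends with a single newline, whereas \z only matches at the very end of the string. A quick sketch of the difference (the variable names are just for this example):

```ruby
with_big_z   = /\A[a-z]{5}\Z/
with_small_z = /\A[a-z]{5}\z/

p "caius\n" =~ with_big_z      # matches: \Z tolerates one trailing newline
p "caius\n" =~ with_small_z    # nil: \z insists on the real end of the string
p "caius\nfoo" =~ with_big_z   # nil: \Z only forgives a newline at the very end
```

So \Z would have let a username with a trailing newline through, which is why \z is the one you want for validation.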