Caius Theory

Now with even more cowbell…

Compiling SmartOS for AMD processors

There are a few community-provided patches for SmartOS that enable KVM on AMD processors amongst other things, and given the HP Microserver has an AMD processor, that's quite useful for turning it into a better lab server. The main list of so-called "eait" builds was hiccuping when I tried to download the latest, and all I could find was a 20140812T062241Z image here.

The source code for the eait builds is maintained at, and you can see the patches applied on top of the normal SmartOS master by going to

So here's how to use SmartOS to compile a more up to date AMD-friendly SmartOS!

  1. Grab the latest multiarch SmartOS image (which has to be used, or the compile will fail.) The latest at the time of writing was 4aec529c-55f9-11e3-868e-a37707fcbe86, so that's what I'll use.

     imgadm import 4aec529c-55f9-11e3-868e-a37707fcbe86
  2. Spin up a zone for us to build in (the Building SmartOS on SmartOS page has extra info about this):

     echo '{
       "alias": "platform-builder",
       "brand": "joyent",
       "dataset_uuid": "4aec529c-55f9-11e3-868e-a37707fcbe86",
       "max_physical_memory": 32768,
       "quota": 0,
       "tmpfs": 8192,
       "fs_allowed": "ufs,pcfs,tmpfs",
       "maintain_resolvers": true,
       "resolvers": [
       "nics": [
           "nic_tag": "admin",
           "ip": "dhcp",
           "primary": true
       "internal_metadata": {
         "root_pw": "password",
         "admin_pw": "password"
     }' | vmadm create
  3. Log in to the created zone:

     zlogin <uuid from `vmadm create` output>
  4. Update the image to the latest packages, etc:

     pkgin -y update && pkgin -y full-upgrade
  5. Install a few packages we'll need to compile & package SmartOS:

     pkgin install scmgit cdrtools pbzip2
  6. Grab the source code of the fork containing the patches we want, from arekinath/smartos-live

     git clone
     cd smartos-live
  7. Optional: Edit src/Makefile.defs and change PARALLEL = -j$(MAX_JOBS) to PARALLEL = -j8 to do less at once. (Microserver only has a dual core CPU!)

  8. Copy the configure definition into the right place and start configuration:

     cp {sample.,}configure.smartos
     ./configure

    (You'll probably get asked to accept the java license during configuration, so keep half an eye on it)

  9. Once configure has completed (which doesn't take too long, 15 minutes or so), start building:

     gmake world && gmake live
  10. Once the build is successfully finished, time to package an iso & usb image:

    export LC_ALL=C

Hey presto, you've a freshly built AMD-friendly SmartOS build to flash to a USB key / put on your netboot server and boot your Microserver from!



Here be hax. Don't ever do these. ;-)

Reduce local variables with instance_eval

Sometimes (usually in a one-liner) I want to do some work with a value without assigning it to a variable. Chucking an #instance_eval call in there will set self to the value, which saves having to assign it to a local value. Pretty much only used by me in one-off scripts or cli commands.


require "date"

start_date, end_date = ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
puts "#{start_date} to #{end_date} is #{(end_date - start_date).to_i} days"


puts ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
  .instance_eval { "#{first} to #{last} is #{(last - first).to_i} days" }

See, way less code! cough, cough

Bonus usage: Misdirection

I also dropped some instance_eval on our campfire bot at EmberAds to always blame one person, but without the code reading as such.

%w{Dom Mel Caius CBetta Baz}.sample.instance_eval do

That does not return one of the array elements as you might think it does from quickly scanning the code…
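A minimal sketch of why that works, using made-up names rather than the real list: instance_eval returns the value of its block, not the receiver, so whatever sample picked is discarded. The blame method and the names here are hypothetical.

```ruby
# Hypothetical helper: looks like it picks a random name from the list.
def blame(names)
  # instance_eval returns the block's value, so the sampled name is
  # thrown away and the hard-coded string always wins.
  names.sample.instance_eval do
    "Carol"
  end
end

puts blame(%w[Alice Bob Carol]) # always prints Carol
```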

Set method-local variables in default arguments

You have a method and it takes one argument, which has a default value of nil specified. You then run into the situation where you need to know if nil was passed to the method, or if you're getting the default value of nil. You could change the default value to something you choose to be the "default value" and unlikely to be passed from elsewhere as the argument's value, and reset the parameter to nil after checking it, like this:

def output name=:default_value
  if name == :default_value
    name = "caius"
    default = true
  end

  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

That's quite a lot of code added to the method just to find out if we passed a default value or not. And if we forget to reset the value when it's :default_value then we end up leaking that into whatever the method does with that value. We also have the problem that one day the program could possibly send that "default value" we've chosen as the actual parameter, and we'd blindly change it thinking it was set as the default value, not the passed argument.

Instead we could (ab)use the power of ruby, and have ruby decide to set default = true for us when, and only when, the variable is set to the default value.

def output name=((default=true); "caius")
  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

As you can see, the output is identical. Yet we have no extra code inside the method to figure out if we were given the default value or not. And as a bonus to that, we no longer have to check for a specific value being passed and presume that is actually the default, and not one passed by the program elsewhere.

I posted this one in a gist a while back (to show Avdi it looks like), and people came up with some more insane things to do with it, including returning early, raising errors or even redefining the current method, all from the argument list! I'd suggest going to read them, it's a mixture of OMG HAHA and OMFG NO WAY WHYY?!?!.
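As a rough illustration of the "raising errors" variant (this is a sketch, not code taken from the gist): a default value is just an expression that is only evaluated when the argument is omitted, so it can raise. The greet method is a hypothetical name.

```ruby
# The default expression only runs when no argument is passed,
# so omitting the argument raises with a custom message.
def greet(name = raise(ArgumentError, "name is required"))
  "hello, #{name}"
end

puts greet("caius") # prints: hello, caius

begin
  greet
rescue ArgumentError => e
  puts e.message # prints: name is required
end
```

The same warning applies: fun to know about, not something to ship.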

Don't do this.

Don't do the above. No really, don't do them. Unless you're writing a one-off thing. But seriously, don't do them. :-D

Some Small Refactorings in Ruby

Here's a few things I refactor as I write code down initially. Not entirely convinced it's strictly refactoring, but it's how I amend from one pattern I see in a line or three of code into a different structure that I feel achieves the same result with cleaner or more concise code.

Multiple equality comparisons

Testing the equality of an object against another is fairly simple, just do foo == "bar". However, I usually try to test against multiple objects in a slightly different way. Your first thought might be that the easiest way is just to chain a series of == with the OR (||) operator.

foo == "bar" || foo == "baz" || foo == :sed || foo == 5

I much prefer to flip it around, think of the objects I'm testing against as a collection (Array), and then ask them if they contain the object I'm checking. And for that, I use Array#include?

["bar", "baz", :sed, 5].include?(foo)

(And if you're only testing against strings, you could use %w(bar baz) as a shortcut to create the array. Here's more ruby shortcuts.)
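One thing worth knowing is that Array#include? compares each element with ==, so it behaves the same as the chained version. A small sketch, with known_value? as a hypothetical method name:

```ruby
# Wraps the collection check in a method so it can be reused.
def known_value?(foo)
  ["bar", "baz", :sed, 5].include?(foo)
end

known_value?("baz") # => true
known_value?(5.0)   # => true, because 5 == 5.0
known_value?("sed") # => false, the String "sed" is not equal to the Symbol :sed
```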

Assigning multiple items from a nested hash to variables

Occasionally I find myself needing to be given a hash of a hash of data (most recently, an omniauth auth hash) and assign some values from it to separate variables within my code. Given the following hash, containing a nested hash:

details = {
  uid: "12345",
  info: {
    name: "Caius Durling",
    nickname: "caius",
  }
}

Let's say we want to extract the name and nickname fields from the details[:info] hash into their own local variables (or instance variables within a class, more likely.) We should probably handle the case of details[:info] not being a hash, and try not to read from it if that's the case - so we might end up with something like the following:

name = details[:info] && details[:info][:name]
nickname = details[:info] && details[:info][:nickname]

name # => "Caius Durling"
nickname # => "caius"

And then in the spirit of DRYing up our code, we see there's duplication in both lines in checking details[:info] exists (not actually that it's a hash, but hey ho, we rely on upstream to send us nil or a hash.) So we reduce it down using an if statement and give ourselves slightly less to type at the same time.

if (( info = details[:info] ))
  name = info[:name]
  nickname = info[:nickname]
end

name # => "Caius Durling"
nickname # => "caius"
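Another way to get the same result, sketched here as an alternative rather than what the post settles on, is Hash#values_at with multiple assignment. It still relies on upstream sending nil or a hash; extract_names is a hypothetical helper name.

```ruby
# Falls back to an empty hash when :info is missing or nil,
# so both values come back as nil instead of raising.
def extract_names(details)
  info = details[:info] || {}
  info.values_at(:name, :nickname)
end

details = { uid: "12345", info: { name: "Caius Durling", nickname: "caius" } }
name, nickname = extract_names(details)

name     # => "Caius Durling"
nickname # => "caius"
```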

Returning two values conditionally

Sometimes a method will end with a ternary, where depending on a condition it'll either return one or another value. If this conditional returns true, then the first value is returned. Otherwise it returns the second value. You could quite easily write it out as an if/else longer-form block too.

def my_method
  @blah == foo ? :foo_matches : :no_match
end

My brain finds picking the logic in this apart slightly harder mentally, than if I drop a return early bomb on the method. Then it reads more akin to how I'd think through the logic. Return the first value if this conditional returns true. Otherwise the method returns this second value. I think the second value being on a completely separate line helps me make this mental distinction quicker too.

So I'd write it this way:

def my_method
  return :foo_matches if @blah == foo
  :no_match
end

Returning nil or a value conditionally

Following on from the last snippet, but taking advantage of the ruby runtime a bit more, is when you're wanting to return a value if a conditional is true, or nil otherwise. The easy way is to just write nil in the ternary:

def my_method
  @foo == :bar ? :foo_matches : nil
end

However, we know ruby returns the result of the last expression in the method. And that if a single line conditional isn't met, it returns nil from the expression. Combining that, we can rewrite the previous example into this:

def my_method
  :foo_matches if @foo == :bar
end

And it will still return nil in the case that @foo doesn't match :bar.

Returning a boolean

Sometimes you have a method that returns the result of a conditional, but it's written to return true/false in a conditional instead.

def my_method
  @foo == :bar ? true : false
end

The really easy refactor here is to just remove the ternary and leave the conditional.

def my_method
  @foo == :bar
end

And of course if you were returning false when the conditional evaluates to true, you can either negate the comparison (use != in that example), or negate the entire conditional result by prepending ! to the line.
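The two ways of negating can be sketched like so. The method names are made up for illustration, and they take the value as an argument instead of reading @foo so the snippet runs on its own.

```ruby
# Negate the comparison itself...
def not_bar?(foo)
  foo != :bar
end

# ...or negate the result of the whole conditional.
def also_not_bar?(foo)
  !(foo == :bar)
end

not_bar?(:bar)      # => false
also_not_bar?(:baz) # => true
```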

Why I love DATA

In a ruby script, there's a keyword __END__ that for a long time I thought just marked anything after it as a comment. So I used to use it to store snippets and notes about the script that weren't really needed inline. Then one day I stumbled across the DATA constant, and wondered what flaming magic it was.

DATA is in fact an IO object, that you can read from (or anything else you'd do with an IO object). It contains all the content after __END__ in that ruby script file*. (It only exists when the file contains __END__, and for the first file ruby invokes though. See footnote for more details.)

How can we use this, and why indeed do I love this fickle constant? I mostly use it for quick scripts where I need to process text data, rather than piping to STDIN.

Given a list of URLs that I want to open in my web browser and look at, I could do the following for instance:

DATA.each_line do |url|
  `open "#{url.chomp}"`
end


which upon running (on a mac) would open all the URLs listed in DATA in my default web browser. (For bonus points, use Launchy for cross-platform compatibility.) Really handy & quick/simple when you've got 500+ URLs to open at once to go through. (I once had a job that required me to do this daily. Fun.)

Or given a bunch of CSV data that you just want one column for, you could reach for cut or awk in the terminal, but ruby has a really good CSV library which I trust and know how to use already. Why not just use that & DATA to pull out the field you want?

require "csv"

CSV.parse(DATA, headers: true).each do |row|
  puts row["kName"]

1,Google UK,
2,"Yahoo, UK",
# >> Google UK
# >> Yahoo, UK

I find when the data I want to munge is already in my clipboard, and I can run ruby scripts directly from text editors without having to save a file, it saves having to write the data out to a file, have ruby read it back in, etc just to do something with the data. I can just write the script reading from DATA, paste the data in and run it. Which also lets me run it iteratively and build up a slightly more complex script that I don't want to keep. Then do what I need with the output and close the file without saving it.
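The CSV half of that can be tried without DATA at all. This is a sketch under assumptions: a heredoc stands in for the text that would follow __END__, and the kId header is invented (only kName appears in the example above).

```ruby
require "csv"

# Stand-in for what DATA would yield. kId is a made-up column name.
SAMPLE_CSV = <<~EOS
  kId,kName
  1,Google UK
  2,"Yahoo, UK"
EOS

# headers: true treats the first row as column names, so each row
# can be indexed by header instead of by position.
NAMES = CSV.parse(SAMPLE_CSV, headers: true).map { |row| row["kName"] }

puts NAMES
```

The quoted comma in the second row is the sort of thing that makes a real CSV parser nicer than cut or awk here.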

* technically DATA is an IO handler to read __FILE__, which has been wound forward to the start of the first line after __END__ in the file. And it only exists for the first ruby file to be invoked by the interpreter.

cat > tmp/data.rb <<RUBY

ruby tmp/data.rb
# => "data.rb\n"

cat > tmp/data-require.rb <<RUBY
require "./tmp/data"

ruby tmp/data-require.rb
# => /Users/caius/tmp/data.rb:1:in `<top (required)>': uninitialized constant DATA (NameError)

And because it's a file handle pointing at the current file, you can rewind it and read the entire ruby script into itself…

$ ruby tmp/readself.rb 

something goes here

Geolocation in nginx

Sometimes you need to have a rough idea of where your website visitor is located. There are many ways to geolocate them, but if you just want to go to country level then MaxMind have free geo databases available to help you. When we needed to do this quickly on-the-fly at EmberAds, we came up with the trifle gem, which supports IPv4 and IPv6 lookups.

Recently I was searching for something else to do with nginx and ran across a mailing list thread about using the MaxMind database with nginx's HTTP Geo module and doing the lookup directly in nginx itself. Finally got a chance to sit down and work out the logistics of doing this. I've done this on an Ubuntu 12.04 box, with the expected config file layouts that come with Ubuntu.

Run the following on your server (as someone with write access to nginx config files):

# Generate the text file for nginx to import
perl <(curl -s \
< <(zip=$(tempfile) && \
curl -so $zip \
&& unzip -p $zip) > /etc/nginx/nginx_ip_country.txt

# Tell nginx to work out the IP country and store in variable
echo 'geo $IP_COUNTRY {
  default --;
  include /etc/nginx/nginx_ip_country.txt;
}' > /etc/nginx/conf.d/ip_country.conf

Now go find the http block for the vhost you want to have the header passed to, and assuming it's passenger, add the following:

# http {
  # server_name;
  passenger_set_cgi_param HTTP_X_IP_COUNTRY $IP_COUNTRY;
# }

(If you don't use passenger, look at the docs for proxy_set_header or fastcgi_param to see which you'll require for your setup.)

Reload nginx, and behold, request.env["HTTP_X_IP_COUNTRY"] (assuming a rack app running under ruby) will be a two letter country code, or "--".
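On the app side, reading the value might look something like this. It's a minimal sketch of a Rack-style app (any object that responds to call(env)), with APP as a hypothetical name and "--" as the fallback to mirror the nginx default.

```ruby
# Minimal Rack-style app: takes the env hash, returns [status, headers, body].
APP = lambda do |env|
  # Fall back to "--" if nginx didn't pass the header through.
  country = env.fetch("HTTP_X_IP_COUNTRY", "--")
  [200, { "content-type" => "text/plain" }, ["country: #{country}\n"]]
end

# Simulate a request coming through nginx with the header set.
_status, _headers, body = APP.call({ "HTTP_X_IP_COUNTRY" => "GB" })
puts body.first
```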

Unfortunately this is IPv4 only currently. There's a thread on the nginx mailing list from November 2012 saying IPv6 support should be coming on the v1.3 branch of nginx, but with no known ETA. So currently for IPv6 support, take a look at EmberAds' trifle gem instead.

Use Readline With Default Ruby on OS X

OS X Lion comes with ruby 1.8.7-p249 installed, although it's compiled against libedit rather than libreadline. Whilst libedit is a mostly-compatible replacement for libreadline, I find there's a couple of settings I'm used to that don't work in libedit. (Like history-beginning-search-backward.)

Luckily you can grab the source of ruby and compile just the readline extension, and move it into the right place for it to just work. Here's what's been working for me:

# Install readline using homebrew
brew install readline

# Download the ruby source and check out 1.8.7-p249
mkdir ~/tmp && cd ~/tmp
git clone git://
cd ruby
git checkout v1_8_7_249
cd ext/readline
ruby extconf.rb --with-readline-dir=$(brew --prefix readline) --disable-libedit
make

Now you should have readline.bundle in the current directory, and it should be compiled against your homebrew-installed readline library, rather than libedit that comes with the system. We can quickly double-check that by using otool to check what the binary is linked against.

$ otool -L readline.bundle
    /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/libruby.1.dylib (compatibility version 1.8.0, current version 1.8.7)
    /usr/local/Cellar/readline/6.2.2/lib/libreadline.6.2.dylib (compatibility version 6.0.0, current version 6.2.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)

And in the output you should see a line listing "libreadline", and no lines listing "libedit". If so, we've compiled it properly. Now the bundle is built we need to move it into the right place so it's loaded when ruby is invoked.

# Back up the original bundle, just in cases
sudo mv "$RL_PATH/readline.bundle" "$RL_PATH/readline.bundle.libedit"
sudo mv readline.bundle "$RL_PATH/readline.bundle"

And that's it. You've got a proper compiled-against-readline installed ruby 1.8.7-p249 on 10.7 now.

One gotcha I ran into was needing to pass the same arguments to rvm when installing any other version of 1.8.7 on the same machine. Simple enough, just need to remember to do it though.

CC=gcc-4.2 rvm install 1.8.7-p357 -C --with-readline-dir=$(brew --prefix readline) --disable-libedit

Install GCC-4.2.1 (Apple build 5666.3) with Xcode 4.2

As of Xcode 4.2 Apple have stopped bundling GCC with it, shipping only the (mostly) compatible llvm-gcc binary instead. The suggested fix is to install GCC using the osx-gcc-installer project. However, I wanted to build and install it from source, which Apple provides at

You should already have installed Xcode 4.2 from the app store, then basically the following steps are to grab the tarball from the 4.1 developer tools source, unpack & compile it, then install it into the right places.

Update 2016-07-03: I'd suggest just using homebrew to install this these days:

brew install homebrew/dupes/apple-gcc42


# Grab and unpack the tarball
mkdir ~/tmp && cd ~/tmp
curl -O
tar zxf gcc-5666.3.tar.gz
cd gcc-5666.3

# Setup some stuff it requires
mkdir -p build/obj build/dst build/sym
# And then build it. You should go make a cup of tea or five whilst this runs.
gnumake install RC_OS=macos RC_ARCHS='i386 x86_64' TARGETS='i386 x86_64' \
  SRCROOT=`pwd` OBJROOT=`pwd`/build/obj DSTROOT=`pwd`/build/dst \
  SYMROOT=`pwd`/build/sym

# And finally install it
sudo ditto build/dst /

And now you should have gcc-4.2 in your $PATH, available to build all the things that llvm-gcc fails to compile.

#to_param and keyword slugs

Imagine you've got a blogging app and it's currently generating URL paths like posts/10 for individual posts. You decide the path should contain the post title (in some form) to make your URLs friendlier when someone reads them. I know I certainly prefer to read vs (That's a fun blog post if you're into (ab)using ruby occasionally!)

Now you know all about how to change the URL path that rails generates—just define to_param in your app. Something simple that generates a slug consisting of hyphens and lowercase alphanumerical characters. For example:

# 70-abusing-ruby-1-9-json-for-fun
def to_param
  "#{id}-#{title.gsub(/\W/, "-").squeeze("-")}".downcase
end

NB: You might want to go the route of storing the slug against the post record in the database and thus generating it before saving the record. In which case the rest of this post is sort of moot and you just need to search on that column. If not, then read on!

Now we're generating a nice human-readable URL we need to change the way we find the post in the controller's show action. Up until now it's been a simple @post = Post.find(params[:id]) to grab the record out the database. Problem now is params[:id] is "70-abusing-ruby-1-9-json-for-fun", rather than just "70". A quick check in the String#to_i docs reveals it "Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36)." Basically it reads the digits at the start of the string, returns them as an integer, and ignores everything after them.

Knowing that we can just lean on it to extract the id before using find to look for the post: @post = Post.find(params[:id].to_i). Fantastic! We've got nice human readable paths on our blog posts and they can be found in the database. All finished… or are we?
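A quick sketch of how String#to_i treats slugs, since only leading digits count. The id_from_param helper is a hypothetical name for illustration.

```ruby
# Pulls the numeric id off the front of a slug.
def id_from_param(param)
  param.to_i
end

id_from_param("70-abusing-ruby-1-9-json-for-fun") # => 70
id_from_param("70")                               # => 70
id_from_param("abusing-ruby-70")                  # => 0, no leading digits
```

A result of 0 will simply not match any record, which is fine for this purpose.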

There's still a rather embarrassing bug in our code where we're not explicitly checking the slug in the URL against the slug of the Post we've extracted from the database. If we visited /posts/70-ruby-19-sucks-and-python-rules-4eva it would load the blog post and render it without batting an eyelid. This has caused quite a few embarrassing situations for some high profile media outlets who don't (or didn't) check their URLs and just output the content. Luckily there's a simple way for us to check this.

All we want to do is render the content if the id param matches the slug of the post exactly, and return a 404 page if it doesn't. We already know the id param (params[:id]) and have pulled the Post object out of the database and stored it in an instance variable (@post). The @post knows how to generate its own slug, using #to_param.

So we end up with something like the following in our posts controller, which does all the above and correctly returns a 404 if someone enters an invalid slug (even if it starts with a valid post id):

def show
  @post = Post.find(params[:id].to_i)
  render_404 && return unless params[:id] == @post.to_param
end

def render_404
  render :file => Rails.root + "public/404.html", :status => :not_found
end

And going to an invalid path like /posts/70-ruby-19-sucks-and-python-rules-4eva just renders the default rails 404 page with a 404 HTTP status. (If you want the id to appear at the end of the path, alter to_param accordingly and do something like params[:id].match(/\d+$/) to extract the Post's id to search on.)
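For the id-at-the-end variant, the extraction could be sketched like this. id_from_trailing_slug is a hypothetical helper, and it anchors with \z (end of string) rather than $ (end of line), which is a slightly stricter choice than the regex mentioned above.

```ruby
# Returns the trailing run of digits as an integer, or nil if there is none.
def id_from_trailing_slug(param)
  match = param.match(/\d+\z/)
  match && match[0].to_i
end

id_from_trailing_slug("abusing-ruby-1-9-json-for-fun-70") # => 70
id_from_trailing_slug("no-id-here")                       # => nil
```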

Hey presto, we've implemented human readable slugs that are tamper-proof (without storing them in the database.)
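The slug check itself doesn't need Rails to try out. Here's a plain-ruby sketch, assuming a Struct standing in for the Post model and a hypothetical slug_matches? helper:

```ruby
# Stand-in for the ActiveRecord model, with the same to_param as above.
Post = Struct.new(:id, :title) do
  def to_param
    "#{id}-#{title.gsub(/\W/, "-").squeeze("-")}".downcase
  end
end

# True only when the requested param is exactly the post's own slug.
def slug_matches?(param, post)
  param == post.to_param
end

post = Post.new(70, "Abusing Ruby 1.9 & JSON for fun")
post.to_param                                                 # => "70-abusing-ruby-1-9-json-for-fun"
slug_matches?("70-abusing-ruby-1-9-json-for-fun", post)       # => true
slug_matches?("70-ruby-19-sucks-and-python-rules-4eva", post) # => false
```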

(And bonus points if in fact you spotted I used my blog as an example, but that it isn't a rails app. (Nor contains the blog post ID in the pretty URL.) It's actually powered by Habari at the time of posting!)

Bad Recruiters - Rhys Evans at Devonshire

This is a linked-in invitation I received from Rhys, and my reply.

Update 2011-02-10: As much as recruiters can be scummy twats, Rhys appears to at least care somewhat about his relationship with potential clients/contacts and has responded in the comments. Normally my policy with recruiters is a two strike one, first email gets a polite "No thanks, go away.", second gets a mini-rant to bugger off and stop contacting me. Rhys hadn't technically contacted me before, but the unsolicited xmas email showed up when I searched my mailbox (which had annoyed me back when I received it.) And asking amongst my peers around at the time showed he was fairly disliked. (As you can see in some of the comments left below as well.) Since then, including a comment left below, a few people I trust have noted he's really not that bad as recruiters go, and the fact he's left a comment acknowledging perhaps his approach is a little misguided is enough for me to see he does still care about trying to be better than the rest of the recruitment crowd.

I still stand by my initial reply to him, and all other recruiters who don't understand "No." however.

On 02/09/11 5:30 AM, Rhys Evans wrote:

Hi Caius,

Good afternoon, I hope all is well.

I've noticed we are connected to a number of the same people within the Rails space on LinkedIn. I've tried you on 07960 268100 to no avail so I'd like to add you to my network and make contact.

Hi Rhys,

I've already got you blocked on twitter, so we've obviously run across each other in the past. You also appear to have sent me a Merry Xmas email from Devonshire, with no previous contact initiated by me. (I seem to remember a lot of my friends got those emails as well and we eventually worked out you'd scraped Github for them.)

Now it would appear you're being more intrusive and hunting folk out on linked in, ignoring the fact that they are employed and have specifically set linked in to say they aren't looking for new jobs currently. From asking around you harass a few of my friends, to the point of ringing one up recently to tell them you knew they'd changed jobs and where they were now working. If you're going to spend time doing that much research then why not have the decency to not be a completely mannerless cunt and leave them alone when they request you to.

It would also appear you've just blanket-spammed me and a few people in my peer group through linked in with the same request, again a pretty dumb thing to do. It's as if you recruiters think we never talk to each other, and don't realise how much you lot being a bunch of pestering spammy bastards taints developers against ever dealing with a recruiter.

So no, I don't think I do want to accept your invitation to connect. And please never phone, email or contact me via any other means. I'm happily employed and if I ever need the services of a recruiter I'll find someone who actually possesses an ounce of politeness about approaching (potential) candidates.


Abusing Ruby 1.9 & JSON for fun

Ever since I found out about the new hash syntax you can use in ruby 1.9, and how similar that syntax is to JSON, I've been waiting for someone to realise you can just abuse eval() for parsing (some) JSON now.

For example, let's say we have the following ruby hash, which could be generated by a RESTful api:

thing = {
    :person => {
        :name => "caius"
    }
}

If we pull in the JSON gem and dump that out as a string, we get the following:

require "json"

jsonstr = thing.to_json
# => '{"person":{"name":"caius"}}'

That's… not quite what we wanted. It's not going to turn back into valid ruby as it is. Luckily javascript will parse objects without requiring the attributes to be wrapped in quotes, eg: {some: "attribute"}. We could build a JSON emitter that does it properly, or we could just run it through a regular expression instead. (Let's also add a space after the colon to aid readability.)

jsonstr.gsub!(/"([^"]+)":/, '\1: ')
# => '{person: {name: "caius"}}'

Okay, so now we've turned a ruby hash into a JSON hash, that can still be parsed by the browser. Here's a screenshot to prove that:

[Screenshot: Valid JSON 'thing']

As you can see, it parses that into a valid JS object, complete with person and then (nested) name attributes. If we wanted to, thing["person"]["name"] would access the nested value "caius" just fine.

Now then, we've proved that is successfully parsed into javascript objects by the browser, generated from a ruby hash. No great shakes there, that's fairly simple and has worked for ages. Now for my next trick, I'm going to turn that string of JSON back into a ruby hash, all without going anywhere near the JSON gem.

Some of you might have guessed what I'm about to do and have started hoping you've guessed wrongly — just for the record I don't condone doing this except for fun and games. The JSON gem is there for a reason ;) With that little disclaimer out the way, here we go!

thing2 = eval(jsonstr)
# => {:person=>{:name=>"caius"}}
thing2 == thing
# => true

Oh snap! We just turned javascript objects back into valid ruby objects, in one simple method call. And we'd be able to access the "caius" value by calling thing2[:person][:name], or by creating OpenStructs from the hashes and calling the attributes as methods, which is uncannily like the JS!
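For comparison, the boring and safe route gets the same symbol-keyed hash back without eval. A minimal sketch using JSON.parse with symbolize_names, where THING and ROUND_TRIPPED are just illustrative names:

```ruby
require "json"

THING = { person: { name: "caius" } }

# symbolize_names: true turns the string keys back into symbols,
# so the result compares equal to the original hash.
ROUND_TRIPPED = JSON.parse(THING.to_json, symbolize_names: true)

puts ROUND_TRIPPED == THING       # prints: true
puts ROUND_TRIPPED[:person][:name] # prints: caius
```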

Updated 2011-02-07: Jens Ayton pointed out unquoted keys aren't strictly valid JSON, which is correct. Amended to say they're parsed as javascript objects instead, with no mention of it being valid JSON.