Caius Theory

Now with even more cowbell…

evil.rb

Here be hax. Don't ever do these. ;-)

Reduce local variables with instance_eval

Sometimes (usually in a one-liner) I want to do some work with a value without assigning it to a variable. Chucking an #instance_eval call in there will set self to the value, which saves having to assign it to a local value. Pretty much only used by me in one-off scripts or cli commands.

Good

start_date, end_date = ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
puts "#{start_date} to #{end_date} is #{(end_date - start_date).to_i} days"

Bad

puts ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
  .instance_eval { "#{first} to #{last} is #{(last - first).to_i} days" }

See, way less code! cough, cough

Bonus usage: Misdirection

I also dropped some instance_eval on our campfire bot at EmberAds to always blame one person, but without the code reading as such.

%w{Dom Mel Caius CBetta Baz}.sample.instance_eval do
  "(4V5A8F5T=&$`".unpack("u")[0]
end

That does not return one of the array elements as you might think it does from quickly scanning the code…

Set method-local variables in default arguments

You have a method and it takes one argument, which has a default value of nil specified. You then run into the situation where you need to know if nil was passed to the method, or if you're getting the default value of nil. You could change the default value to something you choose to be the "default value" and unlikely to be passed from elsewhere as the argument's value, and reset the parameter to nil after checking it, like this:

def output name=:default_value
  if name == :default_value
    name = "caius"
    default = true
  end

  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

That's quite a lot of code added to the method just to find out if we passed a default value or not. And if we forget to reset the value when it's :default_value then we end up leaking that into whatever the method does with that value. We also have the problem that one day the program could possibly send that "default value" we've chosen as the actual parameter, and we'd blindly change it thinking it was set as the default value, not the passed argument.

Instead we could (ab)use the power of ruby, and have ruby decide to set default = true for us when, and only when, the variable is set to the default value.

def output name=((default=true); "caius")
  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

As you can see, the output is identical. Yet we have no extra code inside the method to figure out if we were given the default value or not. And as a bonus to that, we no longer have to check for a specific value being passed and presume that is actually the default, and not one passed by the program elsewhere.

I posted this one in a gist a while back (to show Avdi it looks like), and people came up with some more insane things to do with it, including returning early, raising errors or even redefining the current method, all from the argument list! I'd suggest going to read them, it's a mixture of OMG HAHA and OMFG NO WAY WHYY?!?!.

Don't do this.

Don't do the above. No really, don't do them. Unless you're writing a one-off thing. But seriously, don't do them. :-D

Geolocation in nginx

Sometimes you need to have a rough idea of where your website visitor is located. There's many ways to geolocate them, but if you just want to go to country level then MaxMind have free geo databases available to help you. When we needed to do this quickly on-the-fly at EmberAds, we came up with the trifle gem, which supports ipv4 and ipv6 lookups.

Recently I was searching for something else to do with nginx and ran across a mailing list thread about using the maxmind database with nginx's HTTP Geo module and do the lookup directly in nginx itself. Finally got a chance to sit down and work out the logistics of doing this. I've done this on an ubuntu 12.04 box, with the expected config file layouts that come with ubuntu.

Run the following on your server (as someone with write access to nginx config files):

# Generate the text file for nginx to import
perl <(curl -s https://raw.github.com/nginx/nginx/master/contrib/geo2nginx.pl) \
< <(zip=$(tempfile) && \
curl -so $zip http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip \
&& unzip -p $zip) > /etc/nginx/nginx_ip_country.txt

# Tell nginx to work out the IP country and store in variable
echo 'geo $IP_COUNTRY {
  default --;
  include /etc/nginx/nginx_ip_country.txt;
}' > /etc/nginx/conf.d/ip_country.conf

Now go find the http block for the vhost you want to have the header passed to, and assuming it's passenger, add the following:

# http {
  # server_name freddy.com;
  passenger_set_cgi_param HTTP_X_IP_COUNTRY $IP_COUNTRY;
# }

(If you don't use passenger, look at the docs for proxy_pass_header or fastcgi_pass_header to see which you'll require for your setup.)

Reload nginx, and behold, request.env["HTTP_X_IP_COUNTRY"] (assuming a rack app running under ruby) will be a two letter country code, or "--".

Unfortunately this is IPv4 only currently, there's a thread on the nginx mailing list from November 2012 saying IPv6 support should be coming on the v1.3 branch of nginx, but with no known ETA. So currently for IPv6 support, take a look at EmberAds' trifle gem instead.

Install capybara-webkit gem on Ubuntu

Dear future Caius searching for this issue,

The apt package you need to install to use the capybara-webkit rubygem on ubuntu (tested on 10.04 and 11.10) is libqt4-dev. That is, to gem install capybara-webkit, you need to run aptitude install libqt4-dev.

Yours helpfully,
Past Caius

Install GCC-4.2.1 (Apple build 5666.3) with Xcode 4.2

As of Xcode 4.2 Apple have stopped bundling GCC with it, shipping only the (mostly) compatible llvm-gcc binary instead. The suggested fix is to install GCC using the osx-gcc-installer project. However, I wanted to build and install it from source, which apple provides at http://opensource.apple.com/.

You should already have installed Xcode 4.2 from the app store, then basically the following steps are to grab the tarball from the 4.1 developer tools source, unpack & compile it, then install it into the right places.

Update 2016-07-03: I'd suggest just using homebrew to install this these days:

brew install homebrew/dupes/apple-gcc42

Instructions

# Grab and unpack the tarball
mkdir ~/tmp && cd ~/tmp
curl -O http://opensource.apple.com/tarballs/gcc/gcc-5666.3.tar.gz
tar zxf gcc-5666.3.tar.gz
cd gcc-5666.3

# Setup some stuff it requires
mkdir -p build/obj build/dst build/sym
# And then build it. You should go make a cup of tea or five whilst this runs.
gnumake install RC_OS=macos RC_ARCHS='i386 x86_64' TARGETS='i386 x86_64' \
  SRCROOT=`pwd` OBJROOT=`pwd`/build/obj DSTROOT=`pwd`/build/dst \
  SYMROOT=`pwd`/build/sym

# And finally install it
sudo ditto build/dst /

And now you should have gcc-4.2 in your $PATH, available to build all the things that llvm-gcc fails to compile.

TweetSavr

I've had a dream for a while. A simple webapp that takes the last tweet in a conversation and outputs that conversation in chronological order on a page you can link to forevermore. Occasionally I'll google to see if anything new's turned up, but they all seem to do far more, require the start and end tweets or are covered in ads.

So one friday evening I just built it. It's called TweetSavr. It's very simple—to the point the error page is just a standard 500 server error page currently. It fetches, caches and displays a conversation, given just the last tweet in said conversation.

KISS extends to the interface as well, I'm quite a fan of URL hacking to use webapps, so TweetSavr works on that basis as well. The homepage sort of has some help telling you how to use it, but you basically take the (old-twitter) URL of the last tweet and paste it after tweetsavr.com in the address bar. Eg, http://tweetsavr.com/http://twitter.com/ElizabethN/status/19766711653765120. It'll then redirect you through to the actual page for that conversation. You can also put just the status id on the end of the URL, http://tweetsavr.com/19766711653765120 and hey presto, it loads.

The caching layer is moderately rudimentary, after fetching a tweet that isn't in the cache it writes out a hash of data for that tweet into a yaml file. And when looking up a tweet it checks to see if that file exists, reading it in from disk if it is. Bonus side-effect is it builds up a corpus of tweets as yaml files on disk.

It lives on the internet at http://tweetsavr.com/ and the source is on github at http://github.com/caius/tweetsavr

Side note: isn't it wonderful what we can create given just a few hours, a server somewhere in the cloud, and an idea? Never ceases to amaze me what can be built in just a short amount of time, even the dead simple things.

#to_param and keyword slugs

Imagine you've got a blogging app and it's currently generating URL paths like posts/10 for individual posts. You decide the path should contain the post title (in some form) to make your URLs friendlier when someone reads them. I know I certainly prefer to read http://caiustheory.com/abusing-ruby-19-and-json-for-fun vs http://caiustheory.com/?id=70. (That's a fun blog post if you're into (ab)using ruby occasionally!)

Now you know all about how to change the URL path that rails generates—just define to_param in your app. Something simple that generates a slug consisting of hyphens and lowercase alphanumerical characters. For example:

# 70-abusing-ruby-1-9-json-for-fun
def to_param
  "#{id}-#{title.gsub(/\W/, "-").squeeze("-")}".downcase
end

NB: You might want to go the route of storing the slug against the post record in the database and thus generating it before saving the record. In which case the rest of this post is sort of moot and you just need to search on that column. If not, then read on!

Now we're generating a nice human-readable URL we need to change the way we find the post in the controller's show action. Up until now it's been a simple @post = Post.find(params[:id]) to grab the record out the database. Problem now is params[:id] is "70-abusing-ruby-1-9-json-for-fun", rather than just "70". A quick check in the String#to_i docs reveals it "Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36)." Basically it extracts the first number it comes across and returns it.

Knowing that we can just lean on it to extract the id before using find to look for the post: @post = Post.find(params[:id].to_i). Fantastic! We've got nice human readable paths on our blog posts and they can be found in the database. All finished… or are we?

There's still a rather embarassing bug in our code where we're not explicitly checking the slug in the URL against the slug of the Post we've extracted from the database. If we visited /posts/70-ruby-19-sucks-and-python-rules-4eva it would load the blog post and render it without batting an eyelid. This has caused rather a few embarrassing situations for some high profile media outlets who don't (or didn't) check their URLs and just output the content. Luckily there's a simple way for us to check this.

All we want to do is render the content if the id param matches the slug of the post exactly, and return a 404 page if it doesn't. We already know the id param (params[:id]) and have pulled the Post object out of the database and stored it in an instance variable (@post). The @post knows how to generate it's own slug, using #to_param.

So we end up with something like the following in our posts controller, which does all the above and correctly returns a 404 if someone enters an invalid slug (even if it starts with a valid post id):

def show
  @post = Post.find(params[:id].to_i)
  render_404 && return unless params[:id] == @post.to_param
end

def render_404
  render :file => Rails.root + "public/404.html", :status => :not_found
end

And going to an invalid path like /posts/70-ruby-19-sucks-and-python-rules-4eva just renders the default rails 404 page with a 404 HTTP status. (If you want the id to appear at the end of the path, alter to_param accordingly and do something like params[:id].match(/\d+$/) to extract the Post's id to search on.)

Hey presto, we've implemented human readable slugs that are tamper-proof (without storing them in the database.)

(And bonus points if in fact you spotted I used my blog as an example, but that it isn't a rails app. (Nor contains the blog post ID in the pretty URL.) It's actually powered by Habari at the time of posting!

+[NSObject load] in MacRuby

If you've not heard of it, MacRuby is an implementation of Ruby 1.9 directly on top of Mac OS X core technologies such as the Objective-C runtime and garbage collector, the LLVM compiler infrastructure and the Foundation and ICU frameworks. Basically means you write in Ruby using Objective-C frameworks, and vice versa. It's pretty damn cool to be honest!

What is +[NSObject load]?

From the documentation:

Invoked whenever a class or category is added to the Objective-C runtime; implement this method to perform class-specific behavior upon loading.

This means when your class is loaded, and implements the load method, you get a load message sent to your class. Which means you can start doing stuff as soon as your class is loaded by the runtime.

The main place I've seen it used (and used it myself) is in SIMBL plugins. A SIMBL plugin is an NSBundle that contains code which is loaded (injected) into a running application shortly after said application is launched. It lets you extend (or "fix") cocoa applications with additional features. So you have this bundle of code, that gets loaded into a running application some point after it starts, and you want to run some code as the bundle is loaded - usually to kick off doing whatever you want to do in the plugin. This is where load becomes useful.

Here's a quick implementation that just logs to the console:

    @implementation MainController
    
    + (void) load
    {
        NSlog(@"MainController#load called");
    }

Now where does MacRuby come into this?

Well I came across a need to do the same in ruby, have some code triggered when the class is loaded into the runtime. Tried implementing Class.load but to no avail. Then remembered MacRuby is just ruby! And I can call any code from within my ruby class definition.

For continuity I still call it Class.load, but then call it as soon as I've defined it in the class. Eg:

class MainController
  def self.load
    NSLog "MainController#load called"
  end

  self.load
end

Of course, I'm not sure when the Objective-C method is called, it's probably after the entire class has been defined rather than as soon as load has been loaded into the runtime. So you might want to move the self.load call to just before the closing end.

Potty Training YAML

Ran into a problem today where I have a class with a few attributes on it, but I only want a certain three of those attributes to appear in the YAML dump of a class instance.

Diving straight into a code example–lets say we have a Contact class, and we only want to dump the name, email and website attributes.

require "yaml"

class Contact
  attr_accessor :name, :email, :website, :telephone

  # helper method to make setting up easy
  def initialize(params={})
    params.each do |key, value|
      meffod = "#{key.to_s}="
      send(meffod, value) if respond_to?(meffod)
    end
  end
end

# And create an instance for us to play with
caius = Contact.new(
  :name => "Caius",
  :email => "dev@caius.name",
  :website => "http://caius.name/",
  :telephone => "12345"
)

As we'd expect when dumping this, all instance variables get dumped out:

print caius.to_yaml
# >> --- !ruby/object:Contact 
# >> email: dev@caius.name
# >> name: Caius
# >> telephone: "12345"
# >> website: http://caius.name/

Initially I tried to override to_yaml and unset the instance variables I didn't want showing up, but that just made them show up empty. After digging around a bit more, I happened across the Type Families page in the yaml4r docs, which right at the bottom mentions to_yaml_properties.

Turns out to_yaml_properties returns an array of instance variable names (as strings) that should be dumped out as part of the object. A quick method definition later, and we're only dumping the variables we want. (See my Ruby Shortcuts post if you don't know what %w() does)

class Contact
  def to_yaml_properties
    %w(@name @email @website)
  end
end

And now we dump the class, expecting only the three attributes to be outputted:

print caius.to_yaml
# >> --- !ruby/object:Contact 
# >> name: Caius
# >> email: dev@caius.name
# >> website: http://caius.name/

Success!

Ruby Shortcuts

There's a few useful shorthand ways to create certain objects in Ruby, a couple of obvious ones are [] to create an Array and {} to create a Hash (Or block/Proc). There's some not so obvious ones too, for creating strings, regexes and executing shell commands.

With all of the examples I've used {} as the delimiter characters, but you can use a variety of characters. Personally I tend to use {} unless the string contains them, in which case I'll use // or @@. My only exception appears to be %w, for which I tend to use ().

Strings

% and %Q are the same as using double quotes, including string interpolation. Really useful when you want to create a string that contains double quotes, but without the hassle of escaping them.

%{}                 # => ""
%Q{}                # => ""

%{caius}            # => "caius"
%{caius #{5}}       # => "caius 5"
%{some "foo" thing} # => "some \"foo\" thing"

%q is equivalent to using single quotes. Behaves exactly the same, no string interpolation.

%q{}           # => ''
%q{caius}      # => "caius"
%q{caius #{5}} # => "caius \#{5}"

### Arrays

%w is the equivalent of using String#split. It takes a string and splits it on whitespace. With the added bonus of being able to escape whitespace too. %W allows string interpolation.

%w(foo bar sed)  # => ["foo", "bar", "sed"]
%w(foo\ bar sed) # => ["foo bar", "sed"]
%W(foo #{5} bar) # => ["foo", "5", "bar"]

Regexes

%r is just like using // to create a regexp object. Comes in handy when you're writing a regex containing / as you don't have to continually escape it.

%r{foo|bar} # => /foo|bar/
%r{foo/bar} # => /foo\/bar/

Symbols

%s creates a symbol, just like writing :foo manually. It takes care of escaping the symbol, but unlike :"" it doesn't allow string interpolation however.

%s{foo}      # => :foo
%s{foo/bar}  # => :"foo/bar"
:"foo-#{5}"  # => :"foo-5"
%s{foo-#{5}} # => :"foo-\#{5}"

Shelling out

%x is the same as backticks (``), executes the command in a shell and returns the output as a string. And just like backticks it supports string interpolation.

`echo hi`     # => "hi\n"
%x{echo hi}   # => "hi\n"
%x{echo #{5}} # => "5\n"

Read standard input using Objective-C

On a couple of occasions now I've wanted to read from STDIN into an Objective-C command line tool, and both times I've had to hunt quite a bit to find the answer because nothing shows up in google for the search terms I used. "Objective-c read from stdin" and "objc read stdin" both turn up results ranging from using NSInputStream to dropping some C++ in there.

The answer is quite simple really, just use NSFileHandle. More specifically +[NSFileHandle fileHandleWithStandardInput]. You can then read all data currently in STDIN, monitor it for new data and anything else you can do with a normal NSFileHandle.

And here's some example code, reads all data from STDIN and stores it into an NSString:

NSFileHandle *input = [NSFileHandle fileHandleWithStandardInput];
NSData *inputData = [NSData dataWithData:[input readDataToEndOfFile]];
NSString *inputString = [[NSString alloc]
  initWithData:inputData encoding:NSUTF8StringEncoding];

I'm using this in GarbageCollected apps, memory management without GC is left as an exercise to the user.