Caius Theory

Now with even more cowbell…

Add to iCloud Reading List programmatically

One piece of a larger puzzle I'm currently trying to solve was how to add a given URL to my Apple "Reading List", which is stored in iCloud and synced across all my OS X and iOS devices. More specifically, I wanted to add URLs to the list from my Mac running Mavericks (10.9). I had a quick look at the Cocoa APIs and couldn't see anything in OS X to do this. (iOS does seem to have an API for doing it from Cocoa-land though.)

I figured Safari.app was the key to getting this done on OS X, given it can add the current page to the reading list itself, either via a keyboard command, a menu item, or a button in the address bar. One quick mental leap later, and I was wondering if the engineers at Apple had been nice enough to expose that via AppleScript for me to take advantage of.

One quick stop in "Script Editor.app" later, and I had the AppleScript dictionary open for Safari.app. Lo and behold, there is rather handily an AppleScript command called "add reading list item", which does exactly what I want. It has a few different options you can call it with, depending on whether you want Safari to go and populate the title & preview text, or whether you want to specify them yourself at save-time.

As I want to be able to call this from multiple runtimes, I've chosen to save it as an executable, which leans on osascript to run the actual AppleScript. And here it is:

#!/usr/bin/env osascript

on run argv
    if (count of argv) > 0
        tell app "Safari" to add reading list item (item 1 of argv as text)
    end if
end run

Save it as whatever you want (eg. add_to_reading_list), make it executable (chmod +x add_to_reading_list), and then run it with the URL you want saving as the first argument.

$ add_to_reading_list "http://caius.name/"
$ add_to_reading_list "http://google.com/"
# … etc …

(Adding support for specifying preview text and title is left as an exercise for the reader!)
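
Since the point of the executable is being callable from other runtimes, here's a rough sketch of what calling it from Ruby might look like. It assumes the script was saved as add_to_reading_list and is on your PATH; the reading_list_command and add_to_reading_list method names are made up for this example.

```ruby
require "uri"

# Hypothetical helper: builds the argv for the add_to_reading_list script
# above, refusing anything that isn't an http(s) URL.
def reading_list_command(url, script: "add_to_reading_list")
  uri = URI.parse(url)
  raise ArgumentError, "not an http(s) URL: #{url}" unless uri.is_a?(URI::HTTP)
  [script, uri.to_s]
end

# Hypothetical wrapper: runs the script and returns true/false/nil as
# Kernel#system does. Only useful on a Mac with the script installed.
def add_to_reading_list(url)
  system(*reading_list_command(url))
end

p reading_list_command("http://caius.name/")
# add_to_reading_list("http://caius.name/")
```

Passing the command to system as separate arguments (rather than one string) keeps the URL away from the shell, so odd characters in it can't be interpreted as shell syntax.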

Have fun reading later!

evil.rb

Here be hax. Don't ever do these. ;-)

Reduce local variables with instance_eval

Sometimes (usually in a one-liner) I want to do some work with a value without assigning it to a variable. Chucking an #instance_eval call in there will set self to the value, which saves having to assign it to a local variable. Pretty much only used by me in one-off scripts or CLI commands.

Good

require "date"
start_date, end_date = ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
puts "#{start_date} to #{end_date} is #{(end_date - start_date).to_i} days"

Bad

require "date"
puts ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
  .instance_eval { "#{first} to #{last} is #{(last - first).to_i} days" }

See, way less code! cough, cough
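
For comparison, and assuming Ruby 2.6 or later, Object#then gets much the same chained one-liner without rebinding self. It wasn't part of the original trick; it's just a less surprising way to do a similar thing. The date_span method name is made up for the example.

```ruby
require "date"

# Object#then yields the receiver to the block instead of changing self.
# The two-element array is destructured into the block's two parameters.
def date_span(from, to)
  [from, to]
    .map { |d| Date.parse(d) }
    .then { |first, last| "#{first} to #{last} is #{(last - first).to_i} days" }
end

puts date_span("24 Dec 2011", "23 Jan 2013")
# prints "2011-12-24 to 2013-01-23 is 396 days"
```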

Bonus usage: Misdirection

I also dropped some instance_eval on our campfire bot at EmberAds to always blame one person, but without the code reading as such.

%w{Dom Mel Caius CBetta Baz}.sample.instance_eval do
  "(4V5A8F5T=&$`".unpack("u")[0]
end

That does not return one of the array elements as you might think it does from quickly scanning the code…
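
A stripped-down sketch of why: instance_eval returns whatever the block evaluates to, and the receiver only matters if the block actually uses self. The names and the blame/shout methods here are invented for illustration.

```ruby
NAMES = %w[Dom Mel Caius].freeze

# The receiver is picked at random, but the block never touches self,
# so the result is always the block's own string.
def blame
  NAMES.sample.instance_eval { "Baz" }
end

# Here the block does use self (the sampled string), so the result varies.
def shout
  NAMES.sample.instance_eval { upcase }
end

puts blame # always "Baz"
puts shout # one of DOM, MEL or CAIUS
```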

Set method-local variables in default arguments

You have a method and it takes one argument, which has a default value of nil specified. You then run into the situation where you need to know if nil was passed to the method, or if you're getting the default value of nil. You could change the default to a sentinel value that's unlikely to be passed from elsewhere, then swap the parameter back to the value you actually want after checking for it, like this (here the real default is "caius" rather than nil):

def output name=:default_value
  if name == :default_value
    name = "caius"
    default = true
  end

  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

That's quite a lot of code added to the method just to find out if we passed a default value or not. And if we forget to reset the value when it's :default_value then we end up leaking that into whatever the method does with that value. We also have the problem that one day the program could possibly send that "default value" we've chosen as the actual parameter, and we'd blindly change it thinking it was set as the default value, not the passed argument.

Instead we could (ab)use the power of ruby, and have ruby decide to set default = true for us when, and only when, the variable is set to the default value.

def output name=((default=true); "caius")
  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

As you can see, the output is identical. Yet we have no extra code inside the method to figure out if we were given the default value or not. And as a bonus to that, we no longer have to check for a specific value being passed and presume that is actually the default, and not one passed by the program elsewhere.
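
The same trick covers the nil case that started all this. A small sketch (describe_arg and defaulted are names made up for the example) that tells an explicit nil apart from a defaulted one:

```ruby
# `defaulted` is only assigned when ruby evaluates the default expression,
# which only happens when no argument was passed.
def describe_arg(name = ((defaulted = true); nil))
  source = defaulted ? "default" : "explicit"
  "#{name.inspect} (#{source})"
end

puts describe_arg         # nil (default)
puts describe_arg(nil)    # nil (explicit)
puts describe_arg("fred") # "fred" (explicit)
```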

I posted this one in a gist a while back (to show Avdi it looks like), and people came up with some more insane things to do with it, including returning early, raising errors or even redefining the current method, all from the argument list! I'd suggest going to read them, it's a mixture of OMG HAHA and OMFG NO WAY WHYY?!?!.

Don't do this.

Don't do the above. No really, don't do them. Unless you're writing a one-off thing. But seriously, don't do them. :-D

Some Small Refactorings in Ruby

Here are a few things I refactor as I first write code down. I'm not entirely convinced it's strictly refactoring, but it's how I move from one pattern I see in a line or three of code to a different structure that I feel achieves the same result with cleaner or more concise code.

Multiple equality comparisons

Testing the equality of an object against another is fairly simple, just do foo == "bar". However, I usually try to test against multiple objects in a slightly different way. Your first thought might be that the easiest way is just to chain a series of == with the OR (||) operator.

foo == "bar" || foo == "baz" || foo == :sed || foo == 5

I much prefer to flip it around, think of the objects I'm testing against as a collection (Array), and then ask them if they contain the object I'm checking. And for that, I use Array#include?

["bar", "baz", :sed, 5].include?(foo)

(And if you're only testing against strings, you could use %w(bar baz) as a shortcut to create the array. Here's more ruby shortcuts.)
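
When the same list gets checked in more than one place, one option is to give it a name. A quick sketch, where MATCHES and match? are names invented for the example:

```ruby
# Frozen so the list can't be modified by accident elsewhere.
MATCHES = ["bar", "baz", :sed, 5].freeze

def match?(foo)
  MATCHES.include?(foo)
end

p match?("baz") # => true
p match?(:sed)  # => true
p match?("sed") # => false, a String isn't equal to a Symbol
```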

Assigning multiple items from a nested hash to variables

Occasionally I find myself needing to be given a hash of a hash of data (most recently, an omniauth auth hash) and assign some values from it to separate variables within my code. Given the following hash, containing a nested hash:

details = {
  uid: "12345",
  info: {
    name: "Caius Durling",
    nickname: "caius",
  },
}

Let's say we want to extract the name and nickname fields from the details[:info] hash into their own local variables (or, more likely, instance variables within a class). We should probably handle the case of details[:info] not being there, and not try to read from it if that's the case, so we might end up with something like the following:

name = details[:info] && details[:info][:name]
nickname = details[:info] && details[:info][:nickname]

name # => "Caius Durling"
nickname # => "caius"

And then in the spirit of DRYing up our code, we see there's duplication in both lines in checking details[:info] exists (not actually that it's a hash, but hey ho, we rely on upstream to send us nil or a hash.) So we reduce it down using an if statement and give ourselves slightly less to type at the same time.

if (( info = details[:info] ))
  name = info[:name]
  nickname = info[:nickname]
end

name # => "Caius Durling"
nickname # => "caius"
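
Two other ways to write this, as a sketch rather than a recommendation: Hash#values_at pulls both keys in one go, and Hash#dig (Ruby 2.3+) walks the nesting and returns nil if a key is missing along the way. The extract_names method is made up for the example, and like the version above it relies on upstream sending nil or a hash (dig raises TypeError if :info is something like a String).

```ruby
# Returns [name, nickname], or [nil, nil] when :info is missing.
def extract_names(details)
  info = details[:info] || {}
  info.values_at(:name, :nickname)
end

details = {
  uid: "12345",
  info: { name: "Caius Durling", nickname: "caius" },
}

name, nickname = extract_names(details)
p name                           # => "Caius Durling"
p nickname                       # => "caius"
p details.dig(:info, :nickname)  # => "caius"
p({ uid: "1" }.dig(:info, :name)) # => nil
```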

Returning two values conditionally

Sometimes a method will end with a ternary, where depending on a condition it'll either return one or another value. If this conditional returns true, then the first value is returned. Otherwise it returns the second value. You could quite easily write it out as an if/else longer-form block too.

def my_method
  @blah == foo ? :foo_matches : :no_match
end

My brain finds it slightly harder to pick the logic in this apart than if I drop a return-early bomb on the method. Then it reads more like how I'd think through the logic: return the first value if this conditional is true, otherwise the method returns this second value. I think the second value being on a completely separate line helps me make this mental distinction quicker too.

So I'd write it this way:

def my_method
  return :foo_matches if @blah == foo
  :no_match
end

Returning nil or a value conditionally

Following on from the last snippet, but taking advantage of the ruby runtime a bit more, is when you want to return a value if a conditional is true, or otherwise nil. The easy way is to just write nil in the ternary:

def my_method
  @foo == :bar ? :foo_matches : nil
end

However, we know ruby returns the result of the last expression in the method. And that if a single line conditional isn't met, it returns nil from the expression. Combining that, we can rewrite the previous example into this:

def my_method
  :foo_matches if @foo == :bar
end

And it will still return nil in the case that @foo doesn't match :bar.
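
To see both outcomes, here's the same method dropped into a throwaway class (Checker is a made-up name for the example):

```ruby
class Checker
  def initialize(foo)
    @foo = foo
  end

  # A modifier-if whose condition is false evaluates to nil,
  # and that nil becomes the method's return value.
  def my_method
    :foo_matches if @foo == :bar
  end
end

p Checker.new(:bar).my_method # => :foo_matches
p Checker.new(:baz).my_method # => nil
```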

Returning a boolean

Sometimes you have a method that returns the result of a conditional, but it's written to return true/false in a conditional instead.

def my_method
  @foo == :bar ? true : false
end

The really easy refactor here is to just remove the ternary and leave the conditional.

def my_method
  @foo == :bar
end

And of course if you were returning false when the conditional evaluates to true, you can either negate the comparison (use != in that example), or negate the entire conditional result by prepending ! to the line.
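
Both negated forms side by side, in a throwaway class (Widget and the method names are invented for the example):

```ruby
class Widget
  def initialize(foo)
    @foo = foo
  end

  def bar?
    @foo == :bar
  end

  # Negating the comparison itself...
  def not_bar?
    @foo != :bar
  end

  # ...or negating the whole result. Parentheses are needed here because
  # ! binds tighter than ==.
  def also_not_bar?
    !(@foo == :bar)
  end
end

w = Widget.new(:baz)
p w.bar?          # => false
p w.not_bar?      # => true
p w.also_not_bar? # => true
```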

Why I love DATA

In a ruby script, there's a keyword __END__ that for a long time I thought just marked anything after it as a comment. So I used to use it to store snippets and notes about the script that weren't really needed inline. Then one day I stumbled across the DATA constant, and wondered what flaming magic it was.

DATA is in fact an IO object that you can read from (or do anything else you'd do with an IO object). It contains all the content after __END__ in that ruby script file*. (It only exists when the file contains __END__, and only for the first file ruby invokes. See the footnote for more details.)

How can we use this, and why indeed do I love this fickle constant? I mostly use it for quick scripts where I need to process text data, rather than piping to STDIN.

Given a list of URLs that I want to open in my web browser and look at, I could do the following for instance:

DATA.each_line.map(&:chomp).each do |url|
  `open "#{url}"`
end

__END__
http://google.com/
http://yahoo.com/

which upon running (on a mac) would open all the URLs listed in DATA in my default web browser. (For bonus points, use Launchy for cross-platform compatibility.) Really handy & quick/simple when you've got 500+ URLs to open at once to go through. (I once had a job that required me to do this daily. Fun.)

Or given a bunch of CSV data that you just want one column for, you could reach for cut or awk in the terminal, but ruby has a really good CSV library which I trust and know how to use already. Why not just use that & DATA to pull out the field you want?

require "csv"

CSV.parse(DATA, headers: true).each do |row|
  puts row["kName"]
end

__END__
kId,kName,kURL
1,Google UK,http://google.co.uk
2,"Yahoo, UK",http://yahoo.co.uk

Running that prints:

Google UK
Yahoo, UK

I find when the data I want to munge is already in my clipboard, and I can run ruby scripts directly from text editors without having to save a file, it saves having to write the data out to a file, have ruby read it back in, etc just to do something with the data. I can just write the script reading from DATA, paste the data in and run it. Which also lets me run it iteratively and build up a slightly more complex script that I don't want to keep. Then do what I need with the output and close the file without saving it.

* technically DATA is an IO handler to read __FILE__, which has been wound forward to the start of the first line after __END__ in the file. And it only exists for the first ruby file to be invoked by the interpreter.

cat > tmp/data.rb <<RUBY
p DATA.read
__END__
data.rb
RUBY

ruby tmp/data.rb
# => "data.rb\n"

cat > tmp/data-require.rb <<RUBY
require "./tmp/data"
RUBY

ruby tmp/data-require.rb
# => /Users/caius/tmp/data.rb:1:in `<top (required)>': uninitialized constant DATA (NameError)

And because it's a file handle pointing at the current file, you can rewind it and read the entire ruby script into itself…

$ ruby tmp/readself.rb 
DATA.rewind
print DATA.read

__END__
something goes here
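
If you'd rather poke at this without creating files by hand, here's a sketch that writes a tiny script to a temp file and runs it with the current ruby. The run_script helper, the SCRIPT constant and the "payload" content are all made up for the example.

```ruby
require "rbconfig"
require "tempfile"

# Writes source to a temporary .rb file, runs it with the ruby that is
# running this script, and returns whatever the child printed.
def run_script(source)
  Tempfile.create(["data_demo", ".rb"]) do |file|
    file.write(source)
    file.flush
    IO.popen([RbConfig.ruby, file.path], &:read)
  end
end

# The squiggly heredoc strips the indentation, so __END__ ends up at the
# start of a line in the generated file.
SCRIPT = <<~RUBY
  data = DATA.read
  DATA.rewind
  whole_file = DATA.read
  p data
  p whole_file.start_with?("data = DATA.read")
  __END__
  payload
RUBY

puts run_script(SCRIPT)
```

The first line printed is just the content after __END__, and the second confirms that after rewinding, DATA reads from the top of the script itself.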

Geolocation in nginx

Sometimes you need to have a rough idea of where your website visitor is located. There are many ways to geolocate them, but if you just want to go to country level then MaxMind have free geo databases available to help you. When we needed to do this quickly on-the-fly at EmberAds, we came up with the trifle gem, which supports IPv4 and IPv6 lookups.

Recently I was searching for something else to do with nginx and ran across a mailing list thread about using the MaxMind database with nginx's HTTP Geo module to do the lookup directly in nginx itself. I finally got a chance to sit down and work out the logistics of doing this. I've done this on an Ubuntu 12.04 box, with the expected config file layouts that come with Ubuntu.

Run the following on your server (as someone with write access to nginx config files):

# Generate the text file for nginx to import
perl <(curl -s https://raw.github.com/nginx/nginx/master/contrib/geo2nginx.pl) \
< <(zip=$(tempfile) && \
curl -so $zip http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip \
&& unzip -p $zip) > /etc/nginx/nginx_ip_country.txt

# Tell nginx to work out the IP country and store in variable
echo 'geo $IP_COUNTRY {
  default --;
  include /etc/nginx/nginx_ip_country.txt;
}' > /etc/nginx/conf.d/ip_country.conf

Now go find the server block for the vhost you want the header passed to and, assuming it's served by passenger, add the following:

# server {
  # server_name freddy.com;
  passenger_set_cgi_param HTTP_X_IP_COUNTRY $IP_COUNTRY;
# }

(If you don't use passenger, look at the docs for proxy_set_header or fastcgi_param to see which you'll require for your setup.)

Reload nginx, and behold, request.env["HTTP_X_IP_COUNTRY"] (assuming a rack app running under ruby) will be a two letter country code, or "--".
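
On the ruby side, reading it could look something like this bare-bones rack app. It's only a sketch: CountryApp is a made-up name, and falling back to "--" when the header is absent is my assumption so the app behaves the same when run without nginx in front of it.

```ruby
# A rack app is anything that responds to call(env) and returns
# [status, headers, body].
CountryApp = lambda do |env|
  country = env.fetch("HTTP_X_IP_COUNTRY", "--")
  [200, { "content-type" => "text/plain" }, ["country: #{country}\n"]]
end

# Calling it directly with a hand-built env, no server required:
status, _headers, body = CountryApp.call("HTTP_X_IP_COUNTRY" => "GB")
puts status
puts body.join
```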

Unfortunately this is currently IPv4 only. There's a thread on the nginx mailing list from November 2012 saying IPv6 support should be coming on the v1.3 branch of nginx, but with no known ETA. So for IPv6 support, take a look at EmberAds' trifle gem instead.

Install capybara-webkit gem on Ubuntu

Dear future Caius searching for this issue,

The apt package you need to install to use the capybara-webkit rubygem on ubuntu (tested on 10.04 and 11.10) is libqt4-dev. That is, to gem install capybara-webkit, you need to run aptitude install libqt4-dev.

Yours helpfully,
Past Caius

Use Readline With Default Ruby on OS X

OS X Lion comes with ruby 1.8.7-p249 installed, although it's compiled against libedit rather than libreadline. Whilst libedit is a mostly-compatible replacement for libreadline, I find there are a couple of settings I'm used to that don't work in libedit. (Like history-beginning-search-backward.)

Luckily you can grab the source of ruby and compile just the readline extension, and move it into the right place for it to just work. Here's what's been working for me:

# Install readline using homebrew
brew install readline

# Download the ruby source and check out 1.8.7-p249
mkdir ~/tmp && cd ~/tmp
git clone git://github.com/ruby/ruby
cd ruby
git checkout v1_8_7_249
cd ext/readline
ruby extconf.rb --with-readline-dir=$(brew --prefix readline) --disable-libedit
make

Now you should have readline.bundle in the current directory, and it should be compiled against your homebrew-installed readline library, rather than libedit that comes with the system. We can quickly double-check that by using otool to check what the binary is linked against.

$ otool -L readline.bundle
readline.bundle:
    /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/libruby.1.dylib (compatibility version 1.8.0, current version 1.8.7)
    /usr/local/Cellar/readline/6.2.2/lib/libreadline.6.2.dylib (compatibility version 6.0.0, current version 6.2.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)

In the output you should see a line listing "libreadline", and no lines listing "libedit". The output above shows exactly that, so we've compiled it properly. Now the bundle is built, we need to move it into the right place so it's loaded when ruby is invoked.

RL_PATH="/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/universal-darwin11.0"
# Back up the original bundle, just in cases
sudo mv "$RL_PATH/readline.bundle" "$RL_PATH/readline.bundle.libedit"
sudo mv readline.bundle "$RL_PATH/readline.bundle"

And that's it. You've got a proper compiled-against-readline installed ruby 1.8.7-p249 on 10.7 now.
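
You can also ask ruby itself which library it ended up with. When the readline extension is built against libedit, Readline::VERSION reports "EditLine wrapper" rather than a readline version number. A sketch of the check, where libedit_wrapper? is a name made up for the example (on newer rubies the constant may come from a pure-ruby replacement or be missing entirely, which the rescue allows for):

```ruby
# True when the version string is the one the libedit-backed build reports.
def libedit_wrapper?(version_string)
  version_string.to_s.include?("EditLine")
end

begin
  require "readline"
  version = Readline::VERSION
  puts "Readline::VERSION is #{version.inspect}"
  puts libedit_wrapper?(version) ? "built against libedit" : "not the libedit wrapper"
rescue LoadError, NameError
  puts "no readline library available to this ruby"
end
```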

One gotcha I ran into was needing to pass the same arguments to rvm when installing any other version of 1.8.7 on the same machine. Simple enough, just need to remember to do it though.

CC=gcc-4.2 rvm install 1.8.7-p357 -C --with-readline-dir=$(brew --prefix readline) --disable-libedit

Defining Ruby Superclasses On The Fly

Any rubyist that's defined a class should understand the following class definition:

class Foo < Object
end

It creates a new Constant (Foo) that is a subclass of Object. Pretty straightforward. But what you might not know is that ruby executes each line it reads in as it reads them. So we could do the following to show that:

class Foo < Object
  puts "we just defined object!"
end

And get the following output when we run that file:

# >> we just defined object!

So.. we know ruby is executing things on the fly whilst defining classes for us. How can we use this for profit and shenanigans?! (Err, use this for vanquishing evil and creating readable code I mean. Honest.)

How about if we've got a couple of opinionated people who like to think they've got the biggest ego in the interpreter? The last one to be loaded likes any new people ushered into the interpreter to be a subclass of themselves. Let's abuse a global for storing it in, and use a method to choose that on the fly when creating a new class.

def current_awkward_bugger
  $awkward_bugger
end

class Simon
end
$awkward_bugger = Simon

class Fred < current_awkward_bugger()
end
Fred.superclass # => Simon

class Harold
end
$awkward_bugger = Harold

class John < current_awkward_bugger()
end
John.superclass # => Harold

Fred.superclass # => Simon

Okay, so that looks a bit different to normally defined classes. We create Simon, assign him to the awkward bugger global then create Fred, who subclasses the return value of the current_awkward_bugger method which happens to be Simon currently. Then Harold muscles his way into the interpreter and decides he's going to be the current awkward bugger, so poor John gets to subclass Harold even though he's defined in the same way as Fred. (And as you can see on the last line, Fred's superclass is unchanged even though we changed the awkward_bugger global.)
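
The superclass slot takes any expression that evaluates to a class, not just a method call. A fairly common real-world example of this is subclassing the anonymous class that Struct.new returns (Point is just an example name):

```ruby
# Struct.new(:x, :y) is evaluated first and returns a new anonymous class,
# which Point then subclasses.
class Point < Struct.new(:x, :y)
  def to_s
    "(#{x}, #{y})"
  end
end

puts Point.new(1, 2)             # prints "(1, 2)"
p Point.superclass.superclass    # => Struct
```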

On a somewhat related note there's a lovely method called const_missing that gets invoked when you call a Constant in ruby that isn't defined. (Much like method_missing if you're familiar with that.) Means you can do even more shenanigans with non-existent superclasses for classes you're defining.

class Simon
end
class Harold
end

class Object
  def self.const_missing(konstant)
    [Simon, Harold].sample
  end
end

class Fred < ArrogantBastard
end
Fred.superclass # => Simon

class Albert < ArrogantBastard
end
Albert.superclass # => Harold

So here we're choosing from Simon and Harold on the fly each time a missing constant is referenced (in this case the aptly named ArrogantBastard constant.) If you run this code yourself you'll see the superclasses change on each run according to what your computer picks that time.
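
If patching Object feels a bit much even for shenanigans, const_missing can be defined on a single module instead, and it then only fires for lookups made from inside that module. A deterministic sketch, where Sandbox and Placeholder are names invented for the example:

```ruby
module Sandbox
  # Called when a constant lookup from inside Sandbox finds nothing.
  # Defines the missing constant as a new empty class and returns it.
  def self.const_missing(name)
    const_set(name, Class.new)
  end

  # Placeholder doesn't exist yet, so const_missing creates it here.
  class Fred < Placeholder
  end
end

p Sandbox::Fred.superclass # => Sandbox::Placeholder
```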

Experimental Procrastination

A while ago I read a blog post that Elizabeth N wrote, on the value of writing self-serving code. Ever since I've been moderately aware of when I've written self-serving code, usually either at hackdays, or just little projects where I'm either experimenting with something or just bashing out a new idea.

In fact, I even wrote about one of my recent "self-serving" projects I bashed out in an evening, TweetSavr. It has no tests, was written moderately quickly and not refactored immensely (here's hoping the git history backs me up on that! I certainly didn't knowingly majorly refactor it at any point at least.) But it "scratched an itch" and solved the problem I had, and it works for the limited use case I need it to, so it's a success as far as I'm concerned.

More recently a friend remarked to me in a private conversation that everyone needs to procrastinate occasionally, to save them going "stir crazy". Whilst I agree with what she said (everyone needs to switch gears and do something that perhaps they shouldn't, or that won't directly contribute to completing their current task), I couldn't help but draw a link between procrastinating and writing self-serving code.

Now I'm a programmer, it's what I did for a hobby through school, it's what I leapt into a career doing when I was offered the chance and even when I've had a particularly exhausting week, it's something I'll eventually turn back to. But I realised that often when I procrastinate, I do so by writing self-serving code. My creative output or process if you will is to create things digitally on the computer, be that a web app, hacky script to let me do something I'm not supposed to be able to with someone else's application, just dicking around or exploring whatever tidbit of interesting info/behaviour in a language or library someone's just shared on IRC.

Aside: I've often joked (semi-seriously) that if/when I have enough cash in the bank to not have to actually have a "day job", I'll just spend all day building the random ideas that get tossed out on IRC instead. Quite often I code up the smaller things of an evening anyway, and they end up in my gists.

Now it's unhealthy and counter-productive to want to program 24/7, at least in the long term. (Doing the odd 24 hour hackday event here and there can mean winning fun prizes to play with however.) And sometimes all that you need to do to solve a problem you've been butting your head against for the last couple of hours is to get off the damn computer. (I usually find taking a shower makes my subconscious reveal the answer it's been quietly computing and hey presto, I know how to solve the problem properly!) At other times it just requires changing gears and flexing a different part of your brain muscle. Like say, writing self-serving code. And procrastinating by doing so.

I'm not entirely sure what the point of this thought process is, or if there can really be a point to it, but it really intrigued me drawing a link between procrastinating and writing self-serving code. I can imagine other creatively minded people might do much the same thing, an artist just sketching for the sake of sketching, or a writer taking a couple of hours off from her next novel to write a short story for her blog. (That last one is actually something a friend's done, go read her short stories, they're quite good. Start from the bottom though.)

And of course sometimes you just need to vegetate and read facebook (or twitter), but constructive procrastination does serve a real purpose I think, and can be quite useful as well.

Install GCC-4.2.1 (Apple build 5666.3) with Xcode 4.2

As of Xcode 4.2 Apple have stopped bundling GCC with it, shipping only the (mostly) compatible llvm-gcc binary instead. The suggested fix is to install GCC using the osx-gcc-installer project. However, I wanted to build and install it from source, which Apple provides at http://opensource.apple.com/.

You should already have installed Xcode 4.2 from the App Store. The steps below grab the tarball from the 4.1 developer tools source, unpack and compile it, then install it into the right places.

Update 2016-07-03: I'd suggest just using homebrew to install this these days:

brew install homebrew/dupes/apple-gcc42

Instructions

# Grab and unpack the tarball
mkdir ~/tmp && cd ~/tmp
curl -O http://opensource.apple.com/tarballs/gcc/gcc-5666.3.tar.gz
tar zxf gcc-5666.3.tar.gz
cd gcc-5666.3

# Setup some stuff it requires
mkdir -p build/obj build/dst build/sym
# And then build it. You should go make a cup of tea or five whilst this runs.
gnumake install RC_OS=macos RC_ARCHS='i386 x86_64' TARGETS='i386 x86_64' \
  SRCROOT=`pwd` OBJROOT=`pwd`/build/obj DSTROOT=`pwd`/build/dst \
  SYMROOT=`pwd`/build/sym

# And finally install it
sudo ditto build/dst /

And now you should have gcc-4.2 in your $PATH, available to build all the things that llvm-gcc fails to compile.