Caius Theory

Now with even more cowbell…

Why I love DATA

In a ruby script, there's a keyword __END__ that for a long time I thought just marked anything after it as a comment. So I used to use it to store snippets and notes about the script that weren't really needed inline. Then one day I stumbled across the DATA constant, and wondered what flaming magic it was.

DATA is in fact an IO object, that you can read from (or anything else you'd do with an IO object). It contains all the content after __END__ in that ruby script file*. (It only exists when the file contains __END__, and for the first file ruby invokes though. See footnote for more details.)

How can we use this, and why indeed do I love this fickle constant? I mostly use it for quick scripts where I need to process text data, rather than piping to STDIN.

Given a list of URLs that I want to open in my web browser and look at, I could do the following for instance:

DATA.each_line.map(&:chomp).each do |url|
  `open "#{url}"`
end

__END__
http://google.com/
http://yahoo.com/

which upon running (on a mac) would open all the URLs listed in DATA in my default web browser. (For bonus points, use Launchy for cross-platform compatibility.) Really handy & quick/simple when you've got 500+ URLs to open at once to go through. (I once had a job that required me to do this daily. Fun.)

Or given a bunch of CSV data that you just want one column for, you could reach for cut or awk in the terminal, but ruby has a really good CSV library which I trust and know how to use already. Why not just use that & DATA to pull out the field you want?

require "csv"

CSV.parse(DATA, headers: true).each do |row|
  puts row["kName"]
end

__END__
kId,kName,kURL
1,Google UK,http://google.co.uk
2,"Yahoo, UK",http://yahoo.co.uk
# >> Google UK
# >> Yahoo, UK

I find when the data I want to munge is already in my clipboard, and I can run ruby scripts directly from text editors without having to save a file, it saves having to write the data out to a file, have ruby read it back in, etc just to do something with the data. I can just write the script reading from DATA, paste the data in and run it. Which also lets me run it iteratively and build up a slightly more complex script that I don't want to keep. Then do what I need with the output and close the file without saving it.

* technically DATA is an IO handler to read __FILE__, which has been wound forward to the start of the first line after __END__ in the file. And it only exists for the first ruby file to be invoked by the interpreter.

cat > tmp/data.rb <<RUBY
p DATA.read
__END__
data.rb
RUBY

ruby tmp/data.rb
# => "data.rb\n"

cat > tmp/data-require.rb <<RUBY
require "./tmp/data"
RUBY

ruby tmp/data-require.rb
# => /Users/caius/tmp/data.rb:1:in `<top (required)>': uninitialized constant DATA (NameError)

And because it's a file handle pointing at the current file, you can rewind it and read the entire ruby script into itself…

$ ruby tmp/readself.rb 
DATA.rewind
print DATA.read

__END__
something goes here

Some Small Refactorings in Ruby

Here's a few things I refactor as I write code down initially. Not entirely convinced it's strictly refactoring, but it's how I amend from one pattern I see in a line or three of code into a different structure that I feel achieves the same result with cleaner or more concise code.

Multiple equality comparisons

Testing the equality of an object against another is fairly simple, just do foo == "bar". However, I usually try to test against multiple objects in a slightly different way. Your first thought might be that the easiest way is just to chain a series of == with the OR (||) operator.

foo == "bar" || foo == "baz" || foo == :sed || foo == 5

I much prefer to flip it around, think of the objects I'm testing against as a collection (Array), and then ask them if they contain the object I'm checking. And for that, I use Array#include?

["bar", "baz", :sed, 5].include?(foo)

(And if you're only testing against strings, you could use %w(bar baz) as a shortcut to create the array. Here's more ruby shortcuts.)

Assigning multiple items from a nested hash to variables

Occasionally I find myself needing to be given a hash of a hash of data (most recently, an omniauth auth hash) and assign some values from it to separate variables within my code. Given the following hash, containing a nested hash:

details = {
  uid: "12345",
  info: {
    name: "Caius Durling",
    nickname: "caius",
  },
}

Lets say we want to extract the name and nickname fields from details[:info] hash into their own local variables (or instance variables within a class, more likely.) We should probably handle the case of details[:info] not being a hash, and try not to read from it if that's the case - so we might end up with something like the following:

name = details[:info] && details[:info][:name]
nickname = details[:info] && details[:info][:nickname]

name # => "Caius Durling"
nickname # => "caius"

And then in the spirit of DRYing up our code, we see there's duplication in both lines in checking details[:info] exists (not actually that it's a hash, but hey ho, we rely on upstream to send us nil or a hash.) So we reduce it down using an if statement and give ourselves slightly less to type at the same time.

if (( info = details[:info] ))
  name = info[:name]
  nickname = info[:nickname]
end

name # => "Caius Durling"
nickname # => "caius"

Returning two values conditionally

Sometimes a method will end with a ternary, where depending on a condition it'll either return one or another value. If this conditional returns true, then the first value is returned. Otherwise it returns the second value. You could quite easily write it out as an if/else longer-form block too.

def my_method
  @blah == foo ? :foo_matches : :no_match
end

My brain finds picking the logic in this apart slightly harder mentally, than if I drop a return early bomb on the method. Then it reads more akin to how I'd think through the logic. Return the first value if this conditional returns true. Otherwise the method returns this second value. I think the second value being on a completely separate line helps me make this mental distinction quicker too.

So I'd write it this way:

def my_method
  return :foo_matches if @blah == foo
  :no_match
end

Returning nil or a value conditionally

Following on from the last snippet, but taking advantage of the ruby runtime a bit more, is when you're wanting to return a value if a conditional is true, or otherwise false. The easy way is to just write nil in the ternary:

def my_method
  @foo == :bar ? :foo_matches : nil
end

However, we know ruby returns the result of the last expression in the method. And that if a single line conditional isn't met, it returns nil from the expression. Combining that, we can rewrite the previous example into this:

def my_method
  :foo_matches if @foo == :bar
end

And it will still return nil in the case that @foo doesn't match :bar.

Returning a boolean

Sometimes you have a method that returns the result of a conditional, but it's written to return true/false in a conditional instead.

def my_method
  @foo == :bar ? true : false
end

The really easy refactor here is to just remove the ternary and leave the conditional.

def my_method
  @foo == :bar
end

And of course if you were returning false when the conditional evaluates to true, you can either negate the comparison (use != in that example), or negate the entire conditional result by prepending ! to the line.

evil.rb

Here be hax. Don't ever do these. ;-)

Reduce local variables with instance_eval

Sometimes (usually in a one-liner) I want to do some work with a value without assigning it to a variable. Chucking an #instance_eval call in there will set self to the value, which saves having to assign it to a local value. Pretty much only used by me in one-off scripts or cli commands.

Good

start_date, end_date = ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
puts "#{start_date} to #{end_date} is #{(end_date - start_date).to_i} days"

Bad

puts ["24 Dec 2011", "23 Jan 2013"].map {|d| Date.parse(d) }
  .instance_eval { "#{first} to #{last} is #{(last - first).to_i} days" }

See, way less code! cough, cough

Bonus usage: Misdirection

I also dropped some instance_eval on our campfire bot at EmberAds to always blame one person, but without the code reading as such.

%w{Dom Mel Caius CBetta Baz}.sample.instance_eval do
  "(4V5A8F5T=&$`".unpack("u")[0]
end

That does not return one of the array elements as you might think it does from quickly scanning the code…

Set method-local variables in default arguments

You have a method and it takes one argument, which has a default value of nil specified. You then run into the situation where you need to know if nil was passed to the method, or if you're getting the default value of nil. You could change the default value to something you choose to be the "default value" and unlikely to be passed from elsewhere as the argument's value, and reset the parameter to nil after checking it, like this:

def output name=:default_value
  if name == :default_value
    name = "caius"
    default = true
  end

  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

That's quite a lot of code added to the method just to find out if we passed a default value or not. And if we forget to reset the value when it's :default_value then we end up leaking that into whatever the method does with that value. We also have the problem that one day the program could possibly send that "default value" we've chosen as the actual parameter, and we'd blindly change it thinking it was set as the default value, not the passed argument.

Instead we could (ab)use the power of ruby, and have ruby decide to set default = true for us when, and only when, the variable is set to the default value.

def output name=((default=true); "caius")
  "name: #{name.inspect} -- default: #{default.inspect}"
end

output() # => "name: \"caius\" -- default: true"
output("fred") # => "name: \"fred\" -- default: nil"

As you can see, the output is identical. Yet we have no extra code inside the method to figure out if we were given the default value or not. And as a bonus to that, we no longer have to check for a specific value being passed and presume that is actually the default, and not one passed by the program elsewhere.

I posted this one in a gist a while back (to show Avdi it looks like), and people came up with some more insane things to do with it, including returning early, raising errors or even redefining the current method, all from the argument list! I'd suggest going to read them, it's a mixture of OMG HAHA and OMFG NO WAY WHYY?!?!.

Don't do this.

Don't do the above. No really, don't do them. Unless you're writing a one-off thing. But seriously, don't do them. :-D