Why I love DATA

2013-01-08 19:09:42

In a ruby script, there's a keyword __END__ that for a long time I thought just marked anything after it as a comment. So I used to use it to store snippets and notes about the script that weren't really needed inline. Then one day I stumbled across the DATA constant, and wondered what flaming magic it was.

DATA is in fact an IO object that you can read from (or do anything else you'd do with an IO object). It contains all the content after __END__ in that ruby script file*. (It only exists when the file contains __END__, though, and only for the first file ruby invokes. See the footnote for more details.)
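To see what kind of object DATA actually is, here's a minimal sketch that writes a throwaway script to a temp file and runs it with the current interpreter. The helper name `run_ruby_script`, the constant `DATA_DEMO_SCRIPT` and the sample lines are all made up for illustration. The wrapper only exists so the example is self-contained; normally you'd just put __END__ at the bottom of your own script.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: writes +source+ to a temp file, runs it as the main
# script with the current interpreter, and returns whatever it printed.
def run_ruby_script(source)
  Tempfile.create(["data_demo", ".rb"]) do |file|
    file.write(source)
    file.flush
    output, _status = Open3.capture2(RbConfig.ruby, file.path)
    output
  end
end

# The child script: everything after __END__ is data, not code.
DATA_DEMO_SCRIPT = <<~'RUBY'
  puts DATA.class
  puts DATA.read
  __END__
  first line
  second line
RUBY

puts run_ruby_script(DATA_DEMO_SCRIPT)
```

The child script should report that DATA is a File, followed by the two sample lines.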

How can we use this, and why indeed do I love this fickle constant? I mostly use it for quick scripts where I need to process text data, rather than piping to STDIN.

Given a list of URLs that I want to open in my web browser and look at, I could do the following for instance:

DATA.each_line.map(&:chomp).each do |url|
  `open "#{url}"`
end

__END__
http://example.com/
http://example.org/

which upon running (on a mac) would open all the URLs listed in DATA in my default web browser. (For bonus points, use Launchy for cross-platform compatibility.) Really handy & quick/simple when you've got 500+ URLs to open at once to go through. (I once had a job that required me to do this daily. Fun.)
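If the list might contain blank lines or URLs with characters the shell cares about (& or ? for instance), one option is to skip the shell entirely. This is only a sketch: `open_commands` is a made-up helper, the URLs are placeholders, and StringIO stands in for DATA so it runs anywhere.

```ruby
require "stringio"

# Hypothetical helper: turns an IO of URLs (one per line) into argument
# arrays for the macOS `open` command, skipping blank lines.
def open_commands(io)
  io.each_line
    .map(&:strip)
    .reject(&:empty?)
    .map { |url| ["open", url] }
end

# StringIO stands in for DATA so the sketch runs without an __END__ section;
# the URLs are placeholders.
urls = StringIO.new("http://example.com/\n\nhttp://example.org/?q=a&b=c\n")

open_commands(urls).each do |argv|
  puts argv.join(" ")
  # system(*argv)  # uncomment on a Mac to actually open the URL
end
```

Passing the program and the URL as separate arguments to system means no shell ever parses the URL, so there's no quoting to get wrong.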

Or given a bunch of CSV data that you just want one column for, you could reach for cut or awk in the terminal, but ruby has a really good CSV library which I trust and know how to use already. Why not just use that & DATA to pull out the field you want?

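require "csv"

# Only the kName header is implied by this code; the other header names in
# the data below are placeholders.
CSV.parse(DATA, headers: true).each do |row|
  puts row["kName"]
end

__END__
id,kName,kURL
1,Google UK,http://google.co.uk
2,"Yahoo, UK",http://yahoo.co.uk

# >> Google UK
# >> Yahoo, UK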

This is most useful when the data I want to munge is already in my clipboard. Since I can run ruby scripts directly from my text editor without saving a file, I don't have to write the data out to a file and have ruby read it back in just to do something with it. I can just write the script reading from DATA, paste the data in and run it. That also lets me run it iteratively and build up a slightly more complex script that I don't want to keep. Then I do what I need with the output and close the file without saving it.
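For comparison, here's a sketch of the same paste-and-run workflow using a heredoc instead of DATA, which is handy when the code lives in a file that gets required (where DATA isn't available). The helper `column_values`, the constant `INLINE_CSV` and the id and kURL header names are assumptions for illustration.

```ruby
require "csv"

# Pasted data lives in a heredoc rather than after __END__.
# The header names are assumptions for this sketch.
INLINE_CSV = <<~'ROWS'
  id,kName,kURL
  1,Google UK,http://google.co.uk
  2,"Yahoo, UK",http://yahoo.co.uk
ROWS

# Hypothetical helper: returns the values of one named column from CSV text
# that has a header row.
def column_values(csv_text, column)
  CSV.parse(csv_text, headers: true).map { |row| row[column] }
end

puts column_values(INLINE_CSV, "kName")
```

The trade-off is that the data now sits above the code that uses it, and the quoted heredoc tag is needed to stop ruby interpolating anything in the pasted text.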

* Technically DATA is an IO handle on __FILE__ that has been wound forward to the start of the first line after __END__ in the file. It only exists for the first ruby file invoked by the interpreter.

$ cat > tmp/data.rb <<RB
p DATA.read
__END__
placeholder data
RB

$ ruby tmp/data.rb
"placeholder data\n"

$ cat > tmp/data-require.rb <<RB
require "./tmp/data"
RB

$ ruby tmp/data-require.rb
/Users/caius/tmp/data.rb:1:in `<top (required)>': uninitialized constant DATA (NameError)
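The same behaviour can be checked without a shell session. This sketch builds two throwaway files in a temp directory and runs each as the main program; `data_visibility` and the file names are made up for illustration, and it uses defined?(DATA) rather than letting the NameError happen.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Hypothetical helper: runs a script that has an __END__ section directly,
# then via a wrapper that requires it, and reports what each run printed.
def data_visibility
  Dir.mktmpdir do |dir|
    script  = File.join(dir, "script.rb")
    wrapper = File.join(dir, "wrapper.rb")

    File.write(script, <<~'RUBY')
      if defined?(DATA)
        puts "DATA defined"
      else
        puts "DATA not defined"
      end
      __END__
      placeholder data
    RUBY
    # The wrapper has no __END__ of its own, so nothing defines DATA.
    File.write(wrapper, %(require_relative "script"\n))

    direct, _status   = Open3.capture2(RbConfig.ruby, script)
    required, _status = Open3.capture2(RbConfig.ruby, wrapper)
    { direct: direct.chomp, required: required.chomp }
  end
end

p data_visibility
```

Guarding with defined?(DATA) like this is one way to let a file work both as a standalone script and as a library.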

Because it's a handle to __FILE__, though, you can rewind it and read the entire ruby script into itself…

$ ruby tmp/readself.rb
DATA.rewind
print DATA.read

__END__
something goes here
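A small sketch to confirm what rewinding does: the child script reads its data section, rewinds, reads again, and compares the second read with the file on disk. `run_as_main` and `SELF_READING_SCRIPT` are hypothetical names used only for this example.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# The script under test: reads its data section, rewinds, then reads itself.
SELF_READING_SCRIPT = <<~'RUBY'
  tail = DATA.read    # only the text after __END__
  DATA.rewind         # back to the first byte of the script file
  whole = DATA.read   # the whole file, code included
  puts whole == File.read(__FILE__)
  puts whole.end_with?(tail)
  __END__
  something goes here
RUBY

# Hypothetical helper: runs +source+ as a main script and returns its output.
def run_as_main(source)
  Tempfile.create(["readself", ".rb"]) do |file|
    file.write(source)
    file.flush
    output, _status = Open3.capture2(RbConfig.ruby, file.path)
    output
  end
end

puts run_as_main(SELF_READING_SCRIPT)
```

Both checks should come back true: after the rewind DATA yields the full source, and the data section is just the tail end of it.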

9 Comments on Why I love DATA

  1. Say "Thank You" to Perl for that gem.

  2. ah, another perl feature 'rediscovered'. ;-)

  3. That's a really cool feature, which I think Ruby inherited from Perl http://perldoc.perl.org/perldata.html

    I find it really useful to work on output from a command e.g. put a ps listing in the end or data section and practice regexing/parsing it till you get the data you want.

  4. You can get roughly the same behavior using StringIO and heredocs. I don't think StringIO existed when DATA was introduced.

  5. i didnt know that, thats kind of awesome. Thx for this post

  6. awesome stuff, thanks for sharing!

    I could see using this for quick scripts similar to playing around in IRB or rails console

  7. You can actually use DATA as a way to lock the script so only one instance can be running at a time.

    DATA.flock(File::LOCK_EX | File::LOCK_NB) || abort("already running.")

    trap("INT", "EXIT")

    puts "Running..."
    loop do
      sleep
    end

    __END__
    content for locking

  8. It's very cool!!


  9. Awesome. I thought it was only unparsed after __END__ but it can definitely be useful.
