Know Your Tools - Don't Shoot Yourself in the Foot

Wed, Jun 6, 2012

Imagine this - Your build is taking forever. You put in a lot of effort and restructure it. Things improve a lot, but it is far from where you would like it to be. You try hard to identify things that could improve the build time, but fail. You blame the platforms you use, you blame Ruby and you even blame the relative position of the Moon to Venus. Slowly you learn to accept the slow build as a part of your life. Months later a new developer joins the team and proves that there is a bug in the build scripts that causes certain tasks to be run twice.

This happened to us recently, and the subsequent debugging/postmortem revealed that there were things about our tools that we simply did not know.

A bit of context

Ours is a Java code base, but we use Rake for our build scripts. We have been using Ruby 1.8.7 and Rake 0.8.7. We never got around to upgrading the Ruby versions due to a number of reasons including one of our gems breaking on 1.9.3 and more importantly our laziness. Well, laziness will cause you damage in the end and that is what happened to us.

What is the issue ?

“That step is not running multiple times, is it?”

While debugging our build, our new developer discovered that certain tasks that take up 10 minutes to complete were being executed twice. We soon started trying to figure out where in the chain of Rake task calls did we end up repeating a task. There were none. After some time, we would learn that we did not know enough about Rake and Ruby 1.8.7

Rake’s redefining behavior

Rake treats redefining a Task as akin to appending actions to the same Task. To see this in action, add the following to a Rakefile:

task :foo do
  puts "foo " * 10
end

task :foo do
  puts "bar " * 10
end

If you run the foo task, you would get:

$rake foo
foo foo foo foo foo foo foo foo foo foo
bar bar bar bar bar bar bar bar bar bar

This is far removed from behaviors of Ruby and Make, the tools that inspired Rake. For clarity’s sake, this is how Ruby treats redefinitions:

#!/usr/bin/env ruby
#example.rb

def example
  puts "example " * 10
end

def example
  puts "foobar " * 10
end

example

When run:

$ruby example.rb
foobar foobar foobar foobar foobar foobar foobar foobar foobar foobar

Lets take a look at Make’s behavior. This is how Make treats redefinitions:

# Example Makefile

foo:
    @echo Foo target
foo:
    @echo Redefined foo target

When run:

$make foo
Makefile:6: warning: overriding commands for target `foo'
Makefile:4: warning: ignoring old commands for target `foo'
Redefined foo target

While this behavior of Rake may seem weird, there are some advantages to it like the ability to add custom behavior to third party Rake tasks. I asked Jim Weirich, the creator of Rake and he was of the opinion that it was a useful choice. We were not aware of this particular behavior of Rake.

Under normal situations, this behavior should not cause us trouble, as it is unlikely that we would redefine our own Rake tasks. But this combined with the next issue led to some of our build tasks running twice.

Ruby 1.8.7’s multiple require issue

It turns out that in Ruby 1.8.7, whenever a file is required multiple times, Ruby decides whether to reload it based on the path of the file provided. So if you have the same file being required twice, but using different paths, the file will be loaded twice. If there are Rake tasks defined in that file, the effect of actions in the Rake task will be multiplied.

Again, this can be best demonstrated with an example. Consider the following scenario. There is a paent directory with Rakefile, zoo.rb and a directory subdir with the file bar.rb inside it.

zoo.rb defines a Rake task print_zoo:

# zoo.rb

desc "prints zoo"
task :print_zoo do
  puts "zoo " * 10
end

bar.rb in the directory subdir does nothing but require zoo in the parent directory.

#subdir/bar.rb

require File.dirname(__FILE__) + "/../zoo"

The Rakefile requires both zoo.rb and bar.rb and defines a task to show the behavior.

require File.dirname(__FILE__) + "/zoo"
require File.dirname(__FILE__) + "/subdir/bar"

desc "show weirdness"
task :test => :print_zoo

Now if we run the test task:

$rake test
zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo
zoo zoo zoo zoo zoo zoo zoo zoo zoo zoo

Lets take a step back and think about that. If we load Ruby files multiple times using require we can easily end up repeating a build step that would take 30 minutes to complete.

I must note that this issue does not happen on Ruby 1.9.3, where files are being loaded only once. All the code I showed here is available here, if you want to take a closer look.

Learning

This is not meant to be a criticism of Rake or Ruby. Although Rake’s behavior in this scenario looks weird, there are good reasons why Jim chose that behavior. And we should have moved to Ruby 1.9.3 anyway. Certainly the learning is that we need to have a better understanding of the tools we use daily.