Ruby performance: Hash's #has_key? vs. #[] (square brackets)

February 15, 2018

During my time last year developing performance improvements for Rambling Trie, I stumbled into something quite interesting that happens with Ruby’s Hash class.

A commonly used method from the Hash interface is #has_key? which tells you if the Hash in question contains a particular key. Rambling Trie’s underlying data structure is an n-ary tree backed by a Hash where each key is the letter corresponding to a child and each value is the Node that corresponds to that letter. As you might imagine, #has_key? is a common operation called throughout the gem implementation.

To my surprise, while running some Ruby benchmarks I noticed that accessing the key with #[] and verifying if it was nil instead of calling #has_key? lowered the time it took to complete certain operations. This was suspicious to say the least.

I loaded up benchmark-ips and wrote this quick script:

require 'benchmark/ips'

Benchmark.ips do |bm|
  hash = { 'thing' => 'gniht' }

  bm.report 'has_key?' do
    hash.has_key? 'thing'
  end

  bm.report '[]' do
    !!hash['thing']
  end

  bm.compare!
end

Which gave me this:

...
Comparison:
                  []:  6461337.5 i/s
            has_key?:  5140776.3 i/s - same-ish: difference falls within error

I know, I know. It says same-ish: difference falls within error, but notice how #[] had approximately 1.3 million more iterations per second. The relatively small performance hit becomes noticeable when you’re building a trie from a large dictionary.

I’ll take it. From now on, I’ll use #[] instead of #has_key? for relatively low level operations where every millisecond counts.


I ran this benchmark earlier today with Ruby 2.5.0 and the gap between these two operations has gotten bigger!! It is now literally about 1.5 times slower to use #has_key?:

...
Comparison:
                  []:  6056391.2 i/s
            has_key?:  3956594.2 i/s - 1.53x  slower

Notes

  • I haven’t dug into it yet, but I’m sure there’s something going on here and here that explains it.