Ruby performance: Hash's #has_key? vs. #[] (square brackets)
February 15, 2018
During my time last year developing performance improvements for Rambling Trie, I stumbled into something quite interesting that happens with Ruby’s Hash
class.
A commonly used method from the Hash
interface is #has_key?
which tells you if the Hash
in question contains a particular key. Rambling Trie’s underlying data structure is an n-ary tree backed by a Hash
where each key is the letter corresponding to a child and each value is the Node
that corresponds to that letter. As you might imagine, #has_key?
is a common operation called throughout the gem implementation.
To my surprise, while running some Ruby benchmarks I noticed that accessing the key with #[]
and verifying if it was nil
instead of calling #has_key?
lowered the time it took to complete certain operations. This was suspicious to say the least.
I loaded up benchmark-ips and wrote this quick script:
require 'benchmark/ips'
Benchmark.ips do |bm|
hash = { 'thing' => 'gniht' }
bm.report 'has_key?' do
hash.has_key? 'thing'
end
bm.report '[]' do
!!hash['thing']
end
bm.compare!
end
Which gave me this:
...
Comparison:
[]: 6461337.5 i/s
has_key?: 5140776.3 i/s - same-ish: difference falls within error
I know, I know. It says same-ish: difference falls within error
, but notice how #[]
had approximately 1.3 million more iterations per second. The relatively small performance hit becomes noticeable when you’re building a trie from a large dictionary.
I’ll take it. From now on, I’ll use #[]
instead of #has_key?
for relatively low level operations where every millisecond counts.
I ran this benchmark earlier today with Ruby 2.5.0 and the gap between these two operations has gotten bigger!! It is now literally about 1.5 times slower to use #has_key?
:
...
Comparison:
[]: 6056391.2 i/s
has_key?: 3956594.2 i/s - 1.53x slower
Notes