Rambling Trie 1.0.0 released!

January 23, 2017

It’s been a while since I last wrote anything about Rambling Trie (or anything at all really), but I wanted to take the time to talk about all the changes and performance improvements that I’ve made to the gem lately and that culminated in the release of version 1.0.0.

The changes worth highlighting are:

NEW: Ability to dump and load tries from disk

It takes a relatively long time to load all the words from a large dictionary into a new trie in memory. This becomes especially annoying when you have to do this full process every time you restart your application, even though you know that the trie is going to remain constant over time.

To sort this out, you can now use the Rambling::Trie.load and Rambling::Trie.dump methods! Like this:

require 'rambling-trie'

trie = Rambling::Trie.create '/usr/share/dict/web2'

# Dump into '/tmp/web2.marshal'
Rambling::Trie.dump trie, '/tmp/web2.marshal'

And then when you’re booting your application you can load the dumped trie with something like this:

require 'rambling-trie'

# Load from '/tmp/web2.marshal'
trie = Rambling::Trie.load '/tmp/web2.marshal'

What makes me really excited about this feature is that it saves time (by about 30%) and memory (by about 60%). Who knew?!

HINT: You can save up to 75% more time and up to 80% memory if you use a compressed trie (although individual operations may be a bit slower).

NEW: Ability to configure various options for the gem

As Rambling Trie’s feature set grows, the ability to easily extend its functionality without much hassle becomes more important. Now that you can dump and load tries from disk, wouldn’t it be nice to be able to support custom file formats?

Well, now you can! Together with the ability to configure file readers, a custom compressor and a custom root node builder. You can initialize the gem with a configuration block like this:

require 'rambling-trie'

Rambling::Trie.config do |config|
  config.compressor = MyCompressor.new
  config.root_builder = lambda { MyNode.new }

  config.readers.add :html, MyHtmlReader.new
  config.readers.default = config.readers[:html]

  config.serializers.add :json, MyJsonSerializer.new
  config.serializers.default = config.serializers[:yml]
end

NEW: Ability to get words within a given phrase

Useful to easily find out if a string has any valid words:

trie.words_within 'tktkawesome' # => [ "awe", "awesome", ... ]

NEW: Ability to compare tries and nodes

Useful to test if two tries or two nodes have the same dictionary.

if trie == other_trie
  # be funny
end

if node == other_node
  # be funky
end

NEW: Cleaner more extensible code and a unified API entry point

Lots of refactoring went into separating the code between raw and compressed tries, which are now represented by the Rambling::Trie::RawNode and Rambling::Trie::CompressedNode classes, and into building a unified API entry point that avoids duplication between both tries, which resulted in the Rambling::Trie::Container.

In addition, the presence of the Rambling::Trie::Comparable, Rambling::Trie::Compressable, Rambling::Trie::Enumerable, Rambling::Trie::Forwardable, Rambling::Trie::Inspectable, Rambling::Trie::Stringifyable modules encapsulates common behavior between nodes.

NEW: Less CPU and memory

By relentlessly measuring the performance of the gem under different settings, I’ve learned a lot more about benchmarking and profiling in general, especially call tree profiles and memory profiles (but I’ll talk more about that in another post).

Improvements that seemed marginal like assigning true to the #terminal attribute of Rambling::Trie:Node when the the node is terminal and leaving it nil when the node is non-terminal proved to have a significant impact in both memory footprint as well as computation time.

NEW: Added a CHANGELOG and a CONTRIBUTING guide!

Because tracking changes and making it easy for people to contribute is important.


I’ll write a bit more about the kind of surprises I found by measuring with benchmarks and profiles soon. Next on Rambling Trie’s roadmap: make it even faster by dropping down to The Metal™.


Notes

  • See Supported formats section in the README for available formats for dump and load
  • Thanks to @darkogj for identifying and suggesting both the dump and load and the words within features!