Posts from November 2012
Introducing the Trie
So in the kickoff post of my series on data structures and algorithms I’d like to begin with a relatively simple but handy little data structure: the trie. If you want to jump ahead and look at a very simplistic implementation of a trie data structure (only the insert and dump operations have been completed), I’ve put my experimental code up on GitHub here.
A clever little play on the word retrieval (though I, and many others, insist on pronouncing it “try”… suck it etymology), a trie is a keyvalue store represented as an nary tree, except that unlike a typical keyvalue store, no one node stores the key itself. Instead, the path to the value within the tree is defined by the characters/bits/whathaveyou that define the key itself. Yeah, that’s pretty abstract, why don’t we just look at an example:
In this construction I’ve chosen the following set of keys:
 allow
 alter
 alloy
 ant
 bad
 bar
As you can see, each character in the key is used to label an edge in the tree, while the nodes store the values associated with that key (note, in this example I’ve chosen to use the keys as values as well… this entirely artificial, and a bit confusing. Just remember, those values could be absolutely anything.) ^{1} Typically these keys are strings, as depicted here, although it’s entirely possible to build a bitwise trie that can be keyed off of arbitrary strings of bits. To find the value for a key, you take each character and, starting with the root node, transition through the graph until the target node is found. Or, as pseudocode:
find(root_node, key): current_node = root_node current_key = key while (current_key.length > 0): character = current_key[0] if current_node.has_edge_for(character): current_node = current_node.get_get_for(character).endpoint else throw "ERMAGERD" return current_node.value
Strangely, a very similar algorithm can be used for both inserts and deletes.
Some Interesting Properties
The trie offers a number of interesting advantages over traditional keyvalue stores such as hash tables and binary search trees:
 As mentioned previously, they have the peculiar feature that inserts, deletes, and lookups use very similar codepaths, and thus have very similar performance characteristics. As such, in applications where these operations are performed with equal frequency, the trie can provide better overall performance than other more traditional keyvalue stores.
 Lookup performance is a factor of key length as opposed to key distribution or dataset size. As such, for lookups they often outperform both hash tables and BSTs.
 They are quite space efficient for short keys, as key prefixes are shared between edges, resulting in compression of the graph.
 They enable longestprefix matching. Given a candidate key, a trie can be used to perform a closest fit search with the same performance as an exact search.
 Preorder traversal of the graph generates an ordered list of the keys (in fact, this implementation is a form of radix sort).
 Unlike hashes, there’s no need to design a hash function, and collisions can only occur if identical keys are inserted multiple times.
Applications
Because tries are wellsuited to fuzzy matching algorithms, they often see use in spell checking implementations or other areas involving fuzzy matching against a dictionary. In addition, the trie forms the core of Radix/PATRICIA and Suffix Trees, both of which are interesting enough to warrant separate posts of their own. Stay tuned!
References

Interestingly, if you looked at this example graph, you’d be forgiven for assuming it was an illustration of a finite state machine, with the characters in the key triggering transitions to deeper levels of the graph. ↩