The Trie data structure

Posted by : at

Categories : trie   data-structure   tree


The thumbnail was generated with this site. Go check it out!

The name

The Trie takes its name from its main operation, the reTRIEval, which is more a bad pun than an actual operation. It’s also known as “Prefix Tree” because of the way it is structured. A Trie is a tree built to store a large amount of strings as well as integer numbers. Actually, it can store any sequence of symbols, given that the alphabet of those symbols is finite and known. More specifically, a trie is d-ary tree, where d is the size of the alphabet of the symbols used.

Each node of a Trie can contain at most d sons, one for each symbol. Each node represents a certain symbol, except for the root of the tree which represents nothing, or the beginning of the string.

The structure

We need three fields for each node of the Trie:

  • a pointer to the father node
  • a fixed length array of sons (in this case, exactly 26)
  • a boolean to tell us if a given node represents the end of a word

Notice that we don’t need a variable to hold the symbol, because we can extract it from the index of the node in its father’s array.

typedef struct _trie {
    struct _trie *father;
    struct _trie* sons[26];
    bool end_of_word;
} TrieNode;

The behavior

In the following functions, i intentionally omit NULL checks on the input and the return of malloc to make the code shorter and more readable.

init

This functions creates a new empty node with no sons.

TrieNode* init() {
    TrieNode* node = (TrieNode*) malloc(sizeof(TrieNode));
    for (int i=0; i<26; i++) {
        node->sons[i] = NULL;
    }
    node->father = NULL;
    node->end_of_word = false;
    return node;
}

insert

This function adds the given string str to the Trie rooted at the given root. If str is already present, no changes are made to the Trie.

void insert(TrieNode *root, char* str) {
    TrieNode *tmp = root;
    const int len = strlen(str);
    for (int i=0; i<len; i++) {
        const int index = str[i] - 'a';
        if (tmp->sons[index] == NULL) {
            tmp->sons[index] = init();
            tmp->sons[index]->father = tmp;
        }
        tmp = tmp->sons[index];
    }
    tmp->end_of_word = true;
}

This function looks for the given string str in the Trie rooted at the given root. It returns true if str was present, false otherwise.

bool search(TrieNode *root, char* str) {
    TrieNode *tmp = root;
    const int len = strlen(str);
    int i = 0;
    while (tmp != NULL && i < len) {
        tmp = tmp->sons[str[i] - 'a'];
        i++;
    }
    return tmp != NULL && tmp->end_of_word;
}

delete

This function deletes the given string str from the Trie rooted at the given root. If str was not present, no changes are made to the Trie. The deletion of a string from the Trie is actually setting the boolean end_of_word to false. In this implementation, however, if the “deleted” node has no sons, it is deallocated and this procedure continues on all father nodes until one with more than 1 son (or the root) is found.

void trie_delete (TrieNode *root) {
    if (root == NULL) return;
    for(int i=0; i<26; i++) {
        if( root->sons[i] != NULL ) trie_delete(root->sons[i]);
    }
    free(root);
}

NOTE: A personal implementation of the Trie can be found on my GitHub.


About Filippo Barbari

HPC Technology Specialist @ CINECA, Italy. Algorithms and Data Structures enthusiast. I love esoteric languages, board games and parallel simulations.

Star
Useful Links