Design Add and Search Words Data Structure

Updated on 22 May, 2025
Design Add and Search Words Data Structure header image

Problem Statement

The task is to design a versatile data structure named WordDictionary that serves two primary functions: storing words and subsequently checking for matches against previously stored words based on exact characters or patterns. The WordDictionary class should include the following functionalities:

  1. WordDictionary(): A constructor that initializes a new instance of the WordDictionary class.
  2. addWord(word): A method to add a word to the data structure so that it can later be matched.
  3. search(word): A method that checks if a specific pattern or a word, containing possible dots ('.'), matches any of the previously added words. The dot character serves as a wildcard that can represent any single letter.

This implementation must efficiently accommodate numerous addWord and search operations, with unique queries that may include wildcard characters to represent varied unknown positions within a word.

Examples

Example 1

Input:

["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]

Output:

[null,null,null,null,false,true,true,true]

Explanation:

WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("bad");
wordDictionary.addWord("dad");
wordDictionary.addWord("mad");
wordDictionary.search("pad"); // return False
wordDictionary.search("bad"); // return True
wordDictionary.search(".ad"); // return True
wordDictionary.search("b.."); // return True

Constraints

  • 1 <= word.length <= 25
  • word in addWord consists of lowercase English letters.
  • word in search consist of '.' or lowercase English letters.
  • There will be at most 2 dots in word for search queries.
  • At most 104 calls will be made to addWord and search.

Approach and Intuition

Given the requirements of the WordDictionary class, where we need to search for patterns that may contain wildcards, a Trie (Prefix Tree) is a suitable data structure because it allows us to efficiently store string and retrieve them with specific patterns. Here's a breakdown of the thought process and approach:

  1. Data Structure Design:

    • Construct a Trie, which is an efficient information retrieval structure for strings. Each node represents a single character and stores references to descendant nodes.
  2. Implementation of addWord:

    • Traverse the Trie node by node from the root, create new nodes if necessary for each character of the input word, and ensure each completed word is marked at the end node.
  3. Implementation of search:

    • Traverse the Trie, but allow for flexibility where dots ('.') occur. For dots, recursively try all possible children nodes.
    • Implement the search as a recursive function that can handle both the presence of exact characters and wildcards (dots).
  4. Optimization Considerations:

    • Since a word during search can only have at most 2 dots based on the constraints, limit the recursive depth for handling wildcard scenarios to avoid excessive computations.
    • Use iterative approaches with data structures like stacks or queues for non-wildcard segments of the word to limit recursive calls and overhead.

By maintaining the Trie data structure and implementing smart search methods that capably handle wildcard characters, the WordDictionary class can efficiently fulfill its required operations within given constraints.

Solutions

  • C++
  • Java
  • Python
cpp
struct TrieNode {
    unordered_map<char, TrieNode*> next;
    bool isEndOfWord = false;
};

class WordDictionary {
    TrieNode* root;

public:
    WordDictionary() { root = new TrieNode(); }

    void addWord(string word) {
        TrieNode* current = root;

        for (char ch : word) {
            if (!current->next.count(ch)) {
                current->next[ch] = new TrieNode();
            }
            current = current->next[ch];
        }
        current->isEndOfWord = true;
    }

    bool deepSearch(string word, TrieNode* currentNode) {
        for (int i = 0; i < word.size(); ++i) {
            char ch = word[i];
            if (!currentNode->next.count(ch)) {
                if (ch == '.') {
                    for (auto pair : currentNode->next) {
                        TrieNode* childNode = pair.second;
                        if (deepSearch(word.substr(i + 1), childNode)) {
                            return true;
                        }
                    }
                }
                return false;
            } else {
                currentNode = currentNode->next[ch];
            }
        }
        return currentNode->isEndOfWord;
    }

    bool search(string word) { return deepSearch(word, root); }
};

Implement a data structure called WordDictionary that supports adding and searching words using C++. This data structure particularly supports searching of words with specific wildcards denoted by the character '.' which can match any character.

  • Implement the WordDictionary using a Trie (prefix tree) to store words.
  • Create a structure TrieNode which consists of:
    • A hashmap to hold child nodes.
    • A boolean to flag the end of a word within the Trie.
  • Initialize the WordDictionary with a root node that represents an empty Trie.

Adding a Word

  1. Start from the root.
  2. Iterate over each character of the word to be added.
  3. Check if the current character exists in the children of the current TrieNode:
    • If not present, create a new TrieNode and link it.
    • Move to the next node.
  4. After inserting all characters of the word, mark the last node as the end of the word.

Searching for a Word

  1. Use a helper function deepSearch which is recursively called to evaluate the word against the Trie structure.
  2. For each character in the word:
    • If the character is '.', recursively search in all possible child paths.
    • If the character is not found in the current node and is not '.', return false.
    • If a match is found, move to the next character.
  3. Return true if the end of the string corresponds to isEndOfWord flag of a TrieNode.

This structure efficiently manages dynamic dictionary operations and wildcard-supported searches.

java
class TrieNode {
    Map<Character, TrieNode> childrenMap = new HashMap<>();
    boolean endOfWord = false;

    public TrieNode() {}
}

class WordDictionary {
    TrieNode root;

    /** Initialize your data structure here. */
    public WordDictionary() {
        root = new TrieNode();
    }

    /** Adds a word into the data structure. */
    public void addWord(String word) {
        TrieNode currentNode = root;

        for (char ch : word.toCharArray()) {
            if (!currentNode.childrenMap.containsKey(ch)) {
                currentNode.childrenMap.put(ch, new TrieNode());
            }
            currentNode = currentNode.childrenMap.get(ch);
        }
        currentNode.endOfWord = true;
    }

    /** Returns if the word is in the node. */
    public boolean searchInNode(String word, TrieNode node) {
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (!node.childrenMap.containsKey(ch)) {
                if (ch == '.') {
                    for (char key : node.childrenMap.keySet()) {
                        TrieNode childNode = node.childrenMap.get(key);
                        if (searchInNode(word.substring(i + 1), childNode)) {
                            return true;
                        }
                    }
                }
                return false;
            } else {
                node = node.childrenMap.get(ch);
            }
        }
        return node.endOfWord;
    }

    /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
    public boolean search(String word) {
        return searchInNode(word, root);
    }
}

The provided Java code demonstrates the construction and operation of a specialized data structure using a Trie (often called a prefix tree) to store and search for words with efficient time complexity. The core functionalities of this structure are encapsulated in two main classes: TrieNode and WordDictionary.

  • TrieNode class:

    • It contains a HashMap childrenMap to map each character to its subsequent TrieNode.
    • It also has a boolean endOfWord to denote the end of a word.
  • WordDictionary class:

    1. Initializes an empty root node of TrieNode.
    2. addWord(String word): Implements the addition of a word to the dictionary. It traverses each character of the word, checks its existence in the current node's childrenMap, and if not present, creates a new TrieNode. The node corresponding to the last character is marked with endOfWord = true to signify the completion of the word.
    3. search(String word): Facilitates the search for a word, including support for wildcard characters represented by '.'. It invokes searchInNode(String word, TrieNode node) which is recursive.
    4. searchInNode(String word, TrieNode node): Recursively checks each character of the word in the trie. If a character is '.', it iterates over all possible child nodes. For a typical character, it moves to the corresponding child node if exists; otherwise, the search returns false. If the end of the word is reached within the Trie and endOfWord is true, it confirms the word's existence in the dictionary.

This code effectively addresses the challenges of designing a data structure that supports both the efficient addition of words and their flexible search, including the use of wildcards, thereby demonstrating a fundamental application of Trie data structures for tasks involving pattern matching and word lookups.

python
class Lexicon:

    def __init__(self):
        self.root = {}

    def insertWord(self, word: str) -> None:
        current = self.root
        for letter in word:
            if letter not in current:
                current[letter] = {}
            current = current[letter]
        current["*"] = True

    def find(self, word: str) -> bool:

        def dfs_search(part, node) -> bool:
            for index, char in enumerate(part):
                if char not in node:
                    if char == '.':
                        for k in node:
                            if k != '*' and dfs_search(part[index + 1:], node[k]):
                                return True
                    return False
                else:
                    node = node[char]
            return '*' in node

        return dfs_search(word, self.root)

The solution provided implements a data structure that allows adding and searching words with support for wildcard characters. The implementation is in Python and employs a Trie (prefix tree) structure for efficient storage and search. Here's a summary of how the code works:

  • Initialization (__init__ method):

    • Start by creating a root dictionary. This dictionary serves as the starting point of the Trie.
  • Inserting a word (insertWord method):

    • Traverse the structure character by character from the root. If a character does not exist at any node, create a new dictionary at that node.
    • Once the end of the word is reached, mark this end in the Trie with a special character "*" set to True to indicate the end of a valid word.
  • Searching a word (find method):

    • Implement a recursive function dfs_search to delve into the Trie. It handles two scenarios:
      • For regular characters, follow the path directly in the Trie.
      • For the wildcard character '.', iterate through all possible nodes at the current Trie path. Recursively search from the next part of the word.
    • The base case for the recursion checks if the end of the part is matched in the Trie through the presence of the "*" in the current node.

The design supports use of wildcard elements represented by '.', which makes it versatile for matching patterns within added words, not just exact word matches. This data structure is efficient for operations like search suggestions and spell-checkers where pattern matching is crucial.

Comments

No comments yet.