Prefix and Suffix Search

Updated on 23 June, 2025
Prefix and Suffix Search header image

Problem Statement

In this task, we are required to design a special dictionary class called WordFilter. This class allows us to efficiently search for words by both a specified prefix and a suffix.

The WordFilter class will function as follows:

  • Constructor - WordFilter(string[] words): Upon initialization, this constructor takes an array of strings denoted as words and stores them in the dictionary.

  • Search Method - f(string pref, string suff): This function searches for the index of a word within the dictionary that starts with the prefix pref and ends with the suffix suff. If multiple words match the criteria, the index of the last (or largest) matching word is returned. If no matching word is found, the method returns -1.

By implementing this design, it aims to provide a fast and reliable way to find words based on both their starting and ending characters.

Examples

Example 1

Input:

["WordFilter", "f"]
[[["apple"]], ["a", "e"]]

Output:

[null, 0]

Explanation:

WordFilter wordFilter = new WordFilter(["apple"]);
wordFilter.f("a", "e"); // return 0, because the word at index 0 has prefix = "a" and suffix = "e".

Constraints

  • 1 <= words.length <= 10^4
  • 1 <= words[i].length <= 7
  • 1 <= pref.length, suff.length <= 7
  • words[i], pref and suff consist of lowercase English letters only.
  • At most 10^4 calls will be made to the function f.

Approach and Intuition

To tackle this problem, we need to efficiently manage and query a collection of words where searches are based on the prefix and the suffix of the words. Here’s how we could conceptualize our approach:

  1. Storing Words: The dictionary must store each word such that both its prefix and suffix can be quickly accessed and matched against the search queries. One feasible way to achieve this is to store every word along with its reversed counterpart or manage an indexing system based on prefix and suffix combinations directly.

  2. Searching for Matching Words: For searching, an efficient strategy is using hash tables (dictionaries in Python) where combined strings of prefix and suffix point to their respective word indices. Building on that, one could maintain a dictionary where keys consist of tuples (prefix, suffix) and values are the largest index where such a word appears.

  3. Handling Queries: Given the constraints, with words of relatively short length (up to 7 characters) but potentially large numbers of them (up to 10,000), and a high count of possible query calls (also up to 10,000), ensuring the lookup operations are as close to constant time as possible can drastically improve performance. For every f(pref, suff) call, the method could quickly derive the key from pref and suff, look it up in the dictionary, and return the corresponding index or -1 if no such key exists.

By focusing on optimizing the storage and retrieval of index data based on both prefix and suffix, we can handle large numbers of words and search queries efficiently. This combination of data structuring and strategic indexing encapsulates the main approach to implementing the WordFilter class.

Solutions

  • Java
  • Python
java
class PrefixSuffixFilter {
    TrieNode root;
    public PrefixSuffixFilter(String[] words) {
        root = new TrieNode();
        for (int index = 0; index < words.length; index++) {
            String modifiedWord = words[index] + "{";
            for (int i = 0; i < modifiedWord.length(); i++) {
                TrieNode node = root;
                node.weight = index;
                for (int j = i; j < 2 * modifiedWord.length() - 1; j++) {
                    int charIndex = modifiedWord.charAt(j % modifiedWord.length()) - 'a';
                    if (node.branches[charIndex] == null) {
                        node.branches[charIndex] = new TrieNode();
                    }
                    node = node.branches[charIndex];
                    node.weight = index;
                }
            }
        }
    }
    public int search(String pref, String suff) {
        TrieNode node = root;
        for (char ch: (suff + '{' + pref).toCharArray()) {
            if (node.branches[ch - 'a'] == null) {
                return -1;
            }
            node = node.branches[ch - 'a'];
        }
        return node.weight;
    }
}

class TrieNode {
    TrieNode[] branches;
    int weight;
    public TrieNode() {
        branches = new TrieNode[27];
        weight = 0;
    }
}

The provided Java solution focuses on building a robust system for filtering words based on both prefix and suffix using a custom Trie (prefix tree) data structure. The system is suitable to address efficient searching which is critical in applications such as autocomplete features and spell-checkers.

  • The key elements of the solution involve the PrefixSuffixFilter and TrieNode classes.

  • In the PrefixSuffixFilter class:

    • A TrieNode named root is instantiated to serve as the starting point of the trie.
    • The constructor takes an array of words and processes each word by appending a special character { to help distinguish between the end of a suffix and the start of a prefix in the trie.
    • Each word is then iteratively added to the trie in a way that every possible suffix-prefix combination of the word is preserved as a unique path in the trie, indexed by the position of the word in the original array to facilitate fast lookups.
  • The search method allows checking the highest index of words that contain a given prefix after a given suffix. This is done by:

    • Concatenating the suffix with the { character and the prefix, forming a single string which represents a potential path in the trie.
    • Navigating through the trie according to this path and returning the stored index which represents the most recent addition of a word that matches the prefix and suffix criteria.
  • The TrieNode class is structured as:

    • An array of TrieNode references named branches, sized to accommodate the English lowercase letters and the special character {.
    • An integer weight that holds the index of the last word added along this path in the trie.

This design ensures that the search operation is performed in a time-efficient manner, since each query operates directly through the trie structure without the need for scanning or matching operations on the list of words. The trie effectively pre-processes and stores the data in a way that anticipates and optimizes for the search use case, making the search operation quick and scalable.

python
CreateTrie = lambda: collections.defaultdict(CreateTrie)
MAX_WEIGHT = False

class PrefixSuffixSearch:
    def __init__(self, words: List[str]):
        self.root = CreateTrie()

        for index, word in enumerate(words):
            modified_word = word + '#'
            for pos in range(len(modified_word)):
                node = self.root
                node[MAX_WEIGHT] = index
                for offset in range(pos, 2 * len(modified_word) - 1):
                    node = node[modified_word[offset % len(modified_word)]]
                    node[MAX_WEIGHT] = index

    def search(self, prefix: str, suffix: str) -> int:
        node = self.root
        for char in suffix + '#' + prefix:
            if char not in node:
                return -1
            node = node[char]
        return node[MAX_WEIGHT]

This Python solution focuses on implementing the PrefixSuffixSearch class to handle search queries that combine both a given prefix and a suffix in a list of words. The key data structure used here is a modified Trie (prefix tree).

  • First, a Trie is created using a lambda function that recursively uses collections.defaultdict to instantiate new Tries, ensuring that every node in the Trie can dynamically create new child nodes if they do not already exist.
  • The MAX_WEIGHT constant is used as a key in the Trie to store the maximum index of words that have been processed, facilitating the return of the most recent addition that matches the search criteria.

Detailed Steps in the Code:

  1. Initialization (__init__ method):

    • Initializes the Trie.
    • Iterates through each word, modifying it by appending a '#' to help distinguish between the end of a suffix and the start of a prefix.
    • Inserts each possible combination of prefixes and suffixes of the word into the Trie. This includes rotating through the modified word and adding each rotation to the Trie, updating the maximum index each time.
  2. Search Method:

    • Searches for the concatenation of the suffix, '#' and the prefix in the Trie.
    • If any character in this combined string is missing from the Trie during the traversal, it returns -1 indicating the combined prefix and suffix is not found.
    • If the traversal is successful, returns the stored MAX_WEIGHT at the terminal node, which represents the highest index of a word in the initial list that matches the search criteria.

This method ensures efficient querying, as it preprocesses the words to allow rapid searches for any given combination of prefix and suffix. This preprocessing step helps in reducing the complexity during the actual search, making the query operation faster especially beneficial when there are multiple queries to be performed on an infrequently changed list of words. The use of a Trie particularly aids in optimizing the space used, as common prefixes and suffixes are stored only once.

Comments

No comments yet.