
Problem Statement
In this task, we are required to design a special dictionary class called WordFilter
. This class allows us to efficiently search for words by both a specified prefix and a suffix.
The WordFilter
class will function as follows:
Constructor -
WordFilter(string[] words)
: Upon initialization, this constructor takes an array of strings denoted aswords
and stores them in the dictionary.Search Method -
f(string pref, string suff)
: This function searches for the index of a word within the dictionary that starts with the prefixpref
and ends with the suffixsuff
. If multiple words match the criteria, the index of the last (or largest) matching word is returned. If no matching word is found, the method returns-1
.
By implementing this design, it aims to provide a fast and reliable way to find words based on both their starting and ending characters.
Examples
Example 1
Input:
["WordFilter", "f"]
[[["apple"]], ["a", "e"]]
Output:
[null, 0]
Explanation:
WordFilter wordFilter = new WordFilter(["apple"]); wordFilter.f("a", "e"); // return 0, because the word at index 0 has prefix = "a" and suffix = "e".
Constraints
1 <= words.length <= 10^4
1 <= words[i].length <= 7
1 <= pref.length, suff.length <= 7
words[i]
,pref
andsuff
consist of lowercase English letters only.- At most
10^4
calls will be made to the functionf
.
Approach and Intuition
To tackle this problem, we need to efficiently manage and query a collection of words where searches are based on the prefix and the suffix of the words. Here’s how we could conceptualize our approach:
Storing Words: The dictionary must store each word such that both its prefix and suffix can be quickly accessed and matched against the search queries. One feasible way to achieve this is to store every word along with its reversed counterpart or manage an indexing system based on prefix and suffix combinations directly.
Searching for Matching Words: For searching, an efficient strategy is using hash tables (dictionaries in Python) where combined strings of prefix and suffix point to their respective word indices. Building on that, one could maintain a dictionary where keys consist of tuples
(prefix, suffix)
and values are the largest index where such a word appears.Handling Queries: Given the constraints, with words of relatively short length (up to 7 characters) but potentially large numbers of them (up to 10,000), and a high count of possible query calls (also up to 10,000), ensuring the lookup operations are as close to constant time as possible can drastically improve performance. For every
f(pref, suff)
call, the method could quickly derive the key frompref
andsuff
, look it up in the dictionary, and return the corresponding index or-1
if no such key exists.
By focusing on optimizing the storage and retrieval of index data based on both prefix and suffix, we can handle large numbers of words and search queries efficiently. This combination of data structuring and strategic indexing encapsulates the main approach to implementing the WordFilter
class.
Solutions
- Java
- Python
class PrefixSuffixFilter {
TrieNode root;
public PrefixSuffixFilter(String[] words) {
root = new TrieNode();
for (int index = 0; index < words.length; index++) {
String modifiedWord = words[index] + "{";
for (int i = 0; i < modifiedWord.length(); i++) {
TrieNode node = root;
node.weight = index;
for (int j = i; j < 2 * modifiedWord.length() - 1; j++) {
int charIndex = modifiedWord.charAt(j % modifiedWord.length()) - 'a';
if (node.branches[charIndex] == null) {
node.branches[charIndex] = new TrieNode();
}
node = node.branches[charIndex];
node.weight = index;
}
}
}
}
public int search(String pref, String suff) {
TrieNode node = root;
for (char ch: (suff + '{' + pref).toCharArray()) {
if (node.branches[ch - 'a'] == null) {
return -1;
}
node = node.branches[ch - 'a'];
}
return node.weight;
}
}
class TrieNode {
TrieNode[] branches;
int weight;
public TrieNode() {
branches = new TrieNode[27];
weight = 0;
}
}
The provided Java solution focuses on building a robust system for filtering words based on both prefix and suffix using a custom Trie (prefix tree) data structure. The system is suitable to address efficient searching which is critical in applications such as autocomplete features and spell-checkers.
The key elements of the solution involve the
PrefixSuffixFilter
andTrieNode
classes.In the
PrefixSuffixFilter
class:- A TrieNode named
root
is instantiated to serve as the starting point of the trie. - The constructor takes an array of words and processes each word by appending a special character
{
to help distinguish between the end of a suffix and the start of a prefix in the trie. - Each word is then iteratively added to the trie in a way that every possible suffix-prefix combination of the word is preserved as a unique path in the trie, indexed by the position of the word in the original array to facilitate fast lookups.
- A TrieNode named
The
search
method allows checking the highest index of words that contain a given prefix after a given suffix. This is done by:- Concatenating the suffix with the
{
character and the prefix, forming a single string which represents a potential path in the trie. - Navigating through the trie according to this path and returning the stored index which represents the most recent addition of a word that matches the prefix and suffix criteria.
- Concatenating the suffix with the
The
TrieNode
class is structured as:- An array of TrieNode references named
branches
, sized to accommodate the English lowercase letters and the special character{
. - An integer
weight
that holds the index of the last word added along this path in the trie.
- An array of TrieNode references named
This design ensures that the search operation is performed in a time-efficient manner, since each query operates directly through the trie structure without the need for scanning or matching operations on the list of words. The trie effectively pre-processes and stores the data in a way that anticipates and optimizes for the search use case, making the search operation quick and scalable.
CreateTrie = lambda: collections.defaultdict(CreateTrie)
MAX_WEIGHT = False
class PrefixSuffixSearch:
def __init__(self, words: List[str]):
self.root = CreateTrie()
for index, word in enumerate(words):
modified_word = word + '#'
for pos in range(len(modified_word)):
node = self.root
node[MAX_WEIGHT] = index
for offset in range(pos, 2 * len(modified_word) - 1):
node = node[modified_word[offset % len(modified_word)]]
node[MAX_WEIGHT] = index
def search(self, prefix: str, suffix: str) -> int:
node = self.root
for char in suffix + '#' + prefix:
if char not in node:
return -1
node = node[char]
return node[MAX_WEIGHT]
This Python solution focuses on implementing the PrefixSuffixSearch class to handle search queries that combine both a given prefix and a suffix in a list of words. The key data structure used here is a modified Trie (prefix tree).
- First, a Trie is created using a lambda function that recursively uses
collections.defaultdict
to instantiate new Tries, ensuring that every node in the Trie can dynamically create new child nodes if they do not already exist. - The
MAX_WEIGHT
constant is used as a key in the Trie to store the maximum index of words that have been processed, facilitating the return of the most recent addition that matches the search criteria.
Detailed Steps in the Code:
Initialization (
__init__
method):- Initializes the Trie.
- Iterates through each word, modifying it by appending a '#' to help distinguish between the end of a suffix and the start of a prefix.
- Inserts each possible combination of prefixes and suffixes of the word into the Trie. This includes rotating through the modified word and adding each rotation to the Trie, updating the maximum index each time.
Search Method:
- Searches for the concatenation of the suffix, '#' and the prefix in the Trie.
- If any character in this combined string is missing from the Trie during the traversal, it returns
-1
indicating the combined prefix and suffix is not found. - If the traversal is successful, returns the stored
MAX_WEIGHT
at the terminal node, which represents the highest index of a word in the initial list that matches the search criteria.
This method ensures efficient querying, as it preprocesses the words to allow rapid searches for any given combination of prefix and suffix. This preprocessing step helps in reducing the complexity during the actual search, making the query operation faster especially beneficial when there are multiple queries to be performed on an infrequently changed list of words. The use of a Trie particularly aids in optimizing the space used, as common prefixes and suffixes are stored only once.
No comments yet.