
Problem Statement
Design and implement an autocomplete system for a search engine where users type a sentence that is a sequence of words ending with a special character '#'
. The system leverages historical user input (stored in a string array sentences
) along with the frequency of each sentence (stored in an integer array times
), using both to predict and generate up to three autocomplete suggestions every time a user types a character. These suggestions are dynamically computed based on the prefix of the currently typed sentence so far.
The suggestions are prioritized based on two criteria:
- The frequency or "hot degree" of previously typed sentences, with more frequent sentences appearing first.
- If the frequencies are tied, the sentences are then sorted lexicographically based on ASCII values.
Each input character influences the autocomplete suggestions, except for the '#'
character, which signifies the end of a sentence. Upon encountering this character, the system terminates the current suggestion session, stores the sentence, and prepares for a new session. The autocomplete needs to handle input reasonably fast despite potential high usage volume, making up to 5000 calls.
Examples
Example 1
Input
["AutocompleteSystem", "input", "input", "input", "input"] [[["i love you", "island", "iroman", "i love vultr"], [5, 3, 2, 2]], ["i"], [" "], ["a"], ["#"]]
Output
[null, ["i love you", "island", "i love vultr"], ["i love you", "i love vultr"], [], []]
Explanation
AutocompleteSystem obj = new AutocompleteSystem(["i love you", "island", "iroman", "i love vultr"], [5, 3, 2, 2]); obj.input("i"); // return ["i love you", "island", "i love vultr"]. There are four sentences that have prefix "i". Among them, "ironman" and "i love vultr" have same hot degree. Since ' ' has ASCII code 32 and 'r' has ASCII code 114, "i love vultr" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored. obj.input(" "); // return ["i love you", "i love vultr"]. There are only two sentences that have prefix "i ". obj.input("a"); // return []. There are no sentences that have prefix "i a". obj.input("#"); // return []. The user finished the input, the sentence "i a" should be saved as a historical sentence in system. And the following input will be counted as a new search.
Constraints
n == sentences.length
n == times.length
1 <= n <= 100
1 <= sentences[i].length <= 100
1 <= times[i] <= 50
c
is a lowercase English letter, a hash'#'
, or space' '
.- Each tested sentence will be a sequence of characters
c
that end with the character'#'
. - Each tested sentence will have a length in the range
[1, 200]
. - The words in each input sentence are separated by single spaces.
- At most
5000
calls will be made toinput
.
Approach and Intuition
The core functionality of the AutocompleteSystem
is two-fold:
- Manage and update a repository of sentences with their corresponding frequencies.
- Provide a fast lookup for the top suggestions based on the current input prefix.
Here's how the system could logically handle the tasks:
Initialization: Implement the constructor
AutocompleteSystem(String[] sentences, int[] times)
which initializes the system with pre-existing data. This can involve populating some sort of data structure, often a trie (prefix tree), which allows efficient prefix-based search operations and can also handle the frequency data.Input Handling: Use the
input(char c)
method to handle real-time user input:- If the character is not
'#'
, append it to a current input buffer. - Search the trie for nodes that match the current input buffer's prefix.
- Sort the nodes based first on frequency (in descending order) and then lexicographically if frequencies match. Only the top three matches (or fewer if not enough matches exist) are selected for output.
- If the character is
'#'
, store the currently formed sentence from the input buffer (adding or updating it in the trie with incremented frequency), reset the input buffer, and prepare for a new session (output an empty list).
- If the character is not
This approach, largely centered around trie operations and efficient sorting mechanisms, ensures that prefix-based searches are quick and the most frequently occurring sentences are prioritized, all while managing updates from new or repeated entries effectively.
Solutions
- Java
- Python
class TrieNode {
Map<Character, TrieNode> next;
Map<String, Integer> counts;
public TrieNode() {
next = new HashMap<>();
counts = new HashMap<>();
}
}
class SuggestionsSystem {
TrieNode base;
TrieNode current;
TrieNode failed;
StringBuilder activeString;
public SuggestionsSystem(String[] phrases, int[] weights) {
base = new TrieNode();
for (int i = 0; i < phrases.length; i++) {
insertInTrie(phrases[i], weights[i]);
}
activeString = new StringBuilder();
current = base;
failed = new TrieNode();
}
public List<String> input(char c) {
if (c == '#') {
insertInTrie(activeString.toString(), 1);
activeString.setLength(0);
current = base;
return new ArrayList<String>();
}
activeString.append(c);
if (!current.next.containsKey(c)) {
current = failed;
return new ArrayList<String>();
}
current = current.next.get(c);
PriorityQueue<String> heap = new PriorityQueue<>((a, b) -> {
int weightA = current.counts.get(a);
int weightB = current.counts.get(b);
if (weightA == weightB) {
return b.compareTo(a);
}
return weightA - weightB;
});
for (String term: current.counts.keySet()) {
heap.add(term);
if (heap.size() > 3) {
heap.poll();
}
}
List<String> results = new ArrayList<>();
while (!heap.isEmpty()) {
results.add(heap.poll());
}
Collections.reverse(results);
return results;
}
private void insertInTrie(String phrase, int increment) {
TrieNode node = base;
for (char ch: phrase.toCharArray()) {
if (!node.next.containsKey(ch)) {
node.next.put(ch, new TrieNode());
}
node = node.next.get(ch);
node.counts.put(phrase, node.counts.getOrDefault(phrase, 0) + increment);
}
}
}
This Java solution implements an autocomplete system using a Trie data structure, often used for efficient storage and retrieval of string-based data. The designed autocomplete system supports dynamic inputs and priorities based on usage frequency.
The main components of the system include:
TrieNode
class:next
- A HashMap to store the next characters in the trie.counts
- A HashMap to keep track of the number of times each phrase ends at this node.
SuggestionsSystem
class:TrieNode base
- The starting point of the trie.TrieNode current
andTrieNode failed
- Nodes to manage the current state of traversal and the state when the next character is absent.StringBuilder activeString
- To accumulate characters as they are typed.
The constructor public SuggestionsSystem(String[] phrases, int[] weights)
initializes the trie with predefined phrases and their weights (usage frequencies). The method private void insertInTrie(String phrase, int increment)
is used to insert new phrases into the trie and update their weights.
The main functionality is in the method public List<String> input(char c)
, which processes each character input by the user:
- If the character
c
is '#', it indicates the completion of a word. Insert the word into the trie, resetactiveString
, and set thecurrent
node back to base. - If the character is not '#', append it to
activeString
. Check whethercurrent
has a next node for the character:- If yes, move to the next node.
- If no, move to the
failed
node.
A priority queue (min-heap) then sorts phrases by their frequencies and lexicographical order. Only the top three suggestions are kept in the heap. Extract the suggestions, reverse their order because they are extracted in reverse order, and return the list.
This implementation provides a fast and efficient way to suggest the top phrases based on partial inputs, updating in real-time based on user input frequency, and delivering suggestions prioritized by their relevance and frequency of use.
import heapq
class Node:
def __init__(self):
self.kids = {}
self.data = defaultdict(int)
class SuggestSystem:
def __init__(self, sentences: List[str], times: List[int]):
self.base = Node()
for line, cnt in zip(sentences, times):
self.insert(line, cnt)
self.active_sentence = []
self.active_node = self.base
self.dead_end = Node()
def input(self, char: str) -> List[str]:
if char == "#":
completed = "".join(self.active_sentence)
self.insert(completed, 1)
self.active_sentence = []
self.active_node = self.base
return []
self.active_sentence.append(char)
if char not in self.active_node.kids:
self.active_node = self.dead_end
return []
self.active_node = self.active_node.kids[char]
sent = [(freq, sent) for sent, freq in self.active_node.data.items()]
top_suggestions = heapq.nsmallest(3, sent)
return [item[1] for item in top_suggestions]
def insert(self, sentence, value):
node = self.base
for char in sentence:
if char not in node.kids:
node.kids[char] = Node()
node = node.kids[char]
node.data[sentence] -= value
The Python3 code provided introduces a complex autocomplete system for input sentences. This system leverages data structures and algorithms to efficiently provide real-time autocomplete suggestions.
In this implementation:
Node class: Represents a node in a trie (prefix tree). Each node stores child nodes (
kids
dictionary) and a defaultdict (data
) for tracking sentences and their respective frequencies of occurrence.SuggestSystem class: The main class managing the autocomplete logic:
- Initializes with existing sentences and their counts, building an initial trie.
- Handles character input, updates the trie, and provides top suggestions.
Key methods include:
__init__
: Constructs the trie using the provided sentences and their respective initial counts.input
: Processes an input character:- Completes the current sentence if "#" is encountered, updating the trie and resetting state for new input.
- Otherwise, updates the ongoing sentence and trie navigation, providing up to three top suggestions based on frequency using a min-heap.
insert
: Adds or updates a sentence in the trie, modifying nodes and data values to reflect new or incremental counts.
Each method collaborates to enable dynamic updates to suggestions based on user input character sequences, ensuring responsiveness and accuracy in suggestions. The trie structure, combined with a min-heap for retrieving top recommendations, ensures the system is both space and time-efficient.
No comments yet.