Design Search Autocomplete System

Updated on 27 May, 2025
Design Search Autocomplete System header image

Problem Statement

Design and implement an autocomplete system for a search engine where users type a sentence that is a sequence of words ending with a special character '#'. The system leverages historical user input (stored in a string array sentences) along with the frequency of each sentence (stored in an integer array times), using both to predict and generate up to three autocomplete suggestions every time a user types a character. These suggestions are dynamically computed based on the prefix of the currently typed sentence so far.

The suggestions are prioritized based on two criteria:

  1. The frequency or "hot degree" of previously typed sentences, with more frequent sentences appearing first.
  2. If the frequencies are tied, the sentences are then sorted lexicographically based on ASCII values.

Each input character influences the autocomplete suggestions, except for the '#' character, which signifies the end of a sentence. Upon encountering this character, the system terminates the current suggestion session, stores the sentence, and prepares for a new session. The autocomplete needs to handle input reasonably fast despite potential high usage volume, making up to 5000 calls.

Examples

Example 1

Input

["AutocompleteSystem", "input", "input", "input", "input"]
[[["i love you", "island", "iroman", "i love vultr"], [5, 3, 2, 2]], ["i"], [" "], ["a"], ["#"]]

Output

[null, ["i love you", "island", "i love vultr"], ["i love you", "i love vultr"], [], []]

Explanation

AutocompleteSystem obj = new AutocompleteSystem(["i love you", "island", "iroman", "i love vultr"], [5, 3, 2, 2]);
obj.input("i"); // return ["i love you", "island", "i love vultr"]. There are four sentences that have prefix "i". Among them, "ironman" and "i love vultr" have same hot degree. Since ' ' has ASCII code 32 and 'r' has ASCII code 114, "i love vultr" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored.
obj.input(" "); // return ["i love you", "i love vultr"]. There are only two sentences that have prefix "i ".
obj.input("a"); // return []. There are no sentences that have prefix "i a".
obj.input("#"); // return []. The user finished the input, the sentence "i a" should be saved as a historical sentence in system. And the following input will be counted as a new search.

Constraints

  • n == sentences.length
  • n == times.length
  • 1 <= n <= 100
  • 1 <= sentences[i].length <= 100
  • 1 <= times[i] <= 50
  • c is a lowercase English letter, a hash '#', or space ' '.
  • Each tested sentence will be a sequence of characters c that end with the character '#'.
  • Each tested sentence will have a length in the range [1, 200].
  • The words in each input sentence are separated by single spaces.
  • At most 5000 calls will be made to input.

Approach and Intuition

The core functionality of the AutocompleteSystem is two-fold:

  1. Manage and update a repository of sentences with their corresponding frequencies.
  2. Provide a fast lookup for the top suggestions based on the current input prefix.

Here's how the system could logically handle the tasks:

  1. Initialization: Implement the constructor AutocompleteSystem(String[] sentences, int[] times) which initializes the system with pre-existing data. This can involve populating some sort of data structure, often a trie (prefix tree), which allows efficient prefix-based search operations and can also handle the frequency data.

  2. Input Handling: Use the input(char c) method to handle real-time user input:

    • If the character is not '#', append it to a current input buffer.
    • Search the trie for nodes that match the current input buffer's prefix.
    • Sort the nodes based first on frequency (in descending order) and then lexicographically if frequencies match. Only the top three matches (or fewer if not enough matches exist) are selected for output.
    • If the character is '#', store the currently formed sentence from the input buffer (adding or updating it in the trie with incremented frequency), reset the input buffer, and prepare for a new session (output an empty list).

This approach, largely centered around trie operations and efficient sorting mechanisms, ensures that prefix-based searches are quick and the most frequently occurring sentences are prioritized, all while managing updates from new or repeated entries effectively.

Solutions

  • Java
  • Python
java
class TrieNode {
    Map<Character, TrieNode> next;
    Map<String, Integer> counts;
    public TrieNode() {
        next = new HashMap<>();
        counts = new HashMap<>();
    }
}

class SuggestionsSystem {
    TrieNode base;
    TrieNode current;
    TrieNode failed;
    StringBuilder activeString;
    
    public SuggestionsSystem(String[] phrases, int[] weights) {
        base = new TrieNode();
        for (int i = 0; i < phrases.length; i++) {
            insertInTrie(phrases[i], weights[i]);
        }
        
        activeString = new StringBuilder();
        current = base;
        failed = new TrieNode();
    }
    
    public List<String> input(char c) {
        if (c == '#') {
            insertInTrie(activeString.toString(), 1);
            activeString.setLength(0);
            current = base;
            return new ArrayList<String>();
        }
        
        activeString.append(c);
        if (!current.next.containsKey(c)) {
            current = failed;
            return new ArrayList<String>();
        }
        
        current = current.next.get(c);
        PriorityQueue<String> heap = new PriorityQueue<>((a, b) -> {
            int weightA = current.counts.get(a);
            int weightB = current.counts.get(b);
            if (weightA == weightB) {
                return b.compareTo(a);
            }
            
            return weightA - weightB;
        });
        
        for (String term: current.counts.keySet()) {
            heap.add(term);
            if (heap.size() > 3) {
                heap.poll();
            }
        }
        
        List<String> results = new ArrayList<>();
        while (!heap.isEmpty()) {
            results.add(heap.poll());
        }
        
        Collections.reverse(results);
        return results;
    }
    
    private void insertInTrie(String phrase, int increment) {
        TrieNode node = base;
        for (char ch: phrase.toCharArray()) {
            if (!node.next.containsKey(ch)) {
                node.next.put(ch, new TrieNode());
            }
            
            node = node.next.get(ch);
            node.counts.put(phrase, node.counts.getOrDefault(phrase, 0) + increment);
        }
    }
}

This Java solution implements an autocomplete system using a Trie data structure, often used for efficient storage and retrieval of string-based data. The designed autocomplete system supports dynamic inputs and priorities based on usage frequency.

The main components of the system include:

  • TrieNode class:

    • next - A HashMap to store the next characters in the trie.
    • counts - A HashMap to keep track of the number of times each phrase ends at this node.
  • SuggestionsSystem class:

    • TrieNode base - The starting point of the trie.
    • TrieNode current and TrieNode failed - Nodes to manage the current state of traversal and the state when the next character is absent.
    • StringBuilder activeString - To accumulate characters as they are typed.

The constructor public SuggestionsSystem(String[] phrases, int[] weights) initializes the trie with predefined phrases and their weights (usage frequencies). The method private void insertInTrie(String phrase, int increment) is used to insert new phrases into the trie and update their weights.

The main functionality is in the method public List<String> input(char c), which processes each character input by the user:

  1. If the character c is '#', it indicates the completion of a word. Insert the word into the trie, reset activeString, and set the current node back to base.
  2. If the character is not '#', append it to activeString. Check whether current has a next node for the character:
    • If yes, move to the next node.
    • If no, move to the failed node.

A priority queue (min-heap) then sorts phrases by their frequencies and lexicographical order. Only the top three suggestions are kept in the heap. Extract the suggestions, reverse their order because they are extracted in reverse order, and return the list.

This implementation provides a fast and efficient way to suggest the top phrases based on partial inputs, updating in real-time based on user input frequency, and delivering suggestions prioritized by their relevance and frequency of use.

python
import heapq

class Node:
    def __init__(self):
        self.kids = {}
        self.data = defaultdict(int)

class SuggestSystem:
    def __init__(self, sentences: List[str], times: List[int]):
        self.base = Node()
        for line, cnt in zip(sentences, times):
            self.insert(line, cnt)
        
        self.active_sentence = []
        self.active_node = self.base
        self.dead_end = Node()
        
    def input(self, char: str) -> List[str]:
        if char == "#":
            completed = "".join(self.active_sentence)
            self.insert(completed, 1)
            self.active_sentence = []
            self.active_node = self.base
            return []
        
        self.active_sentence.append(char)
        if char not in self.active_node.kids:
            self.active_node = self.dead_end
            return []
        
        self.active_node = self.active_node.kids[char]
        sent = [(freq, sent) for sent, freq in self.active_node.data.items()]
        top_suggestions = heapq.nsmallest(3, sent)
        return [item[1] for item in top_suggestions]

    def insert(self, sentence, value):
        node = self.base
        for char in sentence:
            if char not in node.kids:
                node.kids[char] = Node()
            node = node.kids[char]
            node.data[sentence] -= value

The Python3 code provided introduces a complex autocomplete system for input sentences. This system leverages data structures and algorithms to efficiently provide real-time autocomplete suggestions.

In this implementation:

  • Node class: Represents a node in a trie (prefix tree). Each node stores child nodes (kids dictionary) and a defaultdict (data) for tracking sentences and their respective frequencies of occurrence.

  • SuggestSystem class: The main class managing the autocomplete logic:

    • Initializes with existing sentences and their counts, building an initial trie.
    • Handles character input, updates the trie, and provides top suggestions.

Key methods include:

  1. __init__: Constructs the trie using the provided sentences and their respective initial counts.
  2. input: Processes an input character:
    • Completes the current sentence if "#" is encountered, updating the trie and resetting state for new input.
    • Otherwise, updates the ongoing sentence and trie navigation, providing up to three top suggestions based on frequency using a min-heap.
  3. insert: Adds or updates a sentence in the trie, modifying nodes and data values to reflect new or incremental counts.

Each method collaborates to enable dynamic updates to suggestions based on user input character sequences, ensuring responsiveness and accuracy in suggestions. The trie structure, combined with a min-heap for retrieving top recommendations, ensures the system is both space and time-efficient.

Comments

No comments yet.