Sum of Prefix Scores of Strings

Updated on 14 July, 2025
Sum of Prefix Scores of Strings header image

Problem Statement

In this problem, we are provided with an array of strings called words. Each string within words is non-empty and consists only of lowercase English letters. Our goal is to determine the 'influence' or 'impact' each string has through its prefixes on the entire array. Specifically, for each string, we are to count how many times each of its non-empty prefixes appears as a prefix in any of the strings in the array.

To clarify, the score of a specific prefix in a string is defined as the count of strings in the array that start with that prefix. For a given string words[i], we calculate the score for every possible non-empty prefix derived from words[i] and then sum these individual scores to get a total score for words[i]. We perform this calculation for each string in the words array and return the results as an array answer such that answer[i] will contain the cumulated score corresponding to words[i].

Examples

Example 1

Input:

words = ["abc","ab","bc","b"]

Output:

[5,4,3,2]

Explanation:

The answer for each string is the following:
- "abc" has 3 prefixes: "a", "ab", and "abc".
- There are 2 strings with the prefix "a", 2 strings with the prefix "ab", and 1 string with the prefix "abc".
The total is answer[0] = 2 + 2 + 1 = 5.
- "ab" has 2 prefixes: "a" and "ab".
- There are 2 strings with the prefix "a", and 2 strings with the prefix "ab".
The total is answer[1] = 2 + 2 = 4.
- "bc" has 2 prefixes: "b" and "bc".
- There are 2 strings with the prefix "b", and 1 string with the prefix "bc".
The total is answer[2] = 2 + 1 = 3.
- "b" has 1 prefix: "b".
- There are 2 strings with the prefix "b".
The total is answer[3] = 2.

Example 2

Input:

words = ["abcd"]

Output:

[4]

Explanation:

"abcd" has 4 prefixes: "a", "ab", "abc", and "abcd".
Each prefix has a score of one, so the total is answer[0] = 1 + 1 + 1 + 1 = 4.

Constraints

  • 1 <= words.length <= 1000
  • 1 <= words[i].length <= 1000
  • words[i] consists of lowercase English letters.

Approach and Intuition

The challenge lies in efficiently computing the score for each string based on its prefixes, given the constraints where the length of both the array and longest string can be substantial. Here's a step-by-step breakdown of the approach:

  1. Prepare for Prefix Lookups: Use a trie (prefix tree), which is an ordered tree structure used to store a dynamic set of strings where keys are usually strings. The trie can be used to quickly look up how many strings contain a particular prefix which can be built as we iterate through the words.

  2. Build the Trie: For each word in words, insert each non-empty prefix into the trie. During insertion, maintain a count at each node (or prefix end) of how many times a word with that prefix has been encountered in the words array.

  3. Calculate Scores Using the Trie:

    • For each word in words, examine each of its non-empty prefixes.
    • Using the trie, count the number of strings that have this prefix.
    • Sum these counts to get the total score for the word.
  4. Construct the Result: For each word, after calculating the total score considering all its prefixes, store this score in the corresponding position in an answer array.

In summary, the use of a trie makes it easier and faster to find the count of strings starting with a specific prefix, helping to efficiently compute our desired result for each word. This approach minimizes the otherwise potential costly operation of comparing all prefixes of a given string with all other strings, ensuring a solution that's well within acceptable time limits for the input constraints.

Solutions

  • C++
cpp
struct TrieNode {
    TrieNode* children[26] = {};
    int count = 0;
};
    
class Trie {
    TrieNode root;
    
public:
    void addWord(string word) {
        TrieNode* node = &root;
        for (char ch : word) {
            if (!node->children[ch - 'a']) {
                node->children[ch - 'a'] = new TrieNode();
            }
            node->children[ch - 'a']->count++;
            node = node->children[ch - 'a'];
        }
    }
    
    int calculatePrefix(string prefix) {
        TrieNode* node = &root;
        int total = 0;
        for (char ch : prefix) {
            if (node->children[ch - 'a']) {
                total += node->children[ch - 'a']->count;
                node = node->children[ch - 'a'];
            } else {
                break;
            }
        }
        return total;
    }
    
    vector<int> prefixScores(vector<string>& words) {
        int sizeOfWords = words.size();
        for (int i = 0; i < sizeOfWords; i++) {
            addWord(words[i]);
        }
        vector<int> results(sizeOfWords, 0);
        for (int i = 0; i < sizeOfWords; i++) {
            results[i] = calculatePrefix(words[i]);
        }
        return results;
    }
};

Summing the prefix scores of strings involves compiling scores based on the frequency of each prefix appearing across the list. This C++ solution uses a Trie data structure to facilitate an efficient score calculation as detailed below:

  • A TrieNode struct is defined with pointers to children nodes and a count to track the frequency of prefix appearances.

  • The Trie class encapsulates trie operations:

    • addWord incorporates a word into the trie. As each character of the word is processed, it updates or creates child nodes and increments occurrence counts.
    • calculatePrefix computes the score for a single word by summing the counts of nodes that correspond to each character in the word's prefix.
    • prefixScores is the main function that:
      1. Adds each word from the input list to the trie.
      2. Initializes a results vector to store scores.
      3. Computes the prefix score for each word using calculatePrefix and stores the result.
  • This implementation systematically addresses the needs of calculating prefix scores through direct trie manipulations, thus offering an optimized solution for the problem statement. This method prevents repetitive calculations and optimizes string processing significantly.

  • Java
java
class TrieNode {
    TrieNode[] children = new TrieNode[26];
    int count = 0;
}
    
class TrieSolution {
    TrieNode root = new TrieNode();
    
    void addWord(String word) {
        TrieNode node = root;
        for (char character : word.toCharArray()) {
            if (node.children[character - 'a'] == null) {
                node.children[character - 'a'] = new TrieNode();
            }
            node.children[character - 'a'].count++;
            node = node.children[character - 'a'];
        }
    }
    
    int countPrefixes(String prefix) {
        TrieNode node = root;
        int total = 0;
        for (char character : prefix.toCharArray()) {
            if (node.children[character - 'a'] == null) {
                return 0;
            }
            total += node.children[character - 'a'].count;
            node = node.children[character - 'a'];
        }
        return total;
    }
    
    public int[] computePrefixScores(String[] words) {
        int wordCount = words.length;
        for (int i = 0; i < wordCount; i++) {
            addWord(words[i]);
        }
        int[] scores = new int[wordCount];
        for (int i = 0; i < wordCount; i++) {
            scores[i] = countPrefixes(words[i]);
        }
        return scores;
    }
}

The provided Java solution models a problem where you compute the prefix scores for a set of strings by leveraging a Trie data structure. The method involves creating a Trie where each node keeps a count of how many times a prefix appears. The solution consists of the following components:

  1. TrieNode Class: Contains an array of TrieNode children and a count variable. The children array corresponds to 26 letters of the alphabet, and count keeps track of the number of times a particular word or prefix has been added to the Trie.

  2. TrieSolution Class: Contains the root of Trie which initializes a new TrieNode.

    1. addWord Method: Adds a word to the Trie. It iterates through each character of the word, following or creating the necessary childs and increments the count at each node.

    2. countPrefixes Method: Retrieves the score for a single prefix by traversing through the Trie and summing the counts found at each node associated with the prefix.

    3. computePrefixScores Method: Computes the prefix scores for an array of words. It adds each word to the Trie and then computes their scores using the countPrefixes method. The scores are stored and returned in an array.

This solution efficiently computes prefix scores by first constructing the Trie with all the words and then querying the Trie for each word's prefix score, leveraging the Trie structure's ability to rapidly access node counts. This method ensures that the computation is quicker, especially beneficial when dealing with a large number of words and longer strings.

  • Python
python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.prefix_count = 0
    
class TriePrefixSum:
    def __init__(self):
        self.head = TrieNode()
    
    def add_word(self, word):
        node = self.head
        for char in word:
            index = ord(char) - ord('a')
            if node.children[index] is None:
                node.children[index] = TrieNode()
            node.children[index].prefix_count += 1
            node = node.children[index]
    
    def get_prefix_sum(self, prefix):
        node = self.head
        total = 0
        for char in prefix:
            index = ord(char) - ord('a')
            if node.children[index] is None:
                break
            total += node.children[index].prefix_count
            node = node.children[index]
        return total
    
    def calculatePrefixScores(self, strings):
        for word in strings:
            self.add_word(word)
        result = [self.get_prefix_sum(word) for word in strings]
        return result

Summing up prefix scores for a list of strings can effectively be tackled using a Trie data structure, which allows efficient storage and querying of string prefixes. The provided solution implements a Trie structure in Python to compute prefix scores for a set of strings.

  • Analyze the code. It defines a TrieNode class which includes:

    • A children list of size 26 to accommodate each letter of the alphabet.
    • A prefix_count to count the occurrences of each prefix.
  • The TriePrefixSum class manages the Trie operations:

    • The add_word method inserts a word into the Trie, updating the prefix_count for each character in the word.
    • The get_prefix_sum method calculates the sum of prefix scores by traversing the Trie according to the characters in the given prefix and summing up the prefix_count.
    • The calculatePrefixScores method utilizes both add_word and get_prefix_sum to compute the prefix sum for each string in a list.

Follow these steps to compute prefix scores:

  1. Create an instance of the TriePrefixSum.
  2. Add each word in the list to the Trie using add_word.
  3. Compute the prefix sum for each word using get_prefix_sum and store each result.

The final result is a list of prefix sums for all strings in the input list. This method is particularly efficient for large datasets, as it leverages the Trie structure to minimize computational overhead associated with prefix evaluation. The implementation makes it suitable for problems requiring repetitive prefix sum computations such as autocompletion scores or word frequency analysis in text processing applications.

Comments

No comments yet.