820. Short Encoding of Words - Solutions and Explanation

Problem Statement

The challenge requires finding a compact way to encode a list of words into a single string, s, accompanied by an array of starting indices, indices, indicating where each word can be located in s. Each word should be retrievable from s using the respective index in indices, by extracting a substring from that index up to the next '#' character, which serves as a delimiter. Additionally, each word in the list corresponds to one entry in indices, and the string s must end with a '#' character. The task is to compute the shortest possible length for such an encoded string s for a given list of words.

Examples

Example 1

Input:

words = ["time", "me", "bell"]

Output:

Explanation:

A valid encoding would be s = `"time#bell#" and indices = [0, 2, 5`].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"

Example 2

Input:

words = ["t"]

Output:

Explanation:

A valid encoding would be s = "t#" and indices = [0].

Constraints

1 <= words.length <= 2000
1 <= words[i].length <= 7
words[i] consists of only lowercase letters.

Approach and Intuition

The goal is to devise a reference string s that allows us to access each word in the given words array by using the starting indices outlined in indices. The key challenges here are determining:

The order of words within the reference string to minimize redundancy.
How to efficiently nest words within each other if a smaller word is a suffix of a larger word.

Given this, a practical approach can be deduced:

Suffix Checks and Ordering:
- Sort the words in a manner such that longer words are considered first, or words are sorted lexicographically in reverse. This helps in easily embedding words which are suffixes of longer words.
Building the String:
- Start constructing the string s. For each word in the sorted list, check if it can be considered as a new segment or if it is a suffix from any of the existing segments of s.
- If a word is a suffix to an existing part of s, then merely note the index. If not, append it to s followed by a #. Update indices accordingly.
Efficiency Considerations:
- Using a trie (prefix tree) could potentially improve the efficiency of checking if a word or its suffix is already represented in s.
- Alternatively, keeping a set of suffixes for quick lookup during the iteration might help.

Finally, the length of the constructed string s (including all # characters) is the answer. This method ensures minimal redundancy by taking advantage of overlapping suffixes among the words, which in many cases will dramatically reduce the length of s.

Solutions

Java

                            java
                            
                        
class Encoder {
    public int minimalEncoding(String[] terms) {
        RootNode root = new RootNode();
        Map<RootNode, Integer> map = new HashMap<>();
    
        for (int i = 0; i < terms.length; i++) {
            String term = terms[i];
            RootNode current = root;
            for (int j = term.length() - 1; j >= 0; --j)
                current = current.getNode(term.charAt(j));
            map.put(current, i);
        }
    
        int result = 0;
        for (RootNode each : map.keySet()) {
            if (each.numChildren == 0)
                result += terms[map.get(each)].length() + 1;
        }
        return result;
    }
}
    
class RootNode {
    RootNode[] childNodes;
    int numChildren;
    
    RootNode() {
        childNodes = new RootNode[26];
        numChildren = 0;
    }
    
    public RootNode getNode(char letter) {
        if (childNodes[letter - 'a'] == null) {
            childNodes[letter - 'a'] = new RootNode();
            numChildren++;
        }
        return childNodes[letter - 'a'];
    }
}

The provided Java solution tackles the problem of finding the minimal encoding length for an array of words, using a trie-like data structure. Here's a breakdown of how the solution works:

A custom class RootNode is introduced to handle trie operations. This node maintains an array of RootNode elements representing the child nodes and a counter numChildren to keep track of the number of initialized child nodes.
The Encoder class contains the minimalEncoding function that initializes a RootNode representing the root of the trie and a HashMap where keys are nodes, and their associated values are indices from the input array.
The solution iterates over each word in reverse order (from end to start), inserting each character into the trie. For each complete word, the terminal node is linked to its respective word index using the map.
After building the trie, the function then iterates over every unique terminal node in the map (only nodes without children). It computes the resultant encoding length by adding the lengths of these words plus one (for the delimiter '#') and sums them to get the total minimal encoding length.
The output of the function is an integer representing this total minimal encoding length.

This solution efficiently handles the encoding problem by creating an inverse trie and utilizing a hash map to uniquely identify and measure the nodes that represent full words, hence ranks high in terms of time and space efficiency by avoiding redundant checks and recalculations.

Comments

No comments yet.

Short Encoding of Words

Problem Statement

Examples

Example 1

Example 2

Constraints

Approach and Intuition

Solutions

Comments

Products

Features

Solutions

Marketplace

Resources

Company

Tech Talks

Vultr Blogs