Word Break II

Updated on 02 July, 2025
Word Break II header image

Problem Statement

The task involves taking a string s and a set of words known as wordDict. With these, your goal is to understand how different combinations of the words in wordDict can be concatenated to recreate the string s as a sentence. Each word from the dictionary can be used multiple times to achieve this. The challenge is to find all such possible sentences where the formation of the string s is solely composed of valid words from wordDict arranged in any legitimate order. The output must include all combinations of such sentences and can be returned in any order.

Examples

Example 1

Input:

s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]

Output:

["cats and dog","cat sand dog"]

Example 2

Input:

s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"]

Output:

["pine apple pen apple","pineapple pen apple","pine applepen apple"]

Explanation:

Note that you are allowed to reuse a dictionary word.

Example 3

Input:

s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]

Output:

[]

Constraints

  • 1 <= s.length <= 20
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 10
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.
  • Input is generated in a way that the length of the answer doesn't exceed 105.

Approach and Intuition

Given the nature and constraints of the problem, here is how the solution can be effectively approached:

  1. Understanding the String and Word Dictionary: First, recognize that each character sequence in s needs to be covered by some word in wordDict.

  2. Use of Dynamic Programming: To solve this efficiently using dynamic programming, start by creating an array dp of size n+1, where n is the length of the string s. Each entry dp[i] will store a list of all possible sentences that can be formed using the substring s[0:i].

  3. Initialization: Initialize the DP table such that dp[0] contains an empty sentence (or an empty list) since the substring s[0:0] is empty and serves as our base case.

  4. Constructing the DP Table: For each position i from 1 to n:

    • Examine each word in wordDict.
    • Check if s[j:i] matches a word in wordDict for any j between 0 and i.
    • If a match is found, concatenate this 'word' phrase to each sentence form in dp[j] to create newly formed sentences for dp[i].
  5. Result Collection: At the end of the DP table construct, dp[n] will contain all possible sentences that can form the sum string s.

  • Handling words not in string: It’s possible none of the combinations of words in the wordDict matches the string s perfectly - in such cases, return an empty list.

  • Backtracking For Efficiency: Given potentially large outputs, backtracking through only those substrings and word combinations that are potential matches can greatly optimize the process.

  • Examples: Through the examples given:

    • In Example 1, s = "catsanddog" can be segmented as ["cats and dog", "cat sand dog"].
    • Example 2 extends the segmentation by allowing word reuse, shown by different separations of the substring "apple".
    • Example 3 shows an instance where no suitable segmentation is possible rendering an empty output.

By utilizing the above understanding and approach, the problem of segmenting the string s using the words from wordDict can be efficiently solved using dynamic programming.

Solutions

  • C++
  • Java
  • Python
cpp
class TrieNode {
public:
    bool isWord;
    TrieNode* childNodes[26];
    
    TrieNode() {
        isWord = false;
        for (int i = 0; i < 26; i++) {
            childNodes[i] = nullptr;
        }
    }
};
    
class WordTrie {
public:
    TrieNode* start;
    
    WordTrie() { start = new TrieNode(); }
    
    void append(string term) {
        TrieNode* current = start;
        for (char ch : term) {
            int pos = ch - 'a';
            if (!current->childNodes[pos]) {
                current->childNodes[pos] = new TrieNode();
            }
            current = current->childNodes[pos];
        }
        current->isWord = true;
    }
};
    
class Solution {
public:
    vector<string> wordBreak(string s, vector<string>& dictionary) {
        // Create and populate the Trie from given dictionary
        WordTrie wordTrie;
        for (string word : dictionary) {
            wordTrie.append(word);
        }
    
        // Dictionary to hold the solutions for subproblems
        unordered_map<int, vector<string>> memo;
    
        // Process the input string from end to start
        for (int start = s.size(); start >= 0; start--) {
            // This will collect all viable sentence formations from this point
            vector<string> sentences;
    
            // Starting node is root of Trie
            TrieNode* trackerNode = wordTrie.start;
    
            // Process each substring starting from 'start'
            for (int end = start; end < s.size(); end++) {
                char ch = s[end];
                int idx = ch - 'a';
    
                // If no node for this character, break out of current loop
                if (!trackerNode->childNodes[idx]) {
                    break;
                }
    
                // Move to the next node corresponding to the character
                trackerNode = trackerNode->childNodes[idx];
    
                // If node marks completion of a word in the Trie
                if (trackerNode->isWord) {
                    string foundWord = s.substr(start, end - start + 1);
    
                    // Directly add word as a sentence if it reaches the string's end
                    if (end == s.size() - 1) {
                        sentences.push_back(foundWord);
                    } else {
                        // Combine this word with results from memoization of the next index
                        for (string subsequent : memo[end + 1]) {
                            sentences.push_back(foundWord + " " + subsequent);
                        }
                    }
                }
            }
    
            // Save results in the memo
            memo[start] = sentences;
        }
    
        // Return all combinations starting from index 0
        return memo[0];
    }
};

The provided C++ code implements a solution to the "Word Break II" problem using a Trie and dynamic programming for optimal performance.

  • Trie and TrieNode Declaration:

    • TrieNode contains a isWord flag and 26 childNodes.
    • WordTrie manages the root node and provides a method append to insert words into the Trie.
  • Solution Class Implementation:

    1. Instantiate WordTrie and populate it with words from the dictionary.
    2. Use an unordered_map called memo to store previously computed results for subproblems.
    3. Process the input string from the end to the beginning, populating sentences with all possible sentence formations starting from each position.
    4. For each position in the string, use the Trie to identify words. If a complete word is found:
      • Directly add the word as a sentence if it spans to the end of the string.
      • If not at the end, append possible continuations from memo to form new sentences.
    5. Store results in memo for each starting index.
    6. The final result, comprising all possible sentences that can be formed from the input string using words from the dictionary, is retrieved from memo[0].

This implementation ensures efficient word checking and avoids redundant recomputation by leveraging memoization.

java
class PrefixTreeNode {
    
    boolean terminal;
    PrefixTreeNode[] edges;
    
    PrefixTreeNode() {
        terminal = false;
        edges = new PrefixTreeNode[26];
    }
}
    
class PrefixTree {
    
    PrefixTreeNode entryPoint;
    
    PrefixTree() {
        entryPoint = new PrefixTreeNode();
    }
    
    void addWord(String word) {
        PrefixTreeNode node = entryPoint;
        for (char charElement : word.toCharArray()) {
            int idx = charElement - 'a';
            if (node.edges[idx] == null) {
                node.edges[idx] = new PrefixTreeNode();
            }
            node = node.edges[idx];
        }
        node.terminal = true;
    }
}
    
class Solution {
    
    public List<String> wordBreak(String s, List<String> wordDict) {
        PrefixTree prefixTree = new PrefixTree();
        for (String word : wordDict) {
            prefixTree.addWord(word);
        }
    
        Map<Integer, List<String>> memo = new HashMap<>();
    
        for (int start = s.length(); start >= 0; start--) {
            List<String> results = new ArrayList<>();
    
            PrefixTreeNode current = prefixTree.entryPoint;
    
            for (int end = start; end < s.length(); end++) {
                char ch = s.charAt(end);
                int idx = ch - 'a';
    
                if (current.edges[idx] == null) {
                    break;
                }
    
                current = current.edges[idx];
    
                if (current.terminal) {
                    String snip = s.substring(start, end + 1);
    
                    if (end == s.length() - 1) {
                        results.add(snip);
                    } else {
                        List<String> nextParts = memo.get(end + 1);
                        for (String part : nextParts) {
                            results.add(snip + " " + part);
                        }
                    }
                }
            }
    
            memo.put(start, results);
        }
    
        return memo.getOrDefault(0, new ArrayList<>());
    }
}

The given Java code is a solution for generating all the possible sentences from a string where words are defined in a given dictionary using dynamic programming and a trie (prefix tree) for efficient word lookup.

  • PrefixTreeNode Class: Represents each node in the trie. Each node holds:

    • terminal: A boolean indicating if the node marks the end of a valid word.
    • edges: An array of children nodes representing subsequent characters.
  • PrefixTree Class: Represents the entire trie with operations to add words:

    • entryPoint: The root of the trie.
    • addWord(String word): Adds a word to the trie. Iterates through each character in the word, checks its position in the array edges, and creates a new node if it doesn't exist.
  • Solution Class:

    1. wordBreak(String s, List wordDict): Main method that breaks the string s using the words in wordDict.
    2. Initializes a PrefixTree and populates it with words from wordDict.
    3. Uses a Map<Integer, List<String>> memo to store intermediate results for substrings of s, reducing repetitive computations. The key represents the starting index, and the value is a list of possible sentences starting from that index.
    4. Iterates backward from the end of the string s preparing possible sentences.
    5. For each position, checks if there is a valid word using the trie. If a word ends at the current position:
      • If it's the end of the string s, adds it directly to results.
      • If not, concatenates it with previously computed result sentences from memo for positions that follow.
    6. Updates memo with new results for the current starting position.
    7. Finally, returns the list of sentences from the memoized result for the starting index 0.

This implementation uses a combination of memoization and trie-based search to efficiently manage and compute the potential word breaks, making it suitable for large inputs.

python
class TrieNode:
    def __init__(self):
        self.endOfWord = False
        self.branches = [None] * 26  # Alphabet node storage
    
    
class WordTrie:
    def __init__(self):
        self.rootNode = TrieNode()
    
    def addWord(self, word):
        node = self.rootNode
        for letter in word:
            pos = ord(letter) - ord("a")
            if not node.branches[pos]:
                node.branches[pos] = TrieNode()
            node = node.branches[pos]
        node.endOfWord = True
    
    
class WordBreakSolver:
    def wordBreak(self, s: str, wordDict: List[str]) -> List[str]:
        trie = WordTrie()
        for word in wordDict:
            trie.addWord(word)
    
        memo = {}
    
        for i in range(len(s), -1, -1):
            valid_phrases = []
    
            node = trie.rootNode
    
            for j in range(i, len(s)):
                char = s[j]
                idx = ord(char) - ord("a")
    
                if not node.branches[idx]:
                    break
    
                node = node.branches[idx]
    
                if node.endOfWord:
                    found_word = s[i : j + 1]
                        
                    if j == len(s) - 1:
                        valid_phrases.append(found_word)
                    else:
                        next_words = memo.get(j + 1, [])
                        for seq in next_words:
                            valid_phrases.append(found_word + " " + seq)
    
            memo[i] = valid_phrases
    
        return memo.get(0, [])

The solution for the problem "Word Break II" involves constructing a Trie of all words in the given word dictionary for efficient word lookups and then using dynamic programming to decompose a string into valid words as per the dictionary.

  • Start by defining a TrieNode class that represents each node in a Trie, holding information if a word ends at this node and the branches to other nodes corresponding to each letter of the alphabet.
  • The WordTrie class encapsulates the Trie structure with a method addWord to insert words from the dictionary into the Trie, building out the nodes and marking the end of words appropriately.
  • In the WordBreakSolver class, the method wordBreak is designed to use the Trie for breaking down the string s into valid words found in the dictionary. It initializes the Trie with words from the word dictionary.
  • Use a memoization dictionary named memo to store results of sub-problems, where each key is the starting index in string s and the value is a list of possible word combinations starting from that index.
  • Utilize nested loops where the outer loop handles the start index of substrings, and the inner loop checks for valid words in the Trie by moving through each branch as characters are matched.
  • Whenever you reach a node in the Trie that marks the end of a word, you check if you are at the end of the string s. If so, add the word to the list of valid_phrases. Otherwise, recursively append this valid word along with possible sequences of words that can follow it from the memoized results.
  • Store each list of valid_phrases combinations in the memo using the start index of the substring as the key.
  • Finally, return the memoized result of decompositions that begin at index 0 for the entire string s.

This approach efficiently breaks down the string into all possible word combinations that are valid per the dictionary, leveraging Trie structure for quick lookups and memoization to avoid redundant computations.

Comments

No comments yet.