139. Word Break - Solutions and Explanation

Problem Statement

In this task, we are provided with two main inputs: a string s and a list of strings called wordDict. The objective is to determine whether the string s can be completely segmented into a sequence of one or more words that exactly match the words found in wordDict.

Key rules:

Words from the wordDict can be used multiple times.
The order of segments in s must exactly form the original string.
The complete string s must be used up by concatenating whole words from wordDict.

Return true if such segmentation is possible and false otherwise.

Examples

Example 1

Input:

s = "vultrcode", wordDict = ["vultr","code"]

Output:

true

Explanation:

Return true because "vultrcode" can be segmented as "vultr code".

Example 2

Input:

s = "applepenapple", wordDict = ["apple","pen"]

Output:

true

Explanation:

Return true because "applepenapple" can be segmented as "apple pen apple".
Note that you are allowed to reuse a dictionary word.

Example 3

Input:

s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]

Output:

false

Constraints

1 <= s.length <= 300
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 20
s and wordDict[i] consist of only lowercase English letters
All the strings of wordDict are unique

Approach and Intuition

This problem is a classic case for Dynamic Programming. We'll define a boolean DP array dp where:

dp[i] is true if the substring s[0:i] can be formed by concatenating words in wordDict.
dp[0] is initialized as true because an empty string is trivially valid.

Implementation Steps:

Initialize dp = [False] * (len(s) + 1), and set dp[0] = True.
For every index i from 1 to len(s):
- For each word in wordDict, check:
  - If s[i-len(word):i] == word and dp[i-len(word)] == True, then set dp[i] = True.
Return dp[len(s)].

This solution runs in O(n × k) time, where n = len(s) and k is the total number of characters across all words in wordDict.

Example Walkthrough

For s = "vultrcode" and wordDict = ["vultr", "code"]:

dp[5] = True because "vultr" ∈ wordDict
Then, since dp[5] = True and s[5:9] = "code" ∈ wordDict → dp[9] = True
Final result → dp[9] = True

For s = "catsandog" and wordDict = ["cats", "dog", "sand", "and", "cat"]:

Multiple segments are valid up to "cats" or "cat", "sand", and "and"
However, "og" at the end does not match any wordDict entry after valid prefix segments
Final result → False

This strategy ensures optimal performance while respecting the constraints.

Solutions

C++
Java
C
JavaScript
Python

                            cpp
                            
                        
class Solution {
public:
    bool canSegmentString(string s, vector<string>& dict) {
        unordered_set<string> wordSet(dict.begin(), dict.end());
        vector<bool> canBreak(s.length() + 1, false);
        canBreak[0] = true;
            
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end; start++) {
                if (canBreak[start] && wordSet.find(s.substr(start, end - start)) != wordSet.end()) {
                    canBreak[end] = true;
                    break;
                }
            }
        }
            
        return canBreak[s.length()];
    }
};

This implementation provides a method canSegmentString that determines if a given string s can be segmented into elements that are present in a dictionary dict using dynamic programming. The dictionary is initially transformed into an unordered set wordSet for efficient look-up operations.

The method uses a vector<bool> named canBreak which keeps track of the possibility of segmenting the string up to each index. The size of canBreak is one more than the string length to handle segment checks efficiently.
canBreak[0] is initialized to true representing that a string of zero length is always segmentable.
The outer loop iterates over the string with an index end from 1 to the string's length inclusive, determining if the substring ending at each position can be segmented.
Within this outer loop, another inner loop iterates with a variable start from 0 up to end. This loop checks if the substring from start to end exists in the wordSet.
If both the substring can be found in the wordSet and the string up to start can be segmented (canBreak[start] is true), then the string up to end can also be segmented (canBreak[end] is set to true).
The loop breaks once a valid segmentation is found for a specific end.

Finally, the method returns the value of canBreak[s.length()], indicating whether the entire string can be segmented based on the provided dictionary. This method efficiently checks segmentations, avoiding unnecessary computation through dynamic programming and leveraging efficient data structures.

                            java
                            
                        
class Solution {
    public boolean canSegmentString(String str, List<String> dictionary) {
        int length = str.length();
        Set<String> dictSet = new HashSet<>(dictionary);
        boolean[] canSegment = new boolean[length + 1];
        canSegment[0] = true;
    
        for (int end = 1; end <= length; end++) {
            for (int start = 0; start < end; start++) {
                if (canSegment[start] && dictSet.contains(str.substring(start, end))) {
                    canSegment[end] = true;
                    break;
                }
            }
        }
    
        return canSegment[length];
    }
}

The Word Break problem determines if a string can be segmented into space-separated words found in a given dictionary. The provided Java solution employs dynamic programming to solve this efficiently.

Start by initializing a boolean array canSegment where canSegment[i] will be true if the first i characters of the string can be segmented using the dictionary.
The algorithm constructs a Set from the dictionary to allow for O(1) average-time complexity checks for word existence.
Initially, set canSegment[0] to true, signifying that a zero-length string can be segmented trivially (i.e., do nothing).
Use a nested loop where the outer loop goes from 1 to the length of the string. For each position end, the inner loop checks all possible start positions from 0 to end.
For each pair (start, end), if the substring str.substring(start, end) is in the dictionary and canSegment[start] is true, then set canSegment[end] to true and break out of the inner loop.
The value at canSegment[str.length()] gives the final answer; it will be true if the whole string can be segmented according to the dictionary.

By the end of the use of this algorithm, efficiently determine if the input string can be split into dictionary words without redundant checks, leveraging the power of dynamic programming. This approach ensures that each substring is processed only once and only if needed, providing significant performance benefits for larger strings or dictionaries.

                            c
                            
                        
#define BUFFER_SIZE 1000
#define DICT_CAPACITY 1000
struct hash_entry {
    char key[BUFFER_SIZE + 1];
    UT_hash_handle hh;
};
bool canSegmentString(char* s, char** dict, int dictSize) {
    struct hash_entry* hashTable = NULL;
    for (int idx = 0; idx < dictSize; idx++) {
        struct hash_entry* newEntry = malloc(sizeof(struct hash_entry));
        strcpy(newEntry->key, dict[idx]);
        HASH_ADD_STR(hashTable, key, newEntry);
    }
    bool dp[BUFFER_SIZE + 1];
    memset(dp, false, sizeof(dp));
    dp[0] = true;
    int stringLength = strlen(s);
    for (int i = 1; i <= stringLength; i++) {
        for (int j = 0; j < i; j++) {
            if (dp[j]) {
                struct hash_entry* lookupEntry;
                char extractedWord[BUFFER_SIZE + 1];
                strncpy(extractedWord, &s[j], i - j);
                extractedWord[i - j] = '\0';  // Ensure null termination
                HASH_FIND_STR(hashTable, extractedWord, lookupEntry);
                if (lookupEntry) {
                    dp[i] = true;
                    break;
                }
            }
        }
    }
    return dp[stringLength];
}

The given C code solves the "Word Break" problem by determining if a string can be segmented into space-separated dictionary words. The implementation utilizes a dynamic programming approach along with a hash table for efficient word look-up. Below is an overview of the code structure:

Define Constants and Structures: Constants like BUFFER_SIZE define the maximum length of words and DICT_CAPACITY for dictionary capacity. A struct hash_entry is defined to use with the hash table for storing dictionary words.
Initialization of Hash Table: The dictionary words are added into a hash table, hashTable, using the uthash library. This allows for efficient look-up times during the segmentation process.
Dynamic Programming Array: The boolean array dp is utilized to store whether a substring can be segmented up till each index. It starts with dp[0] initialized to true, signifying that a zero-length string is trivially segmentable.
Segmentation Logic: The function iterates over the string using two nested loops, checking for each substring if it ends at a dictionary word which starts at a previously segmentable index. If it matches, the current position in dp is set to true.
Result: The value at dp[stringLength] determines if the whole string can be segmented, which is returned as the result of the canSegmentString function.

The solution is efficient for scenarios where dictionary look-up operations are a bottleneck, as the hash table significantly reduces the look-up time. The dynamic programming component ensures that no substring is redundantly processed, making it efficient in terms of time complexity as well. This implementation provides a robust method to solve segmentation problems in strings.

                            js
                            
                        
var canSegment = function (str, dictionary) {
    let length = str.length;
    let wordSet = new Set(dictionary);
    let dpTable = new Array(length + 1).fill(false);
    dpTable[0] = true;
    for (let i = 1; i <= length; i++) {
        for (let j = 0; j < i; j++) {
            if (dpTable[j] && wordSet.has(str.substring(j, i))) {
                dpTable[i] = true;
                break;
            }
        }
    }
    return dpTable[length];
};

This JavaScript solution addresses the problem of determining if a string can be segmented into space-separated words that are found in a given dictionary.

Start by creating a set from the dictionary to facilitate fast lookups.
Use dynamic programming to solve the problem. Create a boolean array dpTable initialized with false values and set dpTable[0] to true to represent an empty substring.
Iterate through the string using two nested loops:
- Outer loop goes from position 1 to the length of the string.
- Inner loop checks each substring from the start of the string up to the current position in the outer loop.
For each position, check if there exists any breakpoint j within the current substring such that the substring from j to i exists in the dictionary and the position j in dpTable is true. If found, set dpTable[i] to true, breaking the inner loop early as no further checks are needed.
Return the value of dpTable[length] which indicates whether the entire string can be segmented into dictionary words.

By implementing this solution, effectively segment strings using dictionary words through a combine use of dynamic programming and hash set lookups. This ensures optimal performance while checking every potential word segment against the dictionary quickly.

                            python
                            
                        
class Solution:
    def canSegmentString(self, text: str, dictionary: List[str]) -> bool:
        length = len(text)
        word_set = set(dictionary)
        can_segment = [False] * (length + 1)
        can_segment[0] = True
            
        for i in range(1, length + 1):
            for j in range(i):
                if can_segment[j] and text[j:i] in word_set:
                    can_segment[i] = True
                    break
    
        return can_segment[length]

The given Python solution is designed to solve the "Word Break" problem. The goal is to determine if a given string (text) can be segmented into one or more words that exist in a supplied dictionary.

The solution employs a dynamic programming approach where:
- length stores the size of the text.
- word_set is a set created from the input dictionary for faster look-ups.
- can_segment is a Boolean list, initialized to accommodate each character of the text plus an empty character scenario. The first element is set to True, representing an empty string.
The function iterates through the text by:
- Dividing text into all possible substrings.
- Checking each substring against the word_set.
- For each valid substring (one found in word_set), it updates the can_segment list to record the positions where valid breaks occur.
It continues segmenting until it either finds a valid sequence of words covering the entire string or exhausts all possibilities.

At the end of execution, the result reflects in can_segment[length], which indicates if the entire text can be segmented based on the dictionary provided. This approach ensures that you capture whether a holistic solution to the provided text exists.

Comments

No comments yet.

Word Break

Problem Statement

Examples

Example 1

Example 2

Example 3

Constraints

Approach and Intuition

Implementation Steps:

Example Walkthrough

Solutions

Comments

Products

Features

Solutions

Marketplace

Resources

Company

Tech Talks

Vultr Blogs