Concatenated Words

Updated on 14 May, 2025
Concatenated Words header image

Problem Statement

In the problem, we are provided with an array of strings named words, where each string is guaranteed to be unique. The task is to identify and return all those strings from the array that can be termed as "concatenated words". A concatenated word in this context is a string that can be formed by concatenating two or more shorter strings from the same array. These component strings can repeat and are not required to be distinct, but they must be shorter than the concatenated word itself.

Examples

Example 1

Input:

words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output:

["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation:

"catsdogcats" can be concatenated by "cats", "dog" and "cats";
"dogcatsdog" can be concatenated by "dog", "cats" and "dog";
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Example 2

Input:

words = ["cat","dog","catdog"]

Output:

["catdog"]

Constraints

  • 1 <= words.length <= 104
  • 1 <= words[i].length <= 30
  • words[i] consists of only lowercase English letters.
  • All the strings of words are unique.
  • 1 <= sum(words[i].length) <= 105

Approach and Intuition

To solve this problem and find all the concatenated words in the given list:

  1. First, consider using a data structure to efficiently store and search for the presence of words. A set or a hashset can be ideal since it allows for average O(1) time complexity for lookups.

  2. Since every word can potentially be made up of any combination of other words in the list, we can utilize dynamic programming or recursion with memoization to determine if a word can be broken down into smaller words. A dynamic programming table can be particularly useful, where dp[i] can represent whether the substring of the word up to the i-th index can be formed by concatenating words from the list.

  3. For each word, break it down starting from the first character and check if the prefix exists in the list. If a prefix is found, recursively check if the remaining portion of the word can be split into valid shorter words from the list. For efficiency, only consider splitting words that are longer than the current prefix.

  4. Use memoization to store results of previously encountered strings during the recursion. This will prevent recalculating for substrings multiple times and reduce the time complexity dramatically.

  5. As each word in the list is processed, check if it can be segmented into at least two or more shorter words from the list. If it can, add this to the resultant list of concatenated words.

By iterating through each word and applying the above method, all concatenated words can be determined efficiently despite the potentially large input size, respecting the provided constraints.

Solutions

  • C++
  • Java
cpp
class Solution {
    bool searchSubset(const string& target, int idx, vector<bool>& seen,
                      const unordered_set<string>& dict) {
        if (idx == target.length()) return true;
        if (seen[idx]) return false;
        seen[idx] = true;
        for (int end = target.length() - (idx == 0 ? 1 : 0); end > idx; --end) {
            if (dict.count(target.substr(idx, end - idx)) &&
                searchSubset(target, end, seen, dict)) {
                return true;
            }
        }
        return false;
    }

public:
    vector<string> findConcatWords(vector<string>& wordsList) {
        unordered_set<string> dict(wordsList.begin(), wordsList.end());
        vector<string> concatenatedWords;
        for (const string& word : wordsList) {
            vector<bool> visited(word.size(), false);
            if (searchSubset(word, 0, visited, dict)) {
                concatenatedWords.push_back(word);
            }
        }
        return concatenatedWords;
    }
};

The given C++ code is a solution for finding all concatenated words in a list. The core function implements a recursive strategy with memoization to check if a word can be formed by concatenating other words from the list. Here's a breakdown of how this code works:

  • The function searchSubset checks if a substring of the target word can be found in a dictionary of words (dict). It uses a boolean vector seen to avoid reprocessing the same index in the target word, enhancing efficiency via memoization.

  • For each position in the word (idx), the function iterates backwards to try and find a valid word in the dictionary that ends at the current position. If such a word is found, it recursively checks the rest of the substring starting from the end of this word.

  • The function findConcatWords initializes a hash set (dict) with all the words from the input list for quick look-up. It then iterates over each word to see if it can be constructed from other words in the list using searchSubset.

  • If searchSubset returns true for a word, implying that the word can indeed be formed by concatenating other words from the list (excluding itself in recursion), then this word is added to the result list concatenatedWords.

Implement these functions for efficiently detecting and listing concatenated words in any given list of words. This solution makes use of both recursion for splitting the words and memoization to store already visited substrings, optimizing the process significantly.

java
class Solution {
    private boolean searchSubstring(final String str, int pos, final boolean[] mark, final Set<String> wordSet) {
        if (pos == str.length()) {
            return true;
        }
        if (mark[pos]) {
            return false;
        }
        mark[pos] = true;
        for (int i = str.length() - (pos == 0 ? 1 : 0); i > pos; --i) {
            if (wordSet.contains(str.substring(pos, i)) && searchSubstring(str, i, mark, wordSet)) {
                return true;
            }
            
        }
        return false;
    }
    
    public List<String> findAllConcatenatedWordsInADict(String[] words) {
        final Set<String> wordSet = new HashSet<>(Arrays.asList(words));
        final List<String> result = new ArrayList<>();
        for (final String str : words) {
            final int len = str.length();
            final boolean[] mark = new boolean[len];
            if (searchSubstring(str, 0, mark, wordSet)) {
                result.add(str);
            }
        }
        return result;   
    }
}

In the Java program titled "Concatenated Words", the task is to identify words in a given dictionary that are formed by concatenating other words from the same dictionary. Here's a breakdown of how the solution accomplishes this:

  • A private helper method searchSubstring is used, which leverages recursion to determine if a substring can be formed by concatenating other words in the dictionary. This method utilizes dynamic programming by marking positions in the word using a boolean array to avoid redundant checks and hence optimize the process.

  • The core of the findAllConcatenatedWordsInADict function involves:

    • Creating a HashSet to store the words for quick lookup.
    • Iterating through each word and using the searchSubstring method to determine if that word is a concatenated word. The iteration skips directly re-identifying a word as a concatenation of itself to enhance efficiency.
    • Each valid concatenated word is collected into a List<String> which is eventually returned.

The algorithm efficiently checks each word by breaking it down into possible substrings and recursing deeper only if those substrings are found in the dictionary. This prevents unnecessary computations and speeds up the process of identifying concatenated words.

Comments

No comments yet.