Word Subsets

Updated on 08 July, 2025
Word Subsets header image

Problem Statement

In this challenge, you are working with two arrays of strings, named words1 and words2. The objective is to determine which strings in words1 are 'universal'. A string is defined as 'universal' if every string in words2 is a subset of this string.

To clarify, a string b is a subset of another string a if every character in b appears in a with at least the same frequency as it does in b. For instance, 'wrr' is found within 'warrior' when considering character occurrence, but not within 'world' because it lacks sufficient 'r' characters.

The task is to return an array of all such universal strings from the words1 array. The output does not require a specific order.


Examples

Example 1

Input:

words1 = ["amazon","apple","facebook","google","vultrcode"], words2 = ["e","o"]

Output:

["facebook","google","vultrcode"]

Example 2

Input:

words1 = ["amazon","apple","facebook","google","vultrcode"], words2 = ["lc","eo"]

Output:

["vultrcode"]

Example 3

Input:

words1 = ["acaac","cccbb","aacbb","caacc","bcbbb"], words2 = ["c","cc","b"]

Output:

["cccbb"]

Constraints

  • 1 <= words1.length, words2.length <= 10⁴
  • 1 <= words1[i].length, words2[i].length <= 10
  • words1[i] and words2[i] consist only of lowercase English letters.
  • All the strings of words1 are unique.

Approach and Intuition

The problem revolves around efficiently checking the subset relationship between each string in words1 against all strings in words2. Here's a structured approach toward solving it:

  1. Create Frequency Maps for words2: For each word in words2, build a frequency dictionary that tracks how many times each character appears.

  2. Build a Maximum Frequency Requirement: Combine all frequency dictionaries into one that stores the maximum frequency required for each character across all words2 entries.

  3. Evaluate Each words1 Entry: For each word in words1, build its character frequency and verify that it meets or exceeds the frequency of every character in the maximum frequency dictionary.

If the word satisfies all frequency requirements, it is universal and added to the result.

This optimization avoids comparing each pair individually and ensures better performance with large inputs. The character space is limited (only lowercase letters), so it's feasible to process using fixed-size arrays or dictionaries.

Solutions

  • C++
cpp
class Solution {
public:
    vector<string> subsetWords(vector<string>& list1, vector<string>& list2) {
        vector<int> maxFreq = getFreq("");
        for (const string& word : list2) {
            vector<int> freq = getFreq(word);
            for (int i = 0; i < 26; ++i) {
                maxFreq[i] = max(maxFreq[i], freq[i]);
            }
        }
    
        vector<string> result;
        for (const string& word : list1) {
            vector<int> freq = getFreq(word);
            bool qualifies = true;
            for (int i = 0; i < 26; ++i) {
                if (freq[i] < maxFreq[i]) {
                    qualifies = false;
                    break;
                }
            }
            if (qualifies) {
                result.push_back(word);
            }
        }
    
        return result;
    }
    
private:
    vector<int> getFreq(const string& str) {
        vector<int> freq(26, 0);
        for (char ch : str) {
            freq[ch - 'a']++;
        }
        return freq;
    }
};

This solution presents a method to find 'word subsets'. A word from a list list1 is considered a subset when it contains all the characters required by any word in another list list2 with at least the frequency that appears in the most demanding word in list2.

The solution includes two primary components: accumulating maximum frequency requirements for each letter from list2 and then checking list1 words against this criteria. The process unfolds as follows:

  1. Define maxFreq as a vector of 26 integers initialized to zero, representing the maximum frequencies for each alphabet letter from 'a' to 'z'.
  2. Iterate over each word in list2 to update maxFreq. For each word, calculate frequency of each character and update maxFreq so it holds the highest frequency required for each character across all words in list2.
  3. Initialize a result vector to store words from list1 that meet the subset requirements.
  4. For each word in list1, check if it qualifies as a subset by ensuring it meets or exceeds the frequencies specified in maxFreq.
  5. Add qualifying words to the result list.
  6. Finally, return the result, which contains all words from list1 that are subsets as per the aforementioned criteria.

The utility function getFreq calculates the frequency of each character in a given word and returns a frequency vector. This helps in both setting up the required frequencies from list2 and checking each word in list1 against these requirements.

By following this approach, you efficiently filter out non-subset words and achieve the desired result using the described method. This solution is especially effective when list2 has multiple words that collectively define the character subset criteria.

  • Java
java
class Solution {
    
    public List<String> universalStringFinder(String[] A, String[] B) {
        int[] globalMax = evaluate("");
        for (String bWord : B) {
            int[] tempCount = evaluate(bWord);
            for (int i = 0; i < 26; ++i) globalMax[i] = Math.max(globalMax[i], tempCount[i]);
        }
    
        List<String> result = new ArrayList();
        outer: for (String aWord : A) {
            int[] aWordCount = evaluate(aWord);
            for (int i = 0; i < 26; ++i) if (aWordCount[i] < globalMax[i]) continue outer;
            result.add(aWord);
        }
    
        return result;
    }
    
    public int[] evaluate(String str) {
        int[] res = new int[26];
        for (char c : str.toCharArray()) res[c - 'a']++;
        return res;
    }
}

The "Word Subsets" solution provided in Java involves identifying all strings in array A that are "universal" for array B. A string is considered universal if every character frequency requirement of each string in B is at least met.

The approach follows these steps:

  1. Calculate the maximum frequency of each character among all strings in B. This uses the evaluate method to count character occurrences for each string in B, and updates a global frequency array with the maximum counts required for universality.

  2. For each string in A, count the character frequencies using the same evaluate method.

  3. Compare the frequency array of the current string from A with the global frequency array. If A's string meets or exceeds the frequency counts across every character that appears in B's string, add the string from A to the result list.

  4. Return the list of universal strings from A that meet all character frequency requirements from B.

The implementation involves a careful use of frequency arrays and ensures that each string in A is only added if it fulfills the universal criteria set by all strings in B. This efficient character comparison through arrays allows for faster checks compared to string manipulation methods, thus optimizing performance for large datasets.

  • Python
python
class Solution:
    def subsetChecker(self, A: List[str], B: List[str]) -> List[str]:
        def getFreqCounts(word):
            freq = [0] * 26
            for char in word:
                freq[ord(char) - ord('a')] += 1
            return freq
    
        maxFreq = [0] * 26
        for wordB in B:
            currentFreq = getFreqCounts(wordB)
            for idx, count in enumerate(currentFreq):
                maxFreq[idx] = max(maxFreq[idx], count)
        result = []
        for wordA in A:
            if all(fA >= fB for fA, fB in zip(getFreqCounts(wordA), maxFreq)):
                result.append(wordA)
        return result

In the Python3 solution for the "Word Subsets" problem, you focus on determining which strings in a list A are universal with respect to list B. A universal string from A contains all the characters in each string of B with at least the maximum frequency that appears in B.

The approach uses a helper function getFreqCounts to count and return the frequency of each character (limited to the 26 lowercase alphabets) in a given word. This function maps characters to indices in an array from 0 (for 'a') to 25 (for 'z'), incrementing the respective index for every occurrence of a character in the word.

In the main function subsetChecker:

  • First, determine the maximum character frequencies needed using the maxFreq list by iterating over each word in B. Update maxFreq for every character to ensure it reflects the highest demand across all words in B.
  • Then, iterate through each word in A and use a generator expression to check if all the required character frequencies in maxFreq are met. If a word in A meets or exceeds these frequencies, it's considered universal and added to the results list.

At the end, the function returns the list of universal words, satisfying the conditions set forth by list B. This method efficiently calculates the result using the frequency comparison, ensuring each word in the result list meets the universal criteria.

Comments

No comments yet.