
Problem Statement
In this challenge, you are working with two arrays of strings, named words1
and words2
. The objective is to determine which strings in words1
are 'universal'. A string is defined as 'universal' if every string in words2
is a subset of this string.
To clarify, a string b
is a subset of another string a
if every character in b
appears in a
with at least the same frequency as it does in b
. For instance, 'wrr'
is found within 'warrior'
when considering character occurrence, but not within 'world'
because it lacks sufficient 'r'
characters.
The task is to return an array of all such universal strings from the words1
array. The output does not require a specific order.
Examples
Example 1
Input:
words1 = ["amazon","apple","facebook","google","vultrcode"], words2 = ["e","o"]
Output:
["facebook","google","vultrcode"]
Example 2
Input:
words1 = ["amazon","apple","facebook","google","vultrcode"], words2 = ["lc","eo"]
Output:
["vultrcode"]
Example 3
Input:
words1 = ["acaac","cccbb","aacbb","caacc","bcbbb"], words2 = ["c","cc","b"]
Output:
["cccbb"]
Constraints
1 <= words1.length, words2.length <= 10⁴
1 <= words1[i].length, words2[i].length <= 10
words1[i]
andwords2[i]
consist only of lowercase English letters.- All the strings of
words1
are unique.
Approach and Intuition
The problem revolves around efficiently checking the subset relationship between each string in words1
against all strings in words2
. Here's a structured approach toward solving it:
Create Frequency Maps for
words2
: For each word inwords2
, build a frequency dictionary that tracks how many times each character appears.Build a Maximum Frequency Requirement: Combine all frequency dictionaries into one that stores the maximum frequency required for each character across all
words2
entries.Evaluate Each
words1
Entry: For each word inwords1
, build its character frequency and verify that it meets or exceeds the frequency of every character in the maximum frequency dictionary.
If the word satisfies all frequency requirements, it is universal and added to the result.
This optimization avoids comparing each pair individually and ensures better performance with large inputs. The character space is limited (only lowercase letters), so it's feasible to process using fixed-size arrays or dictionaries.
Solutions
- C++
class Solution {
public:
vector<string> subsetWords(vector<string>& list1, vector<string>& list2) {
vector<int> maxFreq = getFreq("");
for (const string& word : list2) {
vector<int> freq = getFreq(word);
for (int i = 0; i < 26; ++i) {
maxFreq[i] = max(maxFreq[i], freq[i]);
}
}
vector<string> result;
for (const string& word : list1) {
vector<int> freq = getFreq(word);
bool qualifies = true;
for (int i = 0; i < 26; ++i) {
if (freq[i] < maxFreq[i]) {
qualifies = false;
break;
}
}
if (qualifies) {
result.push_back(word);
}
}
return result;
}
private:
vector<int> getFreq(const string& str) {
vector<int> freq(26, 0);
for (char ch : str) {
freq[ch - 'a']++;
}
return freq;
}
};
This solution presents a method to find 'word subsets'. A word from a list list1
is considered a subset when it contains all the characters required by any word in another list list2
with at least the frequency that appears in the most demanding word in list2
.
The solution includes two primary components: accumulating maximum frequency requirements for each letter from list2
and then checking list1
words against this criteria. The process unfolds as follows:
- Define
maxFreq
as a vector of 26 integers initialized to zero, representing the maximum frequencies for each alphabet letter from 'a' to 'z'. - Iterate over each word in
list2
to updatemaxFreq
. For each word, calculate frequency of each character and updatemaxFreq
so it holds the highest frequency required for each character across all words inlist2
. - Initialize a
result
vector to store words fromlist1
that meet the subset requirements. - For each word in
list1
, check if it qualifies as a subset by ensuring it meets or exceeds the frequencies specified inmaxFreq
. - Add qualifying words to the
result
list. - Finally, return the
result
, which contains all words fromlist1
that are subsets as per the aforementioned criteria.
The utility function getFreq
calculates the frequency of each character in a given word and returns a frequency vector. This helps in both setting up the required frequencies from list2
and checking each word in list1
against these requirements.
By following this approach, you efficiently filter out non-subset words and achieve the desired result using the described method. This solution is especially effective when list2
has multiple words that collectively define the character subset criteria.
- Java
class Solution {
public List<String> universalStringFinder(String[] A, String[] B) {
int[] globalMax = evaluate("");
for (String bWord : B) {
int[] tempCount = evaluate(bWord);
for (int i = 0; i < 26; ++i) globalMax[i] = Math.max(globalMax[i], tempCount[i]);
}
List<String> result = new ArrayList();
outer: for (String aWord : A) {
int[] aWordCount = evaluate(aWord);
for (int i = 0; i < 26; ++i) if (aWordCount[i] < globalMax[i]) continue outer;
result.add(aWord);
}
return result;
}
public int[] evaluate(String str) {
int[] res = new int[26];
for (char c : str.toCharArray()) res[c - 'a']++;
return res;
}
}
The "Word Subsets" solution provided in Java involves identifying all strings in array A that are "universal" for array B. A string is considered universal if every character frequency requirement of each string in B is at least met.
The approach follows these steps:
Calculate the maximum frequency of each character among all strings in B. This uses the
evaluate
method to count character occurrences for each string in B, and updates a global frequency array with the maximum counts required for universality.For each string in A, count the character frequencies using the same
evaluate
method.Compare the frequency array of the current string from A with the global frequency array. If A's string meets or exceeds the frequency counts across every character that appears in B's string, add the string from A to the result list.
Return the list of universal strings from A that meet all character frequency requirements from B.
The implementation involves a careful use of frequency arrays and ensures that each string in A is only added if it fulfills the universal criteria set by all strings in B. This efficient character comparison through arrays allows for faster checks compared to string manipulation methods, thus optimizing performance for large datasets.
- Python
class Solution:
def subsetChecker(self, A: List[str], B: List[str]) -> List[str]:
def getFreqCounts(word):
freq = [0] * 26
for char in word:
freq[ord(char) - ord('a')] += 1
return freq
maxFreq = [0] * 26
for wordB in B:
currentFreq = getFreqCounts(wordB)
for idx, count in enumerate(currentFreq):
maxFreq[idx] = max(maxFreq[idx], count)
result = []
for wordA in A:
if all(fA >= fB for fA, fB in zip(getFreqCounts(wordA), maxFreq)):
result.append(wordA)
return result
In the Python3 solution for the "Word Subsets" problem, you focus on determining which strings in a list A
are universal with respect to list B
. A universal string from A
contains all the characters in each string of B
with at least the maximum frequency that appears in B
.
The approach uses a helper function getFreqCounts
to count and return the frequency of each character (limited to the 26 lowercase alphabets) in a given word. This function maps characters to indices in an array from 0 (for 'a') to 25 (for 'z'), incrementing the respective index for every occurrence of a character in the word.
In the main function subsetChecker
:
- First, determine the maximum character frequencies needed using the
maxFreq
list by iterating over each word inB
. UpdatemaxFreq
for every character to ensure it reflects the highest demand across all words inB
. - Then, iterate through each word in
A
and use a generator expression to check if all the required character frequencies inmaxFreq
are met. If a word inA
meets or exceeds these frequencies, it's considered universal and added to the results list.
At the end, the function returns the list of universal words, satisfying the conditions set forth by list B
. This method efficiently calculates the result using the frequency comparison, ensuring each word in the result list meets the universal criteria.
No comments yet.