
Problem Statement
In the given task, you are provided with a string s
and an array of strings words
. Each string in words
is of equal length. You are required to determine all possible starting indices in the string s
where a substring is formed by any permutation of concatenation of all strings in words
. This specific kind of substring is termed as a "concatenated string." It is essential that the concatenated substring includes all strings from words
in any order without omission or repetition.
For illustration, consider words = ["ab","cd","ef"]
; valid concatenated strings in a scenario like this would include combinations such as "abcdef"
, "abefcd"
, and so forth, where each element of words
appears exactly once in the substring. Conversely, strings like "acdbef"
would not count since the order does not represent a direct permutation concatenation of words
.
The result should be an array of integers indicating the starting indices of such valid concatenated substrings within s
.
Examples
Example 1
Input:
s = "barfoothefoobarman", words = ["foo","bar"]
Output:
[0,9]
Explanation:
The substring starting at 0 is `"barfoo"`. It is the concatenation of `["bar","foo"]` which is a permutation of `words`.
Example 2
Input:
s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output:
[]
Explanation:
There is no concatenated substring.
Example 3
Input:
s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output:
[6,9,12]
Explanation:
The substring starting at 6 is `"foobarthe"`. It is the concatenation of `["foo","bar","the"]`.
Constraints
1 <= s.length <= 104
1 <= words.length <= 5000
1 <= words[i].length <= 30
s
andwords[i]
consist of lowercase English letters.
Approach and Intuition
To solve this problem, consider the following steps:
Verify Input Conditions: Check if the length of 's' or 'words' are within the allowed constraints that could pragmatically allow for a solution. For instance, if
words
is empty or the combined length ofwords
exceedss
, there can't be any valid substrings.Calculate Parameters:
- Determine the length of each word in
words
(noted as all words are of the same length). - Calculate the total length of a concatenated substring which would be a multiple of the individual word length and the total number of words.
- Determine the length of each word in
Sliding Window Technique: Use a sliding window approach in combination with a hashmap to track and verify counted words:
- Slide across
s
with a window size equal to the calculated concatenated substring length. - Within each window, break the substring into segments of the word's length and verify against a hashmap which keeps a count of words needed.
- Adjust the starting point of the sliding window based on whether the current substring segment aligns with any word in
words
.
- Slide across
Matching and Collecting Indices:
- If a window's segments exactly match the data structure outlining the required word counts (each word from
words
appears exactly as expected), record the starting index. - Continue this verification for each possible window in
s
.
- If a window's segments exactly match the data structure outlining the required word counts (each word from
Return Results: Collect all valid indices into an array and return this as the solution.
This approach efficiently combines the fixed length of words
elements and the possibility of permutation to validate substring sequences in a controlled manner. Implementing this with attention to window shifts and effective checking against expected word occurrences ensures that unnecessary checks are minimized, leveraging time complexity. Additionally, the use of hashmaps aids in quick lookups and comparisons, critical for performance given the potentially high constraints.
Solutions
- C++
- Java
- C
- JavaScript
- Python
class Solution {
private:
unordered_map<string, int> frequencyMap;
int stringLength;
int wordSize;
int totalLength;
int totalWords;
void examineSubstring(int startPos, string &s, vector<int> &results) {
unordered_map<string, int> currentCount;
int validWords = 0;
bool extraWord = false;
for (int pos = startPos; pos <= stringLength - wordSize; pos += wordSize) {
string tempSub = s.substr(pos, wordSize);
if (!frequencyMap.count(tempSub)) {
// Reset the window due to a non-required word
currentCount.clear();
validWords = 0;
extraWord = false;
startPos = pos + wordSize;
} else {
// Adjust window if necessary
while (pos - startPos == totalLength || extraWord) {
string leftWord = s.substr(startPos, wordSize);
startPos += wordSize;
currentCount[leftWord]--;
if (currentCount[leftWord] >= frequencyMap[leftWord]) {
// Handle excessive instances of a word
extraWord = false;
} else {
// Handle a needed word
validWords--;
}
}
// Track appearances of the current word
currentCount[tempSub]++;
if (currentCount[tempSub] <= frequencyMap[tempSub]) {
validWords++;
} else {
// More instances than needed
extraWord = true;
}
if (validWords == totalWords && !extraWord) {
// Valid substring found
results.push_back(startPos);
}
}
}
}
public:
vector<int> findSubstring(string s, vector<string> &words) {
stringLength = s.size();
totalWords = words.size();
wordSize = words[0].size();
totalLength = wordSize * totalWords;
for (string &word : words) {
frequencyMap[word]++;
}
vector<int> results;
for (int i = 0; i < wordSize; i++) {
examineSubstring(i, s, results);
}
return results;
}
};
This C++ solution provides an efficient implementation for finding all starting indices of a substring which is the concatenation of each word from a given list of words exactly once, without any intermediate characters. This implementation involves a single-pass sliding window technique combined with hashmaps for maintaining the count of words.
- The primary idea is to compute the frequency of each word in the provided list using an
unordered_map
calledfrequencyMap
. - Define
stringLength
,wordSize
,totalLength
, andtotalWords
for efficient management of substring checks throughout the main strings
. - Use a helper function
examineSubstring
that utilizes two pointers technique. It processes the string by each word-length using a sliding window, checking every possible substring of length equal to the total combined length of the words. - In the sliding window:
- Extract a word from the main string and assess its relevance by looking up
frequencyMap
. - Adjust the current count map if an extracted word doesn't align with the required frequency or is extra. It also resets the statistical count if irrelevant words intervene.
- A vector
results
is then populated with start indices every time the accumulated valid words form a valid concatenation sequence without any extra words.
- Extract a word from the main string and assess its relevance by looking up
- Finally, for each possible starting character (up to the length of a word), apply the sliding window technique to ensure all permutations caused by the word arrangement are covered.
Performance
The solution should perform at an efficiency close to O(n), where n is the length of the string, by minimizing redundant checks and not re-examining each word more than necessary. This optimal performance is achieved through the effective use of hashmaps and the two-pointer technique.
Usage
The findSubstring
function provided can be directly used by passing the main string and the list of words. This function will return a vector containing the starting indices of the substrings in the main string that match the required concatenation pattern. This solution is particularly useful for parsing and complex string manipulation tasks involving patterns of words.
class Solution {
private HashMap<String, Integer> frequencyMap = new HashMap<>();
private int stringLength;
private int wordSize;
private int totalSize;
private int totalWords;
private void windowSearch(int start, String str, List<Integer> results) {
HashMap<String, Integer> foundWords = new HashMap<>();
int countUsed = 0;
boolean overusedWord = false;
for (int end = start; end <= stringLength - wordSize; end += wordSize) {
String word = str.substring(end, end + wordSize);
if (!frequencyMap.containsKey(word)) {
foundWords.clear();
countUsed = 0;
overusedWord = false;
start = end + wordSize;
} else {
while (end - start == totalSize || overusedWord) {
String firstWord = str.substring(start, start + wordSize);
start += wordSize;
foundWords.put(
firstWord,
foundWords.get(firstWord) - 1
);
if (foundWords.get(firstWord) < frequencyMap.get(firstWord)) {
countUsed--;
} else {
overusedWord = false;
}
}
foundWords.put(word, foundWords.getOrDefault(word, 0) + 1);
if (foundWords.get(word) <= frequencyMap.get(word)) {
countUsed++;
} else {
overusedWord = true;
}
if (countUsed == totalWords && !overusedWord) {
results.add(start);
}
}
}
}
public List<Integer> findSubstring(String s, String[] words) {
stringLength = s.length();
totalWords = words.length;
wordSize = words[0].length();
totalSize = wordSize * totalWords;
for (String word : words) {
frequencyMap.put(word, frequencyMap.getOrDefault(word, 0) + 1);
}
List<Integer> results = new ArrayList<>();
for (int i = 0; i < wordSize; i++) {
windowSearch(i, s, results);
}
return results;
}
}
The Java solution provided addresses the problem of finding substrings in s
that are the concatenation of all words given in the array words
. Each word in the array is of the same length.
The approach employs a sliding window technique along with hashmap to manage the word frequencies and check the constructed substrings effectively:
- Initialize hashmaps for the target word frequencies (
frequencyMap
) and for counting word occurrences during the search (foundWords
). - Calculate necessary metrics like length of
s
, number of words, size of each word, and the total size of the substring formed by concatenating all words. - For every valid starting position in
s
up to the length of a word (wordSize
), employwindowSearch
to find qualifying substrings.- In
windowSearch
, iterate over the string and extract each word of lengthwordSize
. - If the word is not in
frequencyMap
, reset the search parameters. If the word is found but results in an overflow (overusedWord
), adjust the starting position and update counters. - When the window of words matches both the content and frequency of the words in the
frequencyMap
without overflow, add the start index to results.
- In
The result list, compiled during each sliding window operation, contains all starting indices of substrings in s
that are a concatenation of all words in words
.
The use of double hashmap and a sliding window technique ensures efficient checking and updates within the loop, allowing the method to handle cases with words of identical size and disregard order or overlaps directly. This method delivers a list of all valid starting indices for the desired substrings in s
.
struct map_item {
char identifier[1024];
int value;
UT_hash_handle hh;
};
void modify_value(struct map_item **hashmap, char *key, int delta) {
struct map_item *item;
HASH_FIND_STR(*hashmap, key, item);
item->value += delta;
}
void insert_map(struct map_item **hashmap, char *key, int value) {
struct map_item *item;
HASH_FIND_STR(*hashmap, key, item);
if (item == NULL) {
item = (struct map_item *)malloc(sizeof(struct map_item));
strncpy(item->identifier, key, strlen(key) + 1);
HASH_ADD_STR(*hashmap, identifier, item);
}
item->value = value;
}
bool key_exists(struct map_item *hashmap, char *key) {
struct map_item *item;
HASH_FIND_STR(hashmap, key, item);
return item != NULL;
}
int get_value(struct map_item *hashmap, char *key) {
struct map_item *item;
HASH_FIND_STR(hashmap, key, item);
return item ? item->value : 0;
}
void clear_map(struct map_item **hashmap) {
struct map_item *current, *tmp;
HASH_ITER(hh, *hashmap, current, tmp) {
HASH_DEL(*hashmap, current);
free(current);
}
*hashmap = NULL;
}
int *find_substring(char *s, char **words, int wordsSize, int *returnSize) {
int n = strlen(s);
int k = wordsSize;
int word_len = strlen(words[0]);
int substr_size = word_len * k;
struct map_item *word_count = NULL;
for (int i = 0; i < wordsSize; ++i) {
insert_map(&word_count, words[i], get_value(word_count, words[i]) + 1);
}
int *results = (int *)malloc(sizeof(int) * 0);
int count = 0;
for (int i = 0; i < word_len; i++) {
struct map_item *words_found = NULL;
int count_found = 0;
bool overcount = false;
char *temp_word = (char *)malloc(word_len + 1);
temp_word[word_len] = '\0';
int start = i;
for (int end = i; end < n; end += word_len) {
if (end + word_len > n) break;
strncpy(temp_word, &s[end], word_len);
if (!key_exists(word_count, temp_word)) {
clear_map(&words_found);
count_found = 0;
overcount = false;
start = end + word_len;
} else {
char *word_at_start = (char *)malloc(word_len + 1);
word_at_start[word_len] = '\0';
while (end - start == substr_size || overcount) {
strncpy(word_at_start, &s[start], word_len);
start += word_len;
modify_value(&words_found, word_at_start, -1);
if (get_value(words_found, word_at_start) ==
get_value(word_count, word_at_start)) {
overcount = false;
} else {
count_found -= 1;
}
}
insert_map(&words_found, temp_word, get_value(words_found, temp_word) + 1);
if (get_value(words_found, temp_word) <=
get_value(word_count, temp_word)) {
count_found += 1;
} else {
overcount = true;
}
if (count_found == k && !overcount) {
results = (int *)realloc(results, sizeof(int) * (++count));
results[count - 1] = start;
}
}
}
}
*returnSize = count;
return results;
}
This solution is designed to solve the problem of finding all the starting indices of substring(s) in a given string s
that is a concatenation of each word in the words
array exactly once and without any intervening characters.
Here's a brief breakdown of how the solution operates:
Hash Map Manipulation:
- The provided code initializes custom hash maps (using uthash for operations in C) to store and track word frequencies and occurrences.
- Key functions like
insert_map
,modify_value
, andget_value
are optimized for operations such as inserting a new word, modifying the count of an existing word, and fetching the count of a word, respectively.
Sliding Window over the String
s
:- The main function
find_substring
implements a sliding window technique to traverse the strings
. A dynamic window size, based on the total length of all words inwords
multiplied by their count, is used. - The window slides over
s
in increments of the length of the words to match the pattern exactly.
- The main function
Word Matching and Frequency Checking:
- As the window slides, words in the current segment are compared against the words in the
words
list by looking them up in the hash map of expected word counts. - If a word does not exist in the hash map or exceeds the expected frequency (indicative of over counting), the sliding window resets, and word frequency counts are adjusted.
- As the window slides, words in the current segment are compared against the words in the
Index Storage and Return:
- All starting indices where a valid concatenation starts are stored in an array
results
. - Finally, this array along with the count of these starting indices is returned.
- All starting indices where a valid concatenation starts are stored in an array
This method ensures efficiency by avoiding unnecessary re-computation and directly leveraging hash map functionalities for constant-time operations. Despite being written in C, which requires manual memory management (as seen with the use of malloc
and free
), the solution handles dynamic collection sizes well, resizing where necessary, ensuring that the program remains efficient and robust during execution.
var locateSubstrings = function (str, wordList) {
if (wordList.length === 0) return [];
const wordFrequency = new Map();
const lenWord = wordList[0].length;
const targetLength = wordList.length * lenWord;
const strLength = str.length;
const matches = [];
for (let word of wordList) {
wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
}
const checkWindow = function (start) {
const foundWords = new Map();
let countWords = 0;
let overCount = false;
for (let end = start; end <= strLength - lenWord; end += lenWord) {
const piece = str.substring(end, end + lenWord);
if (!wordFrequency.has(piece)) {
foundWords.clear();
countWords = 0;
overCount = false;
start = end + lenWord;
} else {
while (end - start === targetLength || overCount) {
const firstWord = str.substring(start, start + lenWord);
start += lenWord;
foundWords.set(
firstWord,
foundWords.get(firstWord) - 1,
);
if (
foundWords.get(firstWord) >=
wordFrequency.get(firstWord)
) {
overCount = false;
} else {
countWords--;
}
}
foundWords.set(piece, (foundWords.get(piece) || 0) + 1);
if (foundWords.get(piece) <= wordFrequency.get(piece)) {
countWords++;
} else {
overCount = true;
}
if (countWords === wordList.length && !overCount) {
matches.push(start);
}
}
}
};
for (let i = 0; i < lenWord; i++) {
checkWindow(i);
}
return matches;
};
The provided JavaScript function locateSubstrings
is designed to find all starting indices of substring(s) in a given string str
that is a concatenation of all the words in the list wordList
, where each word is used exactly once and without any intervening characters. The implementation approach is systematic and ensures that the solution is efficient and can handle variably sized inputs.
Initialize various parameters and controls including:
wordFrequency
to store the frequency of each word inwordList
.lenWord
to store the length of each word (assuming all words are the same length).targetLength
to determine the total length of the concatenated target substring formed fromwordList
.matches
to collect all valid starting indices of substrings instr
.
The core logic to locate substrings utilizes a sliding window technique:
- Iterate through the string
str
in chunks the size oflenWord
. - Check each word-sized piece against
wordFrequency
to see if it matches any word inwordList
. - Manage and adjust a window of checked words using a map
foundWords
to count occurrences of each word and ensure they do not exceed their required frequency. - Handle cases where an unwanted or excess word disrupts the current sequence by resetting and resizing the window accordingly.
- If a full series of
wordList
words are found in sequence without excess, record the starting index.
- Iterate through the string
Optimization by checking windows:
- Split the string check into as many windows as the length of the words to ensure full coverage of possible starting points in
str
.
- Split the string check into as many windows as the length of the words to ensure full coverage of possible starting points in
The function finally returns an array of positions (matches
), each indicating where a complete concatenation of wordList
begins in the string str
.
This solution efficiently handles word frequency and ensures the substring search is thorough by using a sliding window mechanism, adjusted dynamically based on content matches and mismatches found during iteration. This combines the advantages of map-based frequency counting and window sliding for effective substring searches in linear-time complexity relative to the size of str
and wordList
.
class Solution:
def findSubstringIndices(self, string: str, wordlist: List[str]) -> List[int]:
string_len = len(string)
word_count = len(wordlist)
single_word_len = len(wordlist[0])
target_len = single_word_len * word_count
word_freq = collections.Counter(wordlist)
def check_segment(start):
current_found = collections.defaultdict(int)
words_collected = 0
redundant_word = False
for end in range(start, string_len, single_word_len):
if end + single_word_len > string_len:
break
word_segment = string[end : end + single_word_len]
if word_segment not in word_freq:
current_found = collections.defaultdict(int)
words_collected = 0
redundant_word = False
start = end + single_word_len
else:
while end - start == target_len or redundant_word:
left_word = string[start : start + single_word_len]
start += single_word_len
current_found[left_word] -= 1
if current_found[left_word] == word_freq[left_word]:
redundant_word = False
else:
words_collected -= 1
current_found[word_segment] += 1
if current_found[word_segment] <= word_freq[word_segment]:
words_collected += 1
else:
redundant_word = True
if words_collected == word_count and not redundant_word:
results.append(start)
results = []
for j in range(single_word_len):
check_segment(j)
return results
In the provided Python code, the function findSubstringIndices
is designed to identify all starting indices in a given string
where a concatenation of all the words from wordlist
is found. Each word in wordlist
is of the same length.
The implementation includes these steps:
Calculate the needed lengths and count metrics: total string length (
string_len
), number of words (word_count
), length of each word (single_word_len
), combined length of all words in the list (target_len
), and a frequency counter of the words (word_freq
).The nested function
check_segment
examines sections of the string beginning at a specified start index. It uses a sliding window approach along with a hashed frequency count (current_found
) to manage and compare current word segment counts against the required (word_freq
).If any segment does not match
word_freq
, adjusts the starting point and resets any collected counts. If over-counting occurs, words are removed from the beginning of the current window until the counts align with the target frequencies.If a valid segment is discovered that matches
word_freq
and aligns with thetarget_len
, its start index is saved.This checking process is carried out for each character in the word (
single_word_len
), ensuring full coverage of the string.The list of valid start indices (
results
) is updated through each complete iteration and ultimately returned.
The algorithm's efficiency is maintained by check_segment
, handling dynamic updates and precise alignment with the target word frequencies. This solution effectively solves the problem by breaking down the complex checks and balances into manageable components and using dictionary and set operations for optimal lookup times.
No comments yet.