
Problem Statement
In the given problem, you are provided with a string text
and an array of unique strings words
. The challenge is to identify every occurrence of each word from words
within the text
as a substring and return these occurrences as pairs of indices [i, j]
. Each pair indicates that the substring starting at index i
and ending at index j
in text
matches one of the words in words
. The result should include all valid index pairs for which this condition is true, including overlapping matches. The resulting list of pairs should be returned in a sorted manner based on the starting index first, and in case of a tie on the starting index, they should be sorted by the ending index.
Examples
Example 1
Input:
text = "thestoryofvultrcodeandme", words = ["story","fvultr","vultrcode"]
Output:
[[3,7],[9,13],[10,18]]
Example 2
Input:
text = "ababa", words = ["aba","ab"]
Output:
[[0,1],[0,2],[2,3],[2,4]]
Explanation:
Notice that matches can overlap, see "aba" is found in [0,2] and [2,4].
Constraints
1 <= text.length <= 100
1 <= words.length <= 20
1 <= words[i].length <= 50
text
andwords[i]
consist of lowercase English letters.- All the strings of
words
are unique.
Approach and Intuition
Given the constraints involving relatively small string lengths (with text
being at most 100 characters), a brute-force approach is feasible. Here's a step-by-step breakdown of a direct method to solve this problem:
Initiate an empty list to collect the index pairs where matches occur.
Loop through each word in the
words
array to check for every possible occurrence intext
.For each word:
- Use a loop to go through
text
from start to finish. - Within this loop, check if the substring from the current index up to the length of the word matches the word itself.
- If a match is found, record the pair
[start_index, end_index]
(whereend_index
isstart_index
plus the length of the word minus one).
- Use a loop to go through
After collecting all the index pairs, sort them. Firstly, sort by the start index. If two pairs have the same start index, sort them by the end index.
Return the sorted list of index pairs.
This method ensures that each word is checked against all possible substrings of text
, accounting for overlapping scenarios as demonstrated in the examples. Direct string comparison allows for checking the presence of substrings efficiently, while sorting the results as required by the constraints makes the final output orderly and useful for further processing or analysis.
Solutions
- C++
- Java
- Python
struct TrieNode {
bool isEndOfWord;
map<char, TrieNode*> children;
};
struct Trie {
TrieNode* root;
Trie() { root = new TrieNode(); }
void addWord(string word) {
TrieNode* current = root;
for (char character : word) {
if (current->children.count(character) == 0) {
current->children[character] = new TrieNode();
}
current = current->children[character];
}
current->isEndOfWord = true;
}
};
class Solution {
public:
vector<vector<int>> indexPairs(string text, vector<string>& words) {
Trie trie;
for (string word : words) {
trie.addWord(word);
}
vector<vector<int>> indices;
for (int start = 0; start < text.length(); start++) {
TrieNode* node = trie.root;
for (int end = start; end < text.length(); end++) {
if (node->children.find(text[end]) == node->children.end()) {
break;
}
node = node->children[text[end]];
if (node->isEndOfWord) {
indices.push_back({start, end});
}
}
}
return indices;
}
};
The provided C++ program is designed to solve the problem of finding all index pairs in a given string, text
, that correspond to any word present in a list of words. This solution utilizes a Trie data structure to efficiently store and query words.
- Start by defining a
TrieNode
structure which includes a boolean to indicate if a node marks the end of a word and a dictionary (map) to maintain child nodes. - A
Trie
structure manages the root node and includes anaddWord
function to insert words into the Trie. - In the
indexPairs
function:- Initialize the Trie and add each word from the list of words.
- Prepare a vector to store the index pairs.
- Iterate over every possible starting index of the substring in
text
. - For each start index, iterate to identify all valid substrings that are words in the Trie.
- If at any character the Trie does not have a corresponding child, discontinue that segment checking.
- If a sequence concludes at a node that marks the end of a word, add the starting and ending indices as a pair to the result list.
- Finally, return the compiled list of index pairs.
This approach leverages the Trie's efficient look-up capabilities to check each substring of text
, making it well-suited for cases where a large dictionary of words needs to be checked against a possibly long string. The use of a Trie optimizes the search process, reducing the complexity compared to a naive search for each word against all substrings of text
.
class TrieNode {
public boolean isWordEnd;
public Map<Character, TrieNode> children = new HashMap<>();
}
class Trie {
public TrieNode rootNode;
public Trie() {
rootNode = new TrieNode();
}
public void insert(String word) {
TrieNode current = rootNode;
for (int i = 0; i < word.length(); i++) {
char ch = word.charAt(i);
if (!current.children.containsKey(ch)) {
current.children.put(ch, new TrieNode());
}
current = current.children.get(ch);
}
current.isWordEnd = true;
}
}
class Solution {
public int[][] indexPairs(String text, String[] words) {
Trie dictionary = new Trie();
for (String word : words) {
dictionary.insert(word);
}
List<int[]> pairsList = new ArrayList<>();
for (int i = 0; i < text.length(); i++) {
TrieNode node = dictionary.rootNode;
for (int j = i; j < text.length(); j++) {
if (!node.children.containsKey(text.charAt(j))) {
break;
}
node = node.children.get(text.charAt(j));
if (node.isWordEnd) {
pairsList.add(new int[] { i, j });
}
}
}
int[][] pairsArray = pairsList.toArray(new int[pairsList.size()][]);
return pairsArray;
}
}
This solution explains how to identify all index pairs for words from a given list that are sub-strings within a provided text using a Trie data structure in Java.
Begin by defining a
TrieNode
class with:- A flag
isWordEnd
to determine if the node marks the end of a word. - A map
children
to hold child nodes.
- A flag
Implement a
Trie
class with:- A root node
rootNode
. - An
insert
method that allows adding words to the Trie. This method iterates through each character of the word, checks if it exists in the current node's children, and if not, it adds it.
- A root node
In the
Solution
class, define theindexPairs
method with parameterstext
(the main string) andwords
(array of words to search within the text):- Create a Trie and insert all words from the
words
array. - Initialize a list
pairsList
to store the starting and ending indices of each found word. - Iterate through each character of the
text
and use a nested loop to check every substring starting from the current character. - For each character in the substring, check if it exists within the current Trie node's children. If not, break out of the loop.
- If a node confirms the end of a word (
isWordEnd
is true), add the current indices to thepairsList
. - Finally, convert
pairsList
to an array and return it.
- Create a Trie and insert all words from the
This structured approach ensures efficient searching of all index pairs that match words from the list using Trie, making the process faster and more memory-efficient for large sets of words or texts.
class PrefixTreeNode:
def __init__(self):
self.endOfWord = False
self.children = {}
class WordTrie:
def __init__(self):
self.root = PrefixTreeNode()
def addWord(self, word):
node = self.root
for char in word:
if char not in node.children:
node.children[char] = PrefixTreeNode()
node = node.children[char]
node.endOfWord = True
class Solution:
def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
trie = WordTrie()
for word in words:
trie.addWord(word)
pairs = []
for start in range(len(text)):
node = trie.root
for end in range(start, len(text)):
if text[end] not in node.children:
break
node = node.children[text[end]]
if node.endOfWord:
pairs.append([start, end])
return pairs
This Python solution involves constructing a trie to efficiently find all start and endpoint index pairs in a given string, text
, that match any of the words from a list, words
. A trie, or prefix tree, is a tree data structure used to efficiently store a dynamic set of strings where keys are usually strings. The solution leverages this data structure to quickly look up substrings as it iterates over the text.
First, define two classes, PrefixTreeNode
and WordTrie
:
PrefixTreeNode
tracks whether a node marks the end of a word (endOfWord
) and also contains a dictionary (children
) pointing to child nodes.WordTrie
initializes with a root node ofPrefixTreeNode
and includes anaddWord
method that adds a word to the trie, creating necessary children for each character.
In the Solution
class:
- The
indexPairs
method initializes aWordTrie
instance and adds each word from the listwords
to the trie. - It then iterates over each character in
text
, using nested loops to try all potential substring starting points. - For each starting point, the method checks subsequent characters, traversing deeper into the trie.
- If a character sequence does not follow any path in the trie (
children
dictionary does not contain the character), the inner loop terminates early. - If the end of a word is reached as indicated by the
endOfWord
flag on the trie node, the starting and ending indices of the substring are recorded. - Finally, the method returns a list of all index pairs found.
This implementation effectively finds all index pairs without repeated scanning of the text for each word, utilizing the trie for fast lookups and bounded by the length of the text. Additionally, trie construction ensures that each added word only requires time linear to its length, optimizing the overall computational complexity of the process.
No comments yet.