1065. Index Pairs of a String - Solutions and Explanation

Problem Statement

In the given problem, you are provided with a string text and an array of unique strings words. The challenge is to identify every occurrence of each word from words within the text as a substring and return these occurrences as pairs of indices [i, j]. Each pair indicates that the substring starting at index i and ending at index j in text matches one of the words in words. The result should include all valid index pairs for which this condition is true, including overlapping matches. The resulting list of pairs should be returned in a sorted manner based on the starting index first, and in case of a tie on the starting index, they should be sorted by the ending index.

Examples

Example 1

Input:

text = "thestoryofvultrcodeandme", words = ["story","fvultr","vultrcode"]

Output:

[[3,7],[9,13],[10,18]]

Example 2

Input:

text = "ababa", words = ["aba","ab"]

Output:

[[0,1],[0,2],[2,3],[2,4]]

Explanation:

Notice that matches can overlap, see "aba" is found in [0,2] and [2,4].

Constraints

1 <= text.length <= 100
1 <= words.length <= 20
1 <= words[i].length <= 50
text and words[i] consist of lowercase English letters.
All the strings of words are unique.

Approach and Intuition

Given the constraints involving relatively small string lengths (with text being at most 100 characters), a brute-force approach is feasible. Here's a step-by-step breakdown of a direct method to solve this problem:

Initiate an empty list to collect the index pairs where matches occur.
Loop through each word in the words array to check for every possible occurrence in text.
For each word:
- Use a loop to go through text from start to finish.
- Within this loop, check if the substring from the current index up to the length of the word matches the word itself.
- If a match is found, record the pair [start_index, end_index] (where end_index is start_index plus the length of the word minus one).
After collecting all the index pairs, sort them. Firstly, sort by the start index. If two pairs have the same start index, sort them by the end index.
Return the sorted list of index pairs.

This method ensures that each word is checked against all possible substrings of text, accounting for overlapping scenarios as demonstrated in the examples. Direct string comparison allows for checking the presence of substrings efficiently, while sorting the results as required by the constraints makes the final output orderly and useful for further processing or analysis.

Solutions

C++
Java
Python

                            cpp
                            
                        
struct TrieNode {
    bool isEndOfWord;
    map<char, TrieNode*> children;
};
    
struct Trie {
    TrieNode* root;
    
    Trie() { root = new TrieNode(); }
    
    void addWord(string word) {
        TrieNode* current = root;
        for (char character : word) {
            if (current->children.count(character) == 0) {
                current->children[character] = new TrieNode();
            }
            current = current->children[character];
        }
        current->isEndOfWord = true;
    }
};
    
class Solution {
public:
    vector<vector<int>> indexPairs(string text, vector<string>& words) {
        Trie trie;
        for (string word : words) {
            trie.addWord(word);
        }
            
        vector<vector<int>> indices;
        for (int start = 0; start < text.length(); start++) {
            TrieNode* node = trie.root;
            for (int end = start; end < text.length(); end++) {
                if (node->children.find(text[end]) == node->children.end()) {
                    break;
                }
                node = node->children[text[end]];
                if (node->isEndOfWord) {
                    indices.push_back({start, end});
                }
            }
        }
        return indices;
    }
};

The provided C++ program is designed to solve the problem of finding all index pairs in a given string, text, that correspond to any word present in a list of words. This solution utilizes a Trie data structure to efficiently store and query words.

Start by defining a TrieNode structure which includes a boolean to indicate if a node marks the end of a word and a dictionary (map) to maintain child nodes.
A Trie structure manages the root node and includes an addWord function to insert words into the Trie.
In the indexPairs function:
1. Initialize the Trie and add each word from the list of words.
2. Prepare a vector to store the index pairs.
3. Iterate over every possible starting index of the substring in text.
4. For each start index, iterate to identify all valid substrings that are words in the Trie.
5. If at any character the Trie does not have a corresponding child, discontinue that segment checking.
6. If a sequence concludes at a node that marks the end of a word, add the starting and ending indices as a pair to the result list.
Finally, return the compiled list of index pairs.

This approach leverages the Trie's efficient look-up capabilities to check each substring of text, making it well-suited for cases where a large dictionary of words needs to be checked against a possibly long string. The use of a Trie optimizes the search process, reducing the complexity compared to a naive search for each word against all substrings of text.

                            java
                            
                        
class TrieNode {
    public boolean isWordEnd;
    public Map<Character, TrieNode> children = new HashMap<>();
}
    
class Trie {
    public TrieNode rootNode;
    
    public Trie() {
        rootNode = new TrieNode();
    }
    
    public void insert(String word) {
        TrieNode current = rootNode;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (!current.children.containsKey(ch)) {
                current.children.put(ch, new TrieNode());
            }
            current = current.children.get(ch);
        }
        current.isWordEnd = true;
    }
}
    
class Solution {
    public int[][] indexPairs(String text, String[] words) {
        Trie dictionary = new Trie();
        for (String word : words) {
            dictionary.insert(word);
        }
        List<int[]> pairsList = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            TrieNode node = dictionary.rootNode;
            for (int j = i; j < text.length(); j++) {
                if (!node.children.containsKey(text.charAt(j))) {
                    break;
                }
                node = node.children.get(text.charAt(j));
                if (node.isWordEnd) {
                    pairsList.add(new int[] { i, j });
                }
            }
        }
        int[][] pairsArray = pairsList.toArray(new int[pairsList.size()][]);
        return pairsArray;
    }
}

This solution explains how to identify all index pairs for words from a given list that are sub-strings within a provided text using a Trie data structure in Java.

Begin by defining a TrieNode class with:
- A flag isWordEnd to determine if the node marks the end of a word.
- A map children to hold child nodes.
Implement a Trie class with:
- A root node rootNode.
- An insert method that allows adding words to the Trie. This method iterates through each character of the word, checks if it exists in the current node's children, and if not, it adds it.
In the Solution class, define the indexPairs method with parameters text (the main string) and words (array of words to search within the text):
- Create a Trie and insert all words from the words array.
- Initialize a list pairsList to store the starting and ending indices of each found word.
- Iterate through each character of the text and use a nested loop to check every substring starting from the current character.
- For each character in the substring, check if it exists within the current Trie node's children. If not, break out of the loop.
- If a node confirms the end of a word (isWordEnd is true), add the current indices to the pairsList.
- Finally, convert pairsList to an array and return it.

This structured approach ensures efficient searching of all index pairs that match words from the list using Trie, making the process faster and more memory-efficient for large sets of words or texts.

                            python
                            
                        
class PrefixTreeNode:
    def __init__(self):
        self.endOfWord = False
        self.children = {}
    
class WordTrie:
    def __init__(self):
        self.root = PrefixTreeNode()
    
    def addWord(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = PrefixTreeNode()
            node = node.children[char]
        node.endOfWord = True
    
class Solution:
    def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
        trie = WordTrie()
        for word in words:
            trie.addWord(word)
        pairs = []
        for start in range(len(text)):
            node = trie.root
            for end in range(start, len(text)):
                if text[end] not in node.children:
                    break
                node = node.children[text[end]]
                if node.endOfWord:
                    pairs.append([start, end])
        return pairs

This Python solution involves constructing a trie to efficiently find all start and endpoint index pairs in a given string, text, that match any of the words from a list, words. A trie, or prefix tree, is a tree data structure used to efficiently store a dynamic set of strings where keys are usually strings. The solution leverages this data structure to quickly look up substrings as it iterates over the text.

First, define two classes, PrefixTreeNode and WordTrie:

PrefixTreeNode tracks whether a node marks the end of a word (endOfWord) and also contains a dictionary (children) pointing to child nodes.
WordTrie initializes with a root node of PrefixTreeNode and includes an addWord method that adds a word to the trie, creating necessary children for each character.

In the Solution class:

The indexPairs method initializes a WordTrie instance and adds each word from the list words to the trie.
It then iterates over each character in text, using nested loops to try all potential substring starting points.
For each starting point, the method checks subsequent characters, traversing deeper into the trie.
If a character sequence does not follow any path in the trie (children dictionary does not contain the character), the inner loop terminates early.
If the end of a word is reached as indicated by the endOfWord flag on the trie node, the starting and ending indices of the substring are recorded.
Finally, the method returns a list of all index pairs found.

This implementation effectively finds all index pairs without repeated scanning of the text for each word, utilizing the trie for fast lookups and bounded by the length of the text. Additionally, trie construction ensures that each added word only requires time linear to its length, optimizing the overall computational complexity of the process.

Comments

No comments yet.

Index Pairs of a String

Problem Statement

Examples

Example 1

Example 2

Constraints

Approach and Intuition

Solutions

Comments

Products

Features

Solutions

Marketplace

Resources

Company