String Matching in an Array

Updated on 25 June, 2025
String Matching in an Array header image

Problem Statement

You are given an array of strings words. Your task is to identify all strings from this array that appear as a substring in any other string within the same array.

Return the list of such strings. The output can be in any order.

Examples

Example 1

Input:

words = ["mass", "as", "hero", "superhero"]

Output:

["as", "hero"]

Explanation:

"as" is a substring of "mass", and "hero" is a substring of "superhero".
["hero", "as"] is also a valid answer.

Example 2

Input:

words = ["vultrcloud", "ul", "cloud"]

Output:

["ul", "cloud"]

Explanation:

Both "ul" and "cloud" are substrings of "vultrcloud".

Example 3

Input:

words = ["blue", "green", "bu"]

Output:

[]

Explanation:

None of the strings are substrings of any other string.

Constraints

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 30
  • words[i] consists only of lowercase English letters.
  • All strings in words are unique.

Approach and Intuition

  1. Iterate Over All Pairs: Since the number of strings is small (≤ 100), a brute-force approach is acceptable. For every word s in the list, compare it with every other word t to see if s is a proper substring of t.

  2. Skip Self Comparison: Because all strings are unique, a word is never a substring of itself — so you can skip checking s in s.

  3. Substring Check: Use the in operator (e.g., s in t in Python) to check whether s is a substring of t.

  4. Avoid Duplicates: Use a set to collect the results or ensure that each string is added to the result list only once.

  5. Return Result: Once all comparisons are complete, return the list of substrings.

Time Complexity

  • O(n^2 × m) — where n is the number of strings and m is the maximum string length. Since both are small (n ≤ 100, m ≤ 30), this is efficient for the given constraints.

Solutions

  • C++
  • Java
  • Python
cpp
class SubstringMatcher {
public:
    vector<string> findSubstrings(vector<string>& wordsList) {
        vector<string> foundWords;
        Trie* root = new Trie();  // Initialize the Trie root node.

        // Append all word suffixes into the Trie structure.
        for (const auto& word : wordsList) {
            for (int startIdx = 0; startIdx < word.size(); startIdx++) {
                // Insert each starting suffix from `startIdx` into the Trie.
                addWordSuffix(root, word.substr(startIdx));
            }
        }

        // Determine if any word is a substring within the trie modeled data.
        for (auto word : wordsList) {
            if (checkSubstring(root, word)) {
                foundWords.push_back(word);
            }
        }

        return foundWords;
    }

private:
    class Trie {
    public:
        // Count occurrences of substrings from the Trie.
        int count;
        // Child nodes mapped by character keys.
        unordered_map<char, Trie*> children;
    };

    // Insertion of word or a suffix into the Trie structure.
    void addWordSuffix(Trie* root, const string& wordSuffix) {
        Trie* node = root;
        for (char ch : wordSuffix) {
            // Move througn the Trie nodes, creating new nodes if necessary.
            if (node->children.find(ch) != node->children.end()) {
                node = node->children[ch];
                // Increment frequency count for this node.
                node->count++;
            } else {
                // Create a new node if character node doesn't exist.
                Trie* newNode = new Trie();
                newNode->count = 1;
                node->children[ch] = newNode;
                node = newNode;
            }
        }
    }

    // Method to verify if a word is present as a substring in the Trie.
    bool checkSubstring(Trie* root, string& word) {
        Trie* node = root;  // Begin the traverse at root.
        for (char ch : word) {
            // Move through each character's node.
            node = node->children[ch];
        }
        // The word is a substring or overlapping if seen more than once.
        return node->count > 1;
    }
};

The provided C++ program is designed to identify strings within an array that appear as substrings within other strings of the same array. This solution employs a data structure called Trie, also known as a prefix tree, which effectively handles string manipulations and searches.

  • Convert the list of words into a Trie structure where each node represents a possible substring of the input words.
  • Insert every suffix of each word into the Trie. This ensures that all possible substrings are accounted for.
  • For each word in the list, check if it exists as a substring in the Trie populated with suffixes from all the words.
  • If a word is found as a substring, add it to the result list.

Key operations are:

  • Adding word suffix to the Trie: Iterate over each word, creating suffix starting from each index, and inserting it into the Trie. Each node in the Trie keeps track of the frequency of the appearances of the substring.
  • Checking for substrings in the Trie: For each word, traverse through the Trie based on its characters and determine if the substring appears more than once, indicating overlap or substring presence in the array.

This approach is efficient by leveraging the Trie data structure for fast insertion and search operations, significantly enhancing the process over brute forcing through all possible pairs of substrings. The use of a Trie also handles overlaps and repeated substrings elegantly, making the solution robust for varied inputs.

java
class Finder {

    class TrieNode {
        int count; // Number of times this node is visited
        Map<Character, TrieNode> children;

        TrieNode() {
            count = 0;
            children = new HashMap<>();
        }
    }

    public List<String> findSubstrings(String[] words) {
        List<String> result = new ArrayList<>();
        TrieNode root = new TrieNode(); // Root of the Trie

        // Building the Trie with all possible suffixes
        for (String word : words) {
            for (int start = 0; start < word.length(); start++) {
                addSuffix(root, word.substring(start));
            }
        }

        // Finding all substrings in Trie that appear in the list more than once
        for (String word : words) {
            if (checkSubstring(root, word)) {
                result.add(word);
            }
        }

        return result;
    }

    private void addSuffix(TrieNode root, String suffix) {
        TrieNode current = root;
        for (char ch : suffix.toCharArray()) {
            current.children.putIfAbsent(ch, new TrieNode());
            current = current.children.get(ch);
            current.count++;
        }
    }

    private boolean checkSubstring(TrieNode root, String str) {
        TrieNode current = root;
        for (char ch : str.toCharArray()) {
            if (current.children.get(ch) == null) return false;
            current = current.children.get(ch);
        }
        return current.count > 1;
    }
}

This Java solution for the "String Matching in an Array" problem involves constructing a Trie (prefix tree) to efficiently store and query substrings of provided words. The process can be highlighted as follows:

  • Initialize a TrieNode class to represent each node in the Trie, where each node contains a count indicating the number of times the node is visited, and a map children to store child nodes.

  • Construct a method findSubstrings that:

    1. Initializes an empty list result to hold the final substrings.
    2. Constructs the root of the Trie.
    3. Iteratively adds all possible suffixes of each string from the input array to the Trie using addSuffix method.
    4. Checks if each word in the input array is a substring in the other words by utilizing the checkSubstring method, adding it to the result list if true.
  • The addSuffix method takes a root and a suffix string, navigating through or building the Trie by:

    1. Iterating over each character in the suffix.
    2. For each character, either fetch or create a new child node in the Trie, and increment the count associated with the node.
  • In the checkSubstring method:

    1. Navigates through the Trie based on each character of the input string.
    2. Verifies the presence of the string in Trie and ensures that the count of the last node of this string is more than one, indicating substring inclusion in other strings.

By utilizing the Trie structure, this approach efficiently handles substrings, leveraging the count property of each node to determine the repetition of substrings within the input string list. This results in a comprehensive yet efficient solution to the substring matching challenge.

python
class Solution:

    class Node:
        def __init__(self):
            self.count = 0
            self.children = {}

    def findSubstrings(self, words: List[str]) -> List[str]:
        substrs_found = []
        trie_root = self.Node()  # Root node of the Trie

        # Build trie by adding all word suffixes
        for word in words:
            for i in range(len(word)):
                self._add_suffix_to_trie(trie_root, word[i:])

        # Determine if each word is a substring more than once
        for word in words:
            if self._check_for_deps(trie_root, word):
                substrs_found.append(word)

        return substrs_found

    def _add_suffix_to_trie(self, root: "Node", suffix: str) -> None:
        node = root
        for ch in suffix:
            if ch not in node.children:
                node.children[ch] = self.Node()
            node = node.children[ch]
            node.count += 1

    def _check_for_deps(self, root: "Node", item: str) -> bool:
        node = root
        for ch in item:
            node = node.children.get(ch)
        return node.count > 1

The Python solution provided is designed to identify substrings from an array of words that appear more than once using a Trie data structure. Here's a concise summary of the method implemented:

  • Initialize an empty list called substrs_found to store the result.
  • Create a root node for the Trie using the nested Node class.
  • Populate the Trie with all possible suffixes of each word in the input array words.
  • For each word, check if it exists as a substring in other words based on its count in the Trie.
  • The _add_suffix_to_trie function is responsible for adding each suffix of a word to the Trie, incrementing the count of occurrences each time.
  • The _check_for_deps function is used to check if a word appears more than once as a substring by traversing the Trie and checking the count of the node associated with the last character of the word.
  • The list substrs_found gets appended with words that qualify as repeated substrings and is returned as the output.

This solution effectively utilizes the Trie to provide an efficient method for substring detection in an array of words.

Comments

No comments yet.