Number of Matching Subsequences

Updated on 08 July, 2025
Number of Matching Subsequences header image

Problem Statement

The challenge is to find the count of strings within an array words that can be formed as a subsequence from a given string s. A subsequence in this context means you can derive a string by removing some or no characters from the string s without changing the order of the remaining characters. Importantly, each string within words must maintain its internal character order to be considered a valid subsequence of s.

Examples

Example 1

Input:

s = "abcde", words = ["a","bb","acd","ace"]

Output:

3

Explanation:

There are three strings in words that are a subsequence of s: "a", "acd", "ace".

Example 2

Input:

s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]

Output:

2

Constraints

  • 1 <= s.length <= 5 * 104
  • 1 <= words.length <= 5000
  • 1 <= words[i].length <= 50
  • s and words[i] consist of only lowercase English letters.

Approach and Intuition

The task involves determining if each word in the words array is a subsequence of the string s. Let’s break down how we might approach finding a solution:

  1. Understand Subsequence: For a string to be a subsequence of another, each character of the former should appear in the latter and in the same order, though not necessarily consecutively.

  2. Iterative Comparison:

    • Iterate through each word in the words array. For each word, try to match it against s.
    • Use pointers or indices to trace the position in both the word and s. Move through s until all characters of the current word are found in order.
  3. Counting:

    • For each word, if you can trace all characters in s while maintaining order, count it as a valid subsequence.
    • Keep a total count of such valid subsequences.
  4. Efficiency Consideration:

    • Given the potential size of s and number of words, an efficient approach would be crucial.
    • Avoid re-checking any part of the s unnecessarily by using the dual-pointer technique that advances on s for comparisons and only resets for the next word in words.
  5. Edge Cases:

    • Consider matching with a single character.
    • Handle cases where elements in words could be an empty string theoretically (not explicitly counter-indicated given constraints but good to consider).

By following this method, you systematically verify each word against s, ensuring computational efficiency and adherence to the subsequence definition.

Solutions

  • Java
java
class Solution {
    public int matchSubsequences(String mainString, String[] wordList) {
        int count = 0;
        ArrayList<Node>[] bucketArray = new ArrayList[26];
        for (int i = 0; i < 26; ++i)
            bucketArray[i] = new ArrayList<Node>();
    
        for (String word : wordList)
            bucketArray[word.charAt(0) - 'a'].add(new Node(word, 0));
    
        for (char ch : mainString.toCharArray()) {
            ArrayList<Node> currentBucket = bucketArray[ch - 'a'];
            bucketArray[ch - 'a'] = new ArrayList<Node>();
    
            for (Node node : currentBucket) {
                node.index++;
                if (node.index == node.word.length()) {
                    count++;
                } else {
                    bucketArray[node.word.charAt(node.index) - 'a'].add(node);
                }
            }
            currentBucket.clear();
        }
        return count;
    }
}
    
class Node {
    String word;
    int index;
    
    public Node(String word, int idx) {
        this.word = word;
        index = idx;
    }
}

This Java solution is designed to determine how many words from a given list are subsequences of a main string. The implemented method, matchSubsequences, efficiently processes through each character of the main string and utilizes a bucket array concept to manage potential subsequences dynamically.

  • Implement a bucket array of lists that corresponds to each letter of the alphabet.

  • For every word in the provided wordList, place the word into the bucket that corresponds to its starting character.

  • Iterate over each character in mainString. For the current character:

    • Move all nodes in the corresponding bucket to a new processing list.
    • Clear the original bucket for reuse.
    • For each node in the processing list:
      • Increment the index to point to the next character in the node's word.
      • Check if the node's index matches the word length, which indicates a complete subsequence match and increment the count.
      • If the word has more characters to match, place the node back into the bucket that corresponds to the current character at the node's new index.
  • Return the count of total matches found, which represents how many words in the wordList are subsequences of the mainString.

This approach leverages the efficiency of bucket sorting and direct indexing to minimize unnecessary comparisons and optimizes checking for subsequences within the main string.

Comments

No comments yet.