Distinct Subsequences

Updated on 23 May, 2025

Problem Statement

The task is to count the distinct subsequences of a given string s that equal another string t. A subsequence of a string is a new string formed from the original by deleting some (possibly none) of the characters without disturbing the relative order of the remaining characters. For instance, "ace" is a subsequence of "abcde". Two subsequences count as distinct when they use different positions of s, even if they spell the same string. The test cases are generated so that the answer fits in a 32-bit signed integer.

Examples

Example 1

Input:

s = "rabbbit", t = "rabbit"

Output:

3

Explanation:

As shown below, there are 3 ways you can generate "rabbit" from s.

RABBbIT, RABbBIT, RAbBBIT (uppercase marks the characters of s that are kept)

Example 2

Input:

s = "babgbag", t = "bag"

Output:

5

Explanation:

As shown below, there are 5 ways you can generate "bag" from s.

BAbGbag, BAbgbaG, BabgbAG, baBgbAG, babgBAG (uppercase marks the characters of s that are kept)

Constraints

  • 1 <= s.length, t.length <= 1000
  • s and t consist of English letters.

Approach and Intuition

Dynamic programming fits this problem well, because the count for a pair of prefixes can be built from the counts for shorter prefixes:

  1. Define a 2D array dp where dp[i][j] is the number of ways the prefix t[0...j-1] (the first j characters of t) occurs as a subsequence of the prefix s[0...i-1] (the first i characters of s).
  2. Initialize dp[0][0] to 1 because the empty prefix of s matches the empty prefix of t in exactly one way. Set dp[0][j] to 0 for all j >= 1, since a non-empty t cannot be formed from an empty s.
  3. Initialize dp[i][0] to 1 for all i >= 1; any prefix of s (including the whole string) contains the empty string exactly once as a subsequence.
  4. For all other positions, compute dp[i][j] based on two conditions:
    • If characters s[i-1] and t[j-1] are the same, dp[i][j] = dp[i-1][j-1] + dp[i-1][j]: the first term counts the ways that use s[i-1] to match t[j-1], and the second counts the ways that ignore s[i-1].
    • If characters s[i-1] and t[j-1] are not the same, s[i-1] cannot be used, so dp[i][j] = dp[i-1][j].
  5. The answer is dp[m][n], where m and n are the lengths of s and t.
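The steps above can be sketched in Python as follows. This is a minimal illustration of the prefix table, not one of the reference solutions; the function name count_prefix_dp is an assumption made for this example.

```python
def count_prefix_dp(s: str, t: str) -> int:
    """Count the ways t occurs as a subsequence of s using a prefix table."""
    m, n = len(s), len(t)
    # dp[i][j] = number of ways t[:j] occurs as a subsequence of s[:i].
    # Row 0 stays 0 for j >= 1: a non-empty target cannot come from an empty s.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = 1  # the empty target is matched exactly once
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i - 1][j]  # ignore s[i-1]
            if s[i - 1] == t[j - 1]:
                dp[i][j] += dp[i - 1][j - 1]  # use s[i-1] to match t[j-1]
    return dp[m][n]


print(count_prefix_dp("rabbbit", "rabbit"))  # 3
print(count_prefix_dp("babgbag", "bag"))     # 5
```

The two printed values are the outputs given in Example 1 and Example 2.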

Let's walk through aspects of the examples to clarify:

  • In Example 1 with s = "rabbbit" and t = "rabbit": The letters r, a, i, and t each occur once in s, so their positions are forced. The only freedom is which two of the three consecutive b's to keep, which gives three distinct ways to form "rabbit".

  • In Example 2 with s = "babgbag" and t = "bag": s has b at indices 0, 2, 4, a at indices 1, 5, and g at indices 3, 6. The increasing index triples that spell b, a, g are (0, 1, 3), (0, 1, 6), (0, 5, 6), (2, 5, 6), and (4, 5, 6), leading to five different sequences.
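For small inputs, the walk-through can be checked by brute force. The sketch below enumerates every increasing tuple of indices into s and keeps those whose characters spell t. It is exponential and meant only as a sanity check on short strings; the helper name matching_index_tuples is hypothetical.

```python
from itertools import combinations


def matching_index_tuples(s: str, t: str) -> list:
    """Return every increasing index tuple of s whose characters spell t."""
    return [
        idx
        for idx in combinations(range(len(s)), len(t))
        if all(s[i] == c for i, c in zip(idx, t))
    ]


print(matching_index_tuples("babgbag", "bag"))
# [(0, 1, 3), (0, 1, 6), (0, 5, 6), (2, 5, 6), (4, 5, 6)]
print(len(matching_index_tuples("rabbbit", "rabbit")))  # 3
```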

Each table entry is computed in constant time from previously computed entries, so filling the full table takes O(m * n) time and space, where m and n are the lengths of s and t. With both lengths capped at 1000, that is at most about one million entries. The solutions below apply the same recurrence in mirror image, over suffixes of s and t rather than prefixes.

Solutions

  • C++
  • Java
  • C
  • JavaScript
  • Python
cpp
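#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int countSubsequences(string str1, string str2) {
        int len1 = str1.length();
        int len2 = str2.length();
        // dpTable[i][j] = ways to form the suffix str2[j..] from the suffix str1[i..].
        // Unsigned arithmetic wraps on overflow instead of being undefined behavior.
        vector<vector<unsigned int>> dpTable(len1 + 1, vector<unsigned int>(len2 + 1));
        // Base case: the empty suffix of str2 can be formed in exactly one way.
        for (int i = 0; i <= len1; i++) {
            dpTable[i][len2] = 1;
        }
        for (int j = len2 - 1; j >= 0; j--) {
            for (int i = len1 - 1; i >= 0; i--) {
                if (str1[i] == str2[j]) {
                    // Either use str1[i] to match str2[j], or skip str1[i].
                    dpTable[i][j] = dpTable[i + 1][j + 1] + dpTable[i + 1][j];
                } else {
                    // str1[i] cannot match str2[j], so it must be skipped.
                    dpTable[i][j] = dpTable[i + 1][j];
                }
            }
        }
        return dpTable[0][0];
    }
};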

This C++ solution counts the ways str2 occurs as a subsequence of str1, using dynamic programming over suffixes of the two strings.

The approach uses a 2D vector, dpTable, where dpTable[i][j] is the number of ways the suffix of str2 starting at index j occurs as a subsequence of the suffix of str1 starting at index i. The process is as follows:

  • Initialize dpTable with dimensions (len1 + 1) x (len2 + 1), where len1 and len2 are the lengths of str1 and str2 respectively.
  • Set the last column of dpTable to 1, reflecting scenarios where the remainder of str2 is empty and can be matched in exactly one way.
  • Iteratively fill dpTable, moving backwards from len1 - 1 and len2 - 1. For each pair (i, j), check if characters of str1 and str2 at positions i and j match:
    • If they match, the value at dpTable[i][j] is set to the sum of dpTable[i + 1][j + 1] and dpTable[i + 1][j]. This accounts for scenarios where the character at str1[i] contributes to a subsequence and scenarios where it does not.
    • If they do not match, propagate the value from dpTable[i + 1][j].

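Finally, dpTable[0][0] contains the number of ways str2 occurs as a subsequence of str1. Each entry is computed once from previously computed entries, so the solution runs in O(len1 * len2) time and uses O(len1 * len2) space. The table stores unsigned int values so that intermediate sums too large for 32 bits wrap around instead of causing undefined behavior. Because every entry is at least as large as the entries it is built from, only entries no larger than the final answer feed into dpTable[0][0], so the result is unaffected.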
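The same suffix recurrence can also be written top-down, which some readers find easier to check against the definition of dpTable[i][j]. The following Python sketch is an illustration under that assumption, not a translation of the C++ code; the names count_top_down and ways are hypothetical. Because it recurses once per character, it suits short inputs only (Python's default recursion limit would be exceeded near the maximum lengths).

```python
from functools import lru_cache


def count_top_down(s: str, t: str) -> int:
    """Count the ways t occurs as a subsequence of s, memoizing on suffixes."""

    @lru_cache(maxsize=None)
    def ways(i: int, j: int) -> int:
        # ways(i, j) = ways to form the suffix t[j:] from the suffix s[i:].
        if j == len(t):
            return 1  # the empty remainder of t is matched exactly once
        if i == len(s):
            return 0  # s is exhausted but t is not
        total = ways(i + 1, j)  # skip s[i]
        if s[i] == t[j]:
            total += ways(i + 1, j + 1)  # use s[i] to match t[j]
        return total

    return ways(0, 0)


print(count_top_down("rabbbit", "rabbit"))  # 3
```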

java
class Solution {
    public int distinctSubsequences(String s, String t) {
        int sLen = s.length();
        int tLen = t.length();

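        // dpTable[j] = ways to form the suffix t[j..] from the part of s processed so far.
        int[] dpTable = new int[tLen];

        int previous = 1;

        for (int i = sLen - 1; i >= 0; i--) {
            // Stands in for the empty suffix of t, which is always formed in one way.
            previous = 1;

            for (int j = tLen - 1; j >= 0; j--) {
                // Save the old dpTable[j]; the step for index j - 1 needs it.
                int temp = dpTable[j];

                if (s.charAt(i) == t.charAt(j)) {
                    dpTable[j] += previous;
                }

                previous = temp;
            }
        }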
        }

        return dpTable[0];
    }
}

In the Java solution for counting "Distinct Subsequences," the program determines how many times the string t appears as a subsequence in string s. This uses dynamic programming to efficiently solve the problem, avoiding unnecessary recomputation.

  • Start by obtaining the lengths of strings s (sLen) and t (tLen).
  • Create a dynamic programming array dpTable of integers, which Java initializes to zeros. It has length tLen, and once the outer loop has processed index i, dpTable[j] holds the number of ways the suffix t[j...tLen-1] can be formed from s[i...sLen-1].
  • Use a helper integer previous to carry the value that dpTable[j+1] held before the current character of s was processed (the diagonal entry of the full 2D table). For the last index of t it is 1, which stands for the empty suffix.

The core processing involves iterating backwards through both strings s and t:

  1. Initialize previous to 1 at the start of each outer loop iteration over s.
  2. For each character in s, iterate backwards over t. During this:
    • Store the current value at dpTable[j] into a temporary variable temp.
    • If s[i] matches t[j], increment dpTable[j] by previous: every way of forming t[j+1...] from s[i+1...] extends to a way of forming t[j...] that uses s[i].
    • Update previous to the value stored in temp, so that the next step (index j-1) sees the old dpTable[j].

At the end of the iterations, dpTable[0] will contain the total number of distinct subsequences of t in s, which is the desired output of the function.
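To see how the single array evolves, the sketch below records a copy of it after each character of s is processed. It mirrors the loop structure described above in Python; the function name rolling_rows and the tiny input are assumptions chosen for illustration.

```python
def rolling_rows(s: str, t: str) -> list:
    """Return a snapshot of the 1D table after each character of s (right to left)."""
    dp = [0] * len(t)
    snapshots = []
    for i in range(len(s) - 1, -1, -1):
        previous = 1  # ways to match the empty suffix of t
        for j in range(len(t) - 1, -1, -1):
            temp = dp[j]  # old value, needed by the step for j - 1
            if s[i] == t[j]:
                dp[j] += previous
            previous = temp
        snapshots.append(dp.copy())
    return snapshots


for row in rolling_rows("aab", "ab"):
    print(row)
# [0, 1]
# [1, 1]
# [2, 1]
```

The last snapshot's first element, 2, is the answer for this input: either 'a' can pair with the single 'b'.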

c
#include <stdlib.h>
#include <string.h>

int countDistinctSubsequences(char* string1, char* string2) {
    int len1 = strlen(string1);
    int len2 = strlen(string2);
    int cols = len2 + 1;
    /* Flattened (len1 + 1) x (len2 + 1) table; entry (i, j) lives at ways[i * cols + j].
       calloc zero-fills it. Unsigned arithmetic wraps instead of overflowing. */
    unsigned long long* ways = calloc((size_t)(len1 + 1) * cols, sizeof *ways);
    if (ways == NULL) return -1; /* allocation failure */
    for (int i = 0; i <= len1; i++) ways[i * cols + len2] = 1;
    for (int i = len1 - 1; i >= 0; i--) {
        for (int j = len2 - 1; j >= 0; j--) {
            /* Skip string1[i]. */
            ways[i * cols + j] = ways[(i + 1) * cols + j];
            if (string1[i] == string2[j]) {
                /* Or use string1[i] to match string2[j]. */
                ways[i * cols + j] += ways[(i + 1) * cols + j + 1];
            }
        }
    }
    int result = (int)ways[0];
    free(ways);
    return result;
}

The solution provided uses dynamic programming to count the ways string2 occurs as a subsequence of string1. Here's an overview of the method used in the C program:

  • Initialize variables len1 and len2 to store the lengths of string1 and string2, respectively.
  • Allocate a table ways with (len1 + 1) x (len2 + 1) entries on the heap using calloc, which zero-fills it, and address entry (i, j) as ways[i * cols + j]. At the maximum input size the table takes about 8 MB, which can exceed the default stack limit on some platforms if it is declared as a local array.
  • Fill the base case: for any i, entry (i, len2) is set to 1. This represents that there is exactly one way to match an empty remainder of string2.
  • Iterate through the characters of string1 from end to start. For each character, iterate through string2 from end to start:
    • Set entry (i, j) equal to entry (i + 1, j): every match that starts from the next character of string1 is still valid when the current character of string1 is skipped.
    • If the characters at string1[i] and string2[j] are the same, add entry (i + 1, j + 1): every way of matching the rest of string2 against the rest of string1 extends to a match that uses the current characters.
  • Return entry (0, 0), which contains the number of ways string2 occurs as a subsequence of string1, and free the table.

No modulus is applied: the problem asks for the exact count, which is guaranteed to fit in a 32-bit signed integer but can still exceed 1000000007, so reducing modulo that value could return a wrong answer. The entries are unsigned long long so that intermediate sums that do not fit wrap around instead of overflowing. Every entry is at least as large as the entries it is built from, so only entries no larger than the final answer contribute to entry (0, 0), and the returned value is exact.
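A short Python sketch of why the count should be returned exactly rather than reduced modulo 1000000007: an answer can fit in a 32-bit signed integer and still be larger than that modulus. The input below (34 identical letters, target of 15) is an assumption chosen so that the answer is the binomial coefficient C(34, 15); count_exact is a hypothetical helper.

```python
from math import comb

MODULO = 1_000_000_007
INT32_MAX = 2**31 - 1


def count_exact(s: str, t: str) -> int:
    """Exact count of t as a subsequence of s (Python ints do not overflow)."""
    dp = [0] * len(t)
    for i in range(len(s) - 1, -1, -1):
        previous = 1  # ways to match the empty suffix of t
        for j in range(len(t) - 1, -1, -1):
            temp = dp[j]
            if s[i] == t[j]:
                dp[j] += previous
            previous = temp
    return dp[0]


# Choosing which 15 of 34 identical letters to keep gives C(34, 15) ways.
exact = count_exact("a" * 34, "a" * 15)
print(exact == comb(34, 15))         # True
print(MODULO <= exact <= INT32_MAX)  # True: fits in 32 bits, exceeds the modulus
print(exact % MODULO == exact)       # False: reducing would change the answer
```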

js
var countSubsequences = function (source, target) {
    let srcLength = source.length;
    let tgtLength = target.length;
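    // dp[j] = ways to form target.slice(j) from the part of source processed so far.
    let dp = new Array(tgtLength).fill(0);
    let lastVal = 1;
    for (let i = srcLength - 1; i >= 0; i--) {
        // Stands in for the empty suffix of target, always formed in one way.
        lastVal = 1;
        for (let j = tgtLength - 1; j >= 0; j--) {
            // Save the old dp[j]; the step for index j - 1 needs it.
            let temp = dp[j];
            if (source.charAt(i) === target.charAt(j)) {
                dp[j] += lastVal;
            }
            lastVal = temp;
        }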
    }
    return dp[0];
};

This solution counts the ways the target string occurs as a subsequence of the source string, using dynamic programming with a single array. The steps of the JavaScript function countSubsequences are:

  • Initialize variables for the length of the source (srcLength) and target (tgtLength) strings.
  • Create an array dp of size tgtLength and initialize its elements to 0. dp[j] stores the number of ways the suffix of target starting at index j can be formed from the part of source processed so far.
  • Start iterating over the source string from the end to the beginning. This reverse iteration builds each count from counts already computed for later characters of source.
  • Use a nested loop to also iterate over the target string from the end to the start. This double iteration compares each character from the source with each character in the target.
  • For each pair of characters, lastVal holds the value dp[j + 1] had before the current source character was processed (or 1 when j is the last index of target). If the characters match, add lastVal to dp[j]: each of those ways extends to one that uses the current source character. Then set lastVal to the old dp[j] for the next step.
  • At the conclusion of both loops, dp[0] contains the total count of distinct subsequences that match the target string.

The function returns dp[0] as the number of ways target occurs as a subsequence of source. It needs only O(tgtLength) extra memory, because each pass over target overwrites the previous row of the table in place.

python
class Solution:
    def countDistinctSubsequences(self, source: str, target: str) -> int:
        source_len, target_len = len(source), len(target)
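        # subsequences_count[j] = ways to form target[j:] from the part of source processed so far.
        subsequences_count = [0] * target_len

        for i in range(source_len - 1, -1, -1):
            # Stands in for the empty suffix of target, always formed in one way.
            last = 1

            for j in range(target_len - 1, -1, -1):
                # Save the old count; the step for index j - 1 needs it.
                previous_value = subsequences_count[j]

                if source[i] == target[j]:
                    subsequences_count[j] += last

                last = previous_value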

        return subsequences_count[0]

The given Python code defines a method countDistinctSubsequences to determine the number of distinct subsequences in the source string that match a target string. Here's a concise summary of how the solution works:

  • Initialize the lengths of both the source and target strings.
  • Create an array subsequences_count initialized to zero with the same length as target. subsequences_count[j] tracks the number of ways the suffix target[j:] can be formed from the part of source processed so far.
  • Traverse the source string in reverse.
    • For each character in source, set last to 1 (the count for the empty suffix of target) and start another reverse traversal of the target string.
    • During the nested loop over the target:
      • Save the current subsequences_count[j] in previous_value.
      • If the character in source matches the character in target, add last to subsequences_count[j]; last is the number of ways the rest of the target, target[j + 1:], can be formed from the later characters of source.
      • Set last to previous_value, the value subsequences_count[j] held before the update.
  • After processing all characters, the first element of subsequences_count holds the result, which represents the total distinct subsequences in source that form the entire target.

This method applies dynamic programming by storing intermediate results to avoid redundant calculations. It computes the count in O(len(source) * len(target)) time with O(len(target)) extra space.
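For comparison, the single-array idea can also be written over prefixes, which removes the temporary variable: if the target index is walked downwards, the entry to its left has not yet been touched for the current character. This is an alternative sketch, not the method used in the solution above, and the name count_one_row is hypothetical.

```python
def count_one_row(s: str, t: str) -> int:
    """Count the ways t occurs as a subsequence of s with one prefix-indexed row."""
    n = len(t)
    # dp[j] = ways to form the prefix t[:j] from the characters of s seen so far.
    dp = [1] + [0] * n  # dp[0] = 1: the empty prefix is always formed once
    for ch in s:
        # Walk j downwards so dp[j - 1] still holds its value from before ch.
        for j in range(n, 0, -1):
            if ch == t[j - 1]:
                dp[j] += dp[j - 1]
    return dp[n]


print(count_one_row("rabbbit", "rabbit"))  # 3
print(count_one_row("babgbag", "bag"))     # 5
```

Both formulations do the same amount of work; the choice between them is mostly a matter of which indexing reads more naturally.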
