Distinct Subsequences

Problem Statement
Given two strings s and t, count the distinct subsequences of s that are equal to t. A subsequence of a string is a new string formed by deleting some (possibly none) of the characters without changing the relative order of the remaining characters. For instance, "ace" is a subsequence of "abcde", while "aec" is not. Two subsequences count as distinct when they use different positions in s, even if they spell the same string. The test cases are generated so that the answer fits in a 32-bit signed integer.
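To make the definition concrete, here is a minimal Python sketch of a subsequence check. The helper name is_subsequence is an illustrative choice and is not part of the problem or of the solutions on this page.

```python
def is_subsequence(candidate: str, source: str) -> bool:
    """Return True if candidate can be obtained from source by deleting characters."""
    remaining = iter(source)
    # `ch in remaining` advances the iterator until it finds ch, so every
    # character of candidate must be found after the previous match.
    return all(ch in remaining for ch in candidate)


print(is_subsequence("ace", "abcde"))  # True
print(is_subsequence("aec", "abcde"))  # False
```

The check only answers whether a match exists; the problem asks for the number of distinct ways to pick the matching positions.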
Examples
Example 1
Input:
s = "rabbbit", t = "rabbit"
Output:
3
Explanation:
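There are 3 ways to generate "rabbit" from s. Each way drops one of the three b's (indices are 0-based):
rabbbit, dropping the b at index 4
rabbbit, dropping the b at index 3
rabbbit, dropping the b at index 2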
Example 2
Input:
s = "babgbag", t = "bag"
Output:
5
Explanation:
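There are 5 ways to generate "bag" from s. Each way is identified by the 0-based indices of the characters that are kept:
babgbag, keeping indices 0, 1, 3
babgbag, keeping indices 0, 1, 6
babgbag, keeping indices 0, 5, 6
babgbag, keeping indices 2, 5, 6
babgbag, keeping indices 4, 5, 6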
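The counts in both examples can be checked by brute force. The sketch below enumerates every increasing tuple of indices into s and keeps those that spell t. The function name matching_index_sets is hypothetical, and the approach is exponential, so it is only suitable for short strings like these.

```python
from itertools import combinations


def matching_index_sets(s: str, t: str) -> list:
    """List every tuple of increasing indices into s whose characters spell t."""
    return [
        idx
        for idx in combinations(range(len(s)), len(t))
        if all(s[i] == ch for i, ch in zip(idx, t))
    ]


for s, t in [("rabbbit", "rabbit"), ("babgbag", "bag")]:
    matches = matching_index_sets(s, t)
    print(s, t, len(matches), matches)
```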
Constraints
- 1 <= s.length, t.length <= 1000
- s and t consist of English letters.
Approach and Intuition
Dynamic programming fits this problem naturally:
- Define a 2D array dp where dp[i][j] is the number of distinct subsequences of s[0...i-1] (the prefix of s of length i) that equal t[0...j-1] (the prefix of t of length j).
- Initialize dp[0][0] to 1, because the empty prefix of s matches the empty prefix of t in exactly one way.
- Initialize dp[i][0] to 1 for all i >= 1; every prefix of s contains the empty string exactly once as a subsequence.
- Initialize dp[0][j] to 0 for all j >= 1; a non-empty prefix of t cannot be formed from the empty prefix of s.
- For all other positions, compute dp[i][j] from two cases:
  - If s[i-1] and t[j-1] are the same, dp[i][j] = dp[i-1][j-1] + dp[i-1][j]. The first term counts the subsequences that use s[i-1] to match t[j-1]; the second counts those that ignore s[i-1].
  - If s[i-1] and t[j-1] differ, s[i-1] cannot be used, so dp[i][j] = dp[i-1][j].
- The answer is dp[s.length][t.length].
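The recurrence above can be sketched in Python as follows. This is a direct transcription of the prefix-based definition, not one of the page's listed solutions, and the function name count_distinct_subsequences is an assumption made for illustration.

```python
def count_distinct_subsequences(s: str, t: str) -> int:
    m, n = len(s), len(t)
    # dp[i][j]: number of ways to form t[:j] as a subsequence of s[:i].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = 1  # the empty target is matched in exactly one way
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i - 1][j]  # ignore s[i-1]
            if s[i - 1] == t[j - 1]:
                dp[i][j] += dp[i - 1][j - 1]  # use s[i-1] to match t[j-1]
    return dp[m][n]


print(count_distinct_subsequences("rabbbit", "rabbit"))  # 3
print(count_distinct_subsequences("babgbag", "bag"))     # 5
```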
Let's walk through the examples to clarify:
- In Example 1 with s = "rabbbit" and t = "rabbit": s contains three consecutive b's but t needs only two, so each way of forming "rabbit" corresponds to a choice of which two b's to keep. There are three such choices, hence three distinct ways.
- In Example 2 with s = "babgbag" and t = "bag": the pattern "bag" can be formed by skipping different characters in s while maintaining relative order. Three ways start from the first b, and one each from the second and third b, giving five in total.
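To see how the counts accumulate, the following sketch prints the prefix-based table row by row for Example 2. The helper build_table is a hypothetical name; row i holds the counts for the first i characters of s, and column j for the first j characters of t.

```python
def build_table(s: str, t: str) -> list:
    """Return the full table where table[i][j] counts ways to form t[:j] from s[:i]."""
    m, n = len(s), len(t)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = 1  # empty target
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = table[i - 1][j]
            if s[i - 1] == t[j - 1]:
                table[i][j] += table[i - 1][j - 1]
    return table


s, t = "babgbag", "bag"
for i, row in enumerate(build_table(s, t)):
    prefix = s[:i] or "(empty)"
    print(f"{prefix:>8}: {row}")
```

The last entry of the last row is the answer for the whole strings, which is 5 for this input.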
Each entry is computed in constant time from previously computed entries, so the table is filled in O(m · n) time and O(m · n) space, where m and n are the lengths of s and t. With both lengths capped at 1000, that is at most about a million entries. Because each row depends only on the row before it, the space can be reduced to a single row, as several of the solutions below do.
Solutions
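```cpp
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int countSubsequences(string str1, string str2) {
        int len1 = str1.length();
        int len2 = str2.length();
        // dpTable[i][j]: ways to form str2[j..] as a subsequence of str1[i..].
        // Unsigned entries wrap on overflow instead of invoking undefined behavior.
        vector<vector<unsigned int>> dpTable(len1 + 1, vector<unsigned int>(len2 + 1));
        // Base case: the empty suffix of str2 is matched in exactly one way.
        for (int i = 0; i <= len1; i++) {
            dpTable[i][len2] = 1;
        }
        for (int j = len2 - 1; j >= 0; j--) {
            for (int i = len1 - 1; i >= 0; i--) {
                if (str1[i] == str2[j]) {
                    // Either use str1[i] for str2[j], or skip str1[i].
                    dpTable[i][j] = dpTable[i + 1][j + 1] + dpTable[i + 1][j];
                } else {
                    // str1[i] cannot match str2[j], so it must be skipped.
                    dpTable[i][j] = dpTable[i + 1][j];
                }
            }
        }
        return dpTable[0][0];
    }
};
```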
This C++ solution counts the distinct subsequences of str1 that equal str2, using dynamic programming.
The approach uses a 2D vector, dpTable, where dpTable[i][j] is the number of ways to form the suffix of str2 starting at index j as a subsequence of the suffix of str1 starting at index i. The process is as follows:
- Initialize dpTable with dimensions (len1 + 1) x (len2 + 1), where len1 and len2 are the lengths of str1 and str2 respectively.
- Set the last column of dpTable to 1, reflecting that the empty remainder of str2 can be matched in exactly one way.
- Fill dpTable backwards, starting from indices len1 - 1 and len2 - 1. For each pair (i, j), check whether str1[i] and str2[j] match:
  - If they match, dpTable[i][j] is set to the sum of dpTable[i + 1][j + 1] and dpTable[i + 1][j]. This accounts for the subsequences in which str1[i] is used and those in which it is skipped.
  - If they do not match, propagate the value from dpTable[i + 1][j].
Finally, dpTable[0][0] contains the total number of distinct subsequences of str1 that equal str2. Every entry is computed once from previously computed entries, so the solution runs in O(len1 · len2) time and uses O(len1 · len2) space.
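One plausible reason the table stores unsigned int (an inference, not something the text states) is that intermediate entries can grow far beyond 32 bits even when the final answer is small. The sketch below rebuilds the same suffix-based table in Python, whose integers do not overflow, for a hypothetical input chosen to show this. The name suffix_table and the input strings are illustrative.

```python
def suffix_table(s: str, t: str) -> list:
    """table[i][j] counts the ways to form t[j:] as a subsequence of s[i:]."""
    m, n = len(s), len(t)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][n] = 1  # empty suffix of t
    for j in range(n - 1, -1, -1):
        for i in range(m - 1, -1, -1):
            table[i][j] = table[i + 1][j]
            if s[i] == t[j]:
                table[i][j] += table[i + 1][j + 1]
    return table


s = "a" * 64
t = "b" + "a" * 32  # s contains no 'b', so t can never be formed
table = suffix_table(s, t)
largest = max(max(row) for row in table)
print(table[0][0])          # the answer: 0
print(largest > 2**31 - 1)  # True: an intermediate entry exceeds a signed 32-bit int
```

Entries that are too large never feed into a small final answer without making it large as well, so wrapping them around is harmless, whereas signed overflow in C++ would be undefined behavior.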
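```java
class Solution {
    public int distinctSubsequences(String s, String t) {
        int sLen = s.length();
        int tLen = t.length();
        // dpTable[j]: ways to form t[j..] from the suffix of s processed so far.
        int[] dpTable = new int[tLen];
        for (int i = sLen - 1; i >= 0; i--) {
            // Stands in for the virtual entry dpTable[tLen]: the empty suffix
            // of t is matched in exactly one way.
            int previous = 1;
            for (int j = tLen - 1; j >= 0; j--) {
                // Remember the old dpTable[j] before it is overwritten.
                int temp = dpTable[j];
                if (s.charAt(i) == t.charAt(j)) {
                    // previous holds the old dpTable[j + 1].
                    dpTable[j] += previous;
                }
                previous = temp;
            }
        }
        return dpTable[0];
    }
}
```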
The Java solution counts how many times the string t appears as a subsequence of the string s. It uses dynamic programming with a single one-dimensional array instead of a full table.
- Obtain the lengths of strings s (sLen) and t (tLen).
- Create an integer array dpTable of length tLen, initialized to zeros. After the outer loop has processed index i, dpTable[j] holds the number of ways the suffix t[j...tLen-1] can be formed from s[i...sLen-1].
- Use a helper integer previous to carry the value that dpTable[j + 1] had before the current outer iteration, which is the entry the recurrence needs when the characters match.
The core processing iterates backwards through both strings s and t:
- Set previous to 1 at the start of each outer loop iteration over s. This stands in for dpTable[tLen], the count for the empty suffix of t.
- For each character in s, iterate backwards over t. During this:
  - Store the current value at dpTable[j] into a temporary variable temp.
  - If s[i] matches t[j], increment dpTable[j] by previous: every way of forming t[j+1...] from s[i+1...] extends to a way of forming t[j...] that uses s[i].
  - Update previous to temp, the pre-update value of dpTable[j], so that the next step (index j - 1) sees the old value of its right-hand neighbor.
At the end of the iterations, dpTable[0] contains the total number of distinct subsequences of s that equal t, which is the value the function returns. The running time is O(sLen · tLen) and the extra space is O(tLen).
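The role of the previous/temp pair is easier to see next to a variant that does not need it. In the sketch below, which is an alternative formulation and not the Java code above, the array has one extra slot for the empty suffix of t and the inner loop runs forward, so dp[j + 1] still holds its old value when dp[j] is updated. The name count_forward is hypothetical.

```python
def count_forward(s: str, t: str) -> int:
    n = len(t)
    # dp[j]: ways to form t[j:] from the suffix of s processed so far.
    dp = [0] * (n + 1)
    dp[n] = 1  # the empty suffix of t is matched in exactly one way
    for i in range(len(s) - 1, -1, -1):
        # Ascending j: dp[j + 1] has not been touched yet in this pass,
        # so it still refers to the suffix of s that starts at i + 1.
        for j in range(n):
            if s[i] == t[j]:
                dp[j] += dp[j + 1]
    return dp[0]


print(count_forward("rabbbit", "rabbit"))  # 3
print(count_forward("babgbag", "bag"))     # 5
```

Iterating backwards over t, as the Java code does, overwrites dp[j + 1] before dp[j] reads it, which is why that version has to save the old value explicitly.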
```c
#include <stdlib.h>
#include <string.h>

int countDistinctSubsequences(char* string1, char* string2) {
    int len1 = (int)strlen(string1);
    int len2 = (int)strlen(string2);
    /* Allocate the (len1 + 1) x (len2 + 1) table on the heap; at the largest
       input size it is too big to place safely on the stack. calloc zeroes it. */
    unsigned long long (*ways)[len2 + 1] = calloc((size_t)len1 + 1, sizeof *ways);
    if (ways == NULL) return -1; /* allocation failed */
    /* Base case: the empty suffix of string2 is matched in exactly one way. */
    for (int i = 0; i <= len1; i++) ways[i][len2] = 1;
    for (int i = len1 - 1; i >= 0; i--) {
        for (int j = len2 - 1; j >= 0; j--) {
            ways[i][j] = ways[i + 1][j];          /* skip string1[i] */
            if (string1[i] == string2[j]) {
                ways[i][j] += ways[i + 1][j + 1]; /* use string1[i] for string2[j] */
            }
        }
    }
    /* Unsigned arithmetic wraps around, and the true answer fits in 32 bits,
       so ways[0][0] holds the exact result. */
    int result = (int)ways[0][0];
    free(ways);
    return result;
}
```
This C solution uses dynamic programming to count the distinct subsequences of string1 that equal string2. Here's an overview of the method:
- Initialize variables len1 and len2 to store the lengths of string1 and string2, respectively.
- Allocate a two-dimensional array ways of size (len1 + 1) x (len2 + 1) on the heap, zero-initialized by calloc. At the maximum input size the table holds about a million 8-byte entries (roughly 8 MB), which is too much to place safely on the stack.
- Fill the base case: for any i, ways[i][len2] is set to 1. This represents that there is exactly one way to match the empty remainder of string2.
- Iterate through the characters of string1 from end to start. For each character, iterate through string2 from end to start:
  - Set ways[i][j] equal to ways[i + 1][j]: every way of forming string2[j...] from string1[i+1...] remains valid when string1[i] is skipped.
  - If string1[i] and string2[j] are the same, add ways[i + 1][j + 1] to ways[i][j]: every way of forming string2[j+1...] from string1[i+1...] extends to one that matches string1[i] with string2[j].
- Free the table and return the value of ways[0][0], which is the count of distinct subsequences of string1 that equal string2.
The exact count is returned without any modular reduction, since the problem guarantees that the answer fits in a 32-bit signed integer. Intermediate entries can be far larger than the answer, which is why the table uses unsigned long long: unsigned arithmetic wraps around modulo 2^64 instead of overflowing, and the final result stays exact because the true answer is below 2^31. The solution runs in O(len1 · len2) time and space.
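```javascript
var countSubsequences = function (source, target) {
    let srcLength = source.length;
    let tgtLength = target.length;
    // dp[j]: ways to form target.slice(j) from the suffix of source processed so far.
    let dp = new Array(tgtLength).fill(0);
    let lastVal = 1;
    for (let i = srcLength - 1; i >= 0; i--) {
        // Stands in for dp[tgtLength]: the empty suffix is matched in one way.
        lastVal = 1;
        for (let j = tgtLength - 1; j >= 0; j--) {
            // Save the old dp[j] before it is overwritten.
            let temp = dp[j];
            if (source.charAt(i) === target.charAt(j)) {
                // lastVal holds the old dp[j + 1].
                dp[j] += lastVal;
            }
            lastVal = temp;
        }
    }
    return dp[0];
};
```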
The JavaScript function countSubsequences counts the distinct subsequences of the source string that equal the target string, using dynamic programming with a one-dimensional array.
- Store the lengths of the source (srcLength) and target (tgtLength) strings.
- Create an array dp of size tgtLength and initialize its elements to 0. After source index i has been processed, dp[j] holds the number of ways to form the target suffix starting at index j from the source suffix starting at index i.
- Iterate over the source string from the end to the beginning, so that each step extends results already computed for the later characters.
- In a nested loop, iterate over the target string from the end to the start, comparing the current source character with each target character.
- For each pair of characters, lastVal holds the value dp[j + 1] had before the current outer iteration, or 1 when j is the last index (standing for the empty suffix). If the characters match, add lastVal to dp[j], which counts the subsequences that use the current source character. Then set lastVal to the pre-update value of dp[j], saved in temp.
- At the conclusion of both loops, dp[0] contains the total count of distinct subsequences that match the target string.
The function returns dp[0] as the total number of distinct subsequences of the source that equal the target, using O(srcLength · tgtLength) time and O(tgtLength) extra space.
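```python
class Solution:
    def countDistinctSubsequences(self, source: str, target: str) -> int:
        source_len, target_len = len(source), len(target)
        # subsequences_count[j]: ways to form target[j:] from the suffix of
        # source processed so far.
        subsequences_count = [0] * target_len
        for i in range(source_len - 1, -1, -1):
            # Stands for the empty suffix of target, matched in one way.
            last = 1
            for j in range(target_len - 1, -1, -1):
                # Keep the old value; index j - 1 needs it as its right neighbor.
                previous_value = subsequences_count[j]
                if source[i] == target[j]:
                    subsequences_count[j] += last
                last = previous_value
        return subsequences_count[0]
```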
The Python code defines a method countDistinctSubsequences that returns the number of distinct subsequences of the source string that equal the target string. Here's a concise summary of how the solution works:
- Read the lengths of both the source and target strings.
- Create an array subsequences_count of zeros with the same length as target. After source index i has been processed, subsequences_count[j] is the number of ways to form target[j:] from source[i:].
- Traverse the source string in reverse.
  - For each character in source, set last to 1 (the count for the empty suffix of target) and start a reverse traversal of target.
  - During the nested loop over target:
    - Save the current subsequences_count[j] in previous_value.
    - If source[i] matches target[j], add last to subsequences_count[j]. Here last holds the value subsequences_count[j + 1] had before this pass, that is, the number of ways to form the rest of target from the rest of source.
    - Set last to previous_value, the pre-update value at position j.
- After processing all characters, the first element of subsequences_count holds the result: the total number of distinct subsequences of source that form the entire target.
The method stores intermediate results to avoid redundant calculations, running in O(source_len · target_len) time with O(target_len) extra space.
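A quick way to gain confidence in the one-dimensional formulation is to compare it with exhaustive enumeration on small random inputs. The sketch below is a test harness under stated assumptions: count_dp restates the rolling-array logic as a plain function, count_brute is a hypothetical exponential reference, and the alphabet, lengths, and seed are arbitrary choices.

```python
import random
from itertools import combinations


def count_dp(source: str, target: str) -> int:
    counts = [0] * len(target)
    for i in range(len(source) - 1, -1, -1):
        last = 1
        for j in range(len(target) - 1, -1, -1):
            previous_value = counts[j]
            if source[i] == target[j]:
                counts[j] += last
            last = previous_value
    return counts[0]


def count_brute(source: str, target: str) -> int:
    # Count every increasing index tuple whose characters spell target.
    return sum(
        all(source[i] == ch for i, ch in zip(idx, target))
        for idx in combinations(range(len(source)), len(target))
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(200):
    source = "".join(rng.choice("ab") for _ in range(rng.randint(1, 10)))
    target = "".join(rng.choice("ab") for _ in range(rng.randint(1, 4)))
    assert count_dp(source, target) == count_brute(source, target)
print("all checks passed")
```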