Shortest Common Supersequence

Updated on 09 July, 2025
Shortest Common Supersequence  header image

Problem Statement

Given two input strings, str1 and str2, the task is to find and return the shortest possible string that contains both str1 and str2 as subsequences. A subsequence of a string t is derived by deleting some characters (which could also be zero deletions) from t without changing the order of the remaining characters. The problem requires not only finding such a string but ensuring that it is the shortest possible string which can encompass both the input strings as its subsequences. If multiple such shortest strings exist, returning any one of them is acceptable.

Examples

Example 1

Input:

str1 = "abac", str2 = "cab"

Output:

"cabac"

Explanation:

str1 = "abac" is a subsequence of "cabac" because we can delete the first "c".
str2 = "cab" is a subsequence of "cabac" because we can delete the last "ac".
The answer provided is the shortest such string that satisfies these properties.

Example 2

Input:

str1 = "aaaaaaaa", str2 = "aaaaaaaa"

Output:

"aaaaaaaa"

Constraints

  • 1 <= str1.length, str2.length <= 1000
  • str1 and str2 consist of lowercase English letters.

Approach and Intuition

To solve the problem of finding the shortest string containing two given strings as subsequences, we can utilize the concept similar to the "Shortest Common Supersequence". Below are some steps and considerations based on the given examples and constraints:

  1. Understanding Subsequences and Superssequences:

    • A subsequence is formed by deleting certain characters of a string, without rearranging the order of the remaining characters.
    • A supersequence of two strings is a single string that contains each of the given strings as a subsequence.
  2. Utilizing Dynamic Programming:

    • We can use a dynamic programming approach akin to solving the "Longest Common Subsequence" (LCS) problem.
    • Generally, the length of the shortest common supersequence can be determined by len(str1) + len(str2) - LCS(str1, str2).
    • By finding the LCS, we identify the overlap or the common sequence between the two strings that need not be repeated in our supersequence.
  3. Constructing the Shortest Supersequence:

    • After calculating the LCS, construct the shortest supersequence by iterating through both strings.
    • Add non-LCS characters while ensuring that the characters from the LCS are added in the correct sequence.
  4. Consider Edge Cases:

    • If either string is a subsequence of the other, then the longer string is the shortest supersequence.
    • Ensure that each character is included from both strings in such a way that they remain subsequences in the resultant string.

Applying these principles will allow us to construct a solution that efficiently finds the shortest string in which both input strings are subsequences. The dynamic programming approach ensures that we handle strings of lengths up to the constraint limits efficiently.

Solutions

  • C++
cpp
class Solution {
public:
    string findShortestSupersequence(string firstStr, string secondStr) {
        int lenFirst = firstStr.length();
        int lenSecond = secondStr.length();
    
        vector<vector<int>> dpMatrix(lenFirst + 1, vector<int>(lenSecond + 1, 0));
    
        // Edge cases initialization
        for (int i = 0; i <= lenFirst; i++) {
            dpMatrix[i][0] = i;
        }
        for (int j = 0; j <= lenSecond; j++) {
            dpMatrix[0][j] = j;
        }
    
        // Populating the DP matrix
        for (int i = 1; i <= lenFirst; i++) {
            for (int j = 1; j <= lenSecond; j++) {
                if (firstStr[i - 1] == secondStr[j - 1]) {
                    dpMatrix[i][j] = dpMatrix[i - 1][j - 1] + 1;
                } else {
                    dpMatrix[i][j] = min(dpMatrix[i - 1][j], dpMatrix[i][j - 1]) + 1;
                }
            }
        }
    
        string resultSequence = "";
        int i = lenFirst, j = lenSecond;
    
        // Tracing back to find the common supersequence
        while (i > 0 && j > 0) {
            if (firstStr[i - 1] == secondStr[j - 1]) {
                resultSequence += firstStr[i - 1];
                i--;
                j--;
            } else if (dpMatrix[i - 1][j] < dpMatrix[i][j - 1]) {
                resultSequence += firstStr[i - 1];
                i--;
            } else {
                resultSequence += secondStr[j - 1];
                j--;
            }
        }
    
        // Append any remaining characters from either string
        while (i > 0) {
            resultSequence += firstStr[i - 1];
            i--;
        }
        while (j > 0) {
            resultSequence += secondStr[j - 1];
            j--;
        }
    
        reverse(resultSequence.begin(), resultSequence.end());
        return resultSequence;
    }
};

The goal is to determine the Shortest Common Supersequence (SCS) of two strings using C++. Here's how the provided code accomplishes this task effectively utilizing dynamic programming (DP):

  • Initialize Variables and DP Matrix:

    • Capture the lengths of the two input strings, firstStr and secondStr.
    • Create a DP matrix dpMatrix sized (lenFirst+1 x lenSecond+1) initialized to zero, to store the length of the SCS at each substring intersection.
  • Base Case Setup:

    • Set up the base cases in the DP matrix. This involves setting the edges where one of the strings is of zero length. If firstStr or secondStr is empty, the length of the SCS is simply the length of the other string at that point.
  • Fill the DP Matrix:

    • Iterate over each character of both strings. For each pair of indices (i, j):
      • If characters from both strings match, set dpMatrix[i][j] as dpMatrix[i-1][j-1] + 1.
      • If they don't match, choose the minimum length from either removing a character from firstStr or secondStr and add one to it.
  • Construct the SCS:

    • Start from the bottom-right corner of the matrix (lenFirst, lenSecond) and trace back to the top-left corner.
    • Append characters to the resultSequence based on the matrix values, prioritizing matches and then minimum edits.
  • Handle Remaining Characters:

    • After reaching one of the top rows or columns, append the remaining characters from either firstStr or secondStr directly to the resultSequence.
  • Reverse and Return:

    • Since the sequence is constructed backwards, reverse resultSequence to get the final SCS.
    • Return the correct sequence as the result.

This approach provides an efficient solution to finding the shortest common supersequence by leveraging dynamic programming to break the problem down into simpler subproblems and building up the solution from there.

  • Java
java
class Solution {
    
    public String findShortestSupersequence(String first, String second) {
        int length1 = first.length();
        int length2 = second.length();
    
        int[][] dpTable = new int[length1 + 1][length2 + 1];
    
        // Base cases setup
        for (int i = 0; i <= length1; i++) {
            dpTable[i][0] = i;
        }
        for (int j = 0; j <= length2; j++) {
            dpTable[0][j] = j;
        }
    
        // Build the dpTable for supersequence lengths
        for (int i = 1; i <= length1; i++) {
            for (int j = 1; j <= length2; j++) {
                if (first.charAt(i - 1) == second.charAt(j - 1)) {
                    dpTable[i][j] = dpTable[i - 1][j - 1] + 1;
                } else {
                    dpTable[i][j] = Math.min(dpTable[i - 1][j], dpTable[i][j - 1]) + 1;
                }
            }
        }
    
        StringBuilder supersequenceBuilder = new StringBuilder();
        int i = length1, j = length2;
    
        // Construct the supersequence
        while (i > 0 && j > 0) {
            if (first.charAt(i - 1) == second.charAt(j - 1)) {
                supersequenceBuilder.append(first.charAt(i - 1));
                i--;
                j--;
            } else if (dpTable[i - 1][j] < dpTable[i][j - 1]) {
                supersequenceBuilder.append(first.charAt(i - 1));
                i--;
            } else {
                supersequenceBuilder.append(second.charAt(j - 1));
                j--;
            }
        }
    
        // Add remaining characters from first string
        while (i > 0) {
            supersequenceBuilder.append(first.charAt(i - 1));
            i--;
        }
        // Add remaining characters from second string
        while (j > 0) {
            supersequenceBuilder.append(second.charAt(j - 1));
            j--;
        }
    
        // Reverse StringBuilder content to get correct sequence
        return supersequenceBuilder.reverse().toString();
    }
}

The given Java program solves the problem of finding the shortest common supersequence of two strings. The overall strategy leverages dynamic programming to build a solution. Here’s a breakdown of the provided code:

  • Initialization:

    • An integer matrix dpTable is created with dimensions based on the lengths of the two input strings, first and second. This table is used to store the lengths of the shortest common supersequences at different substrings.
  • Base Case Setting:

    • The first row and first column of dpTable are filled with increasing integer values starting from 0, representing the scenario where one of the strings is empty.
  • DP Table Buildup:

    • The matrix is populated using a nested loop structure. For each pair (i, j) where i is an index into first and j is an index into second, it checks if characters at these positions match. If they match, the cell dpTable[i][j] is set to the value of the diagonal cell dpTable[i-1][j-1] plus one. If they don't match, it is set to the minimum of the cell to the left or above plus one.
  • Supersequence Construction:

    • Starting from the last position of the dpTable, the supersequence is constructed using a StringBuilder. Based on the values in the dpTable, either characters from first, second, or both are appended to the builder. If the paths lead up or left in the dp matrix, the corresponding character from first or second is appended.
  • Handling Remaining Characters:

    • After the primary loop completes, any remaining characters from either the first or second string (indicated by non-zero indices i or j) are appended to the builder.
  • Final Step:

    • The contents of the StringBuilder are reversed before being returned to ensure the sequence is in the correct order.

This approach efficiently compiles the shortest common supersequence, optimizing decisions recursively through matrix calculations, thus ensuring an optimal solution.

  • Python
python
class SupersequenceSolver:
    def findSCS(self, string1: str, string2: str) -> str:
        len1 = len(string1)
        len2 = len(string2)
    
        dp = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    
        for i in range(len1 + 1):
            dp[i][0] = i
        for j in range(len2 + 1):
            dp[0][j] = j
    
        for i in range(1, len1 + 1):
            for j in range(1, len2 + 1):
                if string1[i - 1] == string2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = min(dp[i - 1][j], dp[i][j - 1]) + 1
    
        scs = []
        i, j = len1, len2
    
        while i > 0 and j > 0:
            if string1[i - 1] == string2[j - 1]:
                scs.append(string1[i - 1])
                i -= 1
                j -= 1
            elif dp[i - 1][j] < dp[i][j - 1]:
                scs.append(string1[i - 1])
                i -= 1
            else:
                scs.append(string2[j - 1])
                j -= 1
    
        while i > 0:
            scs.append(string1[i - 1])
            i -= 1
        while j > 0:
            scs.append(string2[j - 1])
            j -= 1
    
        return ''.join(reversed(scs))

The provided Python solution implements a method to find the Shortest Common Supersequence (SCS) of two strings. This method efficiently determines the shortest sequence that contains both the given strings as subsequences.

Here’s a breakdown of how the solution functions:

  • Initialize a 2-dimensional dynamic programming array dp to store the lengths of the shortest supersequence at each subproblem (i, j), where i and j are the current indices in string1 and string2 respectively.

  • Populate the base cases in the dp array. When one of the strings is empty, the shortest supersequence length to that point is just the length of the other string.

  • Use a nested loop to fill in the dp table by comparing characters of string1 and string2. If they match, copy the diagonal value from dp (representing the continuation of both strings without adding length). If they don’t match, take the minimum value from either adjoining to the left (excluding a character from string1) or above (excluding a character from string2) and add one.

  • After constructing the dp table, trace back from dp[len1][len2] to the origin (0,0) to construct the supersequence. This is done by comparing the values in the dp table to decide whether to take a character from string1, string2, or both in case of a match.

  • Reverse the collected characters since the building of the supersequence starts from the end to the beginning of the strings.

Execute this method to generate the shortest sequence that embeds both input strings while retaining their original order. Use examples to validate the functionality and correctness of your implementation for practical understanding.

Comments

No comments yet.