Longest Common Subsequence

Updated on 04 June, 2025
Longest Common Subsequence header image

Problem Statement

The task is to determine the length of the longest subsequence that can be found identically in two given strings, text1 and text2. If there is no such subsequence that both strings share, the result should be 0. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements. In context, for any two strings, this problem seeks to identify the longest sequence of characters that appear left-to-right (but not necessarily consecutively) in both strings and measure its length.

Examples

Example 1

Input:

text1 = "abcde", text2 = "ace"

Output:

3

Explanation:

The longest common subsequence is "ace" and its length is 3.

Example 2

Input:

text1 = "abc", text2 = "abc"

Output:

3

Explanation:

The longest common subsequence is "abc" and its length is 3.

Example 3

Input:

text1 = "abc", text2 = "def"

Output:

0

Explanation:

There is no such common subsequence, so the result is 0.

Constraints

  • 1 <= text1.length, text2.length <= 1000
  • text1 and text2 consist of only lowercase English characters.

Approach and Intuition

To solve this problem and understand the involved steps, let's break down the given examples and constraints.

  1. Consider the dynamic programming approach, which uses a 2D array dp where dp[i][j] represents the length of the longest common subsequence of text1[0...i-1] and text2[0...j-1].

  2. We incrementally build this dp array by comparing characters from both strings:

    • If text1[i-1] == text2[j-1]: It means the current characters match, hence dp[i][j] can be obtained by taking the value from dp[i-1][j-1] and adding one.
    • If text1[i-1] != text2[j-1]: Here, the characters do not match, so we take the maximum of either excluding the current character from text1 or text2, i.e., dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
  3. Initialize the dp array with base cases where:

    • dp[i][0] and dp[0][j] are zero because a zero length of either string will have no common subsequence with the other.
  4. Finally, the solution to our problem, the length of the longest common subsequence, will be stored in dp[text1.length()][text2.length()].

Consider these important notes from the examples:

  • Example 1:

    • Inputs: text1 = "abcde", text2 = "ace"
    • A clear point-by-point comparison leads us to find "ace" as the longest common subsequence, which has a length of 3.
  • Example 2:

    • Inputs: text1 = "abc", text2 = "abc"
    • Both strings are identical, making the whole string "abc" the longest common subsequence with a length of 3.
  • Example 3:

    • Inputs: text1 = "abc", text2 = "def"
    • There are no common characters between text1 and text2, resulting in a longest common subsequence length of 0.

Constraint Insights:

  • The maximum length for text1 and text2 is 1000, which implies our approach needs to efficiently handle large inputs.
  • Both strings are composed of lowercase English letters, simplifying the problem as we deal with a known and limited character set.

Solutions

  • Java
  • Python
java
class Solution {

  public int longestCommonSubsequence(String s1, String s2) {
  
    // Swap strings if s2 is shorter than s1
    if (s2.length() < s1.length()) {
      String tempStr = s1;
      s1 = s2;
      s2 = tempStr;
    }
    
    // Use dynamic programming tables for LCS calculation
    int[] oldRow = new int[s1.length() + 1];
    int[] newRow = new int[s1.length() + 1];
    
    // Process each character from the end to the beginning
    for (int i = s2.length() - 1; i >= 0; i--) {
      for (int j = s1.length() - 1; j >= 0; j--) {
        if (s1.charAt(j) == s2.charAt(i)) {
          newRow[j] = 1 + oldRow[j + 1];
        } else {
          newRow[j] = Math.max(oldRow[j], newRow[j + 1]);
        }
      }
      // Swap oldRow and newRow for the next iteration
      int[] temp = oldRow;
      oldRow = newRow;
      newRow = temp;
    }
    
    // Return the result from the computed LCS length
    return oldRow[0];
  }
}

Implement the "Longest Common Subsequence" algorithm in Java using Dynamic Programming. This approach facilitates efficient LCS calculation by considering all subsequences of two given strings to determine the longest common substrate. Essentially, the solution exploits a tabular framework to store the length of the longest subsequence that can be obtained from the input strings. Here's a streamlined overview of the solution's execution:

  1. If the second string is shorter than the first, swap the strings to optimize space usage because only one-dimensional arrays are used to store the previous and current lengths of LCS.
  2. Create two arrays, oldRow and newRow, initialized to the length of the shorter string plus one. These arrays are used to manage the current and previous states of LCS calculations.
  3. Iterate from the last character to the first character of the longer string (s2) and simultaneously for the shorter string (s1). For each character pair comparison:
    • If the characters match, update the newRow to 1 + oldRow[j+1], which represents extending a subsequence.
    • If the characters do not match, update the newRow with the maximum value derived either from oldRow[j] or newRow[j+1], representing skipping a character in one of the strings to find a better match.
  4. After each complete pass over the characters of s1, swap newRow and oldRow to prepare for the next character in s2. This swap allows for reusing the arrays without allocating additional space.
  5. After finishing all iterations, oldRow[0] contains the length of the longest common subsequence.

This algorithm efficiently reduces space complexity to O(min(m, n)) and time complexity to O(m*n), where m and n are the lengths of the two strings. This optimized approach is suited for situations where string lengths are considerably large, ensuring both performance and resource utilization are balanced.

python
class LCSCalculator:
    def computeLCS(self, str1: str, str2: str) -> int:
        if len(str1) > len(str2):
            str1, str2 = str2, str1

        prior = [0] * (len(str1) + 1)
        curr = [0] * (len(str1) + 1)

        for i in reversed(range(len(str2))):
            for j in reversed(range(len(str1))):
                if str2[i] == str1[j]:
                    curr[j] = 1 + prior[j + 1]
                else:
                    curr[j] = max(prior[j], curr[j + 1])
            prior, curr = curr, prior

        return prior[0]

The provided Python code defines a class LCSCalculator that computes the length of the longest common subsequence (LCS) between two strings. The algorithm utilizes dynamic programming to efficiently determine the LCS length. The method computeLCS inside LCSCalculator takes two string inputs, str1 and str2. The implementation leverages a space optimization technique by using two lists (prior and curr) to keep track of the LCS lengths in a bottom-up manner.

  • First, the code checks and arranges str1 and str2 such that str1 is always the shorter string. This is done to reduce the memory footprint since only the length of str1 plus one is needed for creating the lists.

  • Two lists, prior and curr, are initialized with zeros, where their length is one more than that of the shorter string. These lists serve to store intermediate LCS values.

  • The main computation is performed inside a nested loop structure, where the outer loop iterates through the characters of str2 in reverse, and the inner loop does the same for str1.

  • For each character comparison:

    • When characters from str1 and str2 match at certain indices, the value at the current position in curr is updated to be 1 plus the diagonal value in prior.
    • If there's no match, it picks the maximum value between the current indexed value in prior and the next indexed value in curr.
  • After each inner loop iteration, the roles of prior and curr are swapped; curr takes the role of prior for the next iteration of computations, which avoids reinitializing the array for each new row of calculations.

  • Upon completion of loops, prior[0] contains the length of the LCS.

This approach ensures the LCS is determined in O(n*m) time complexity, where 'n' and 'm' are the lengths of str1 and str2, and uses O(min(n,m)) space. This is both time and space efficient, particularly useful for comparing large strings where classic methods might become computationally expensive.

Comments

No comments yet.