Find the Shortest Superstring

Updated on 29 May, 2025
Find the Shortest Superstring header image

Problem Statement

In the given problem, we are provided with an array of unique strings named words. The objective is to form a new string that is the smallest by length but contains each string from the words array as a substring. If there exists more than one string that satisfies the smallest length criteria, returning any one of them is considered valid. Additionally, it’s provided that no string within the array is a substring of any other string in the array. This condition simplifies the problem slightly as we don't need to consider overlapping or contained strings when constructing our solution.

Examples

Example 1

Input:

words = ["alex","loves","vultr"]

Output:

"alexlovesvultr"

Explanation:

All permutations of "alex","loves","vultr" would also be accepted.

Example 2

Input:

words = ["catg","ctaagt","gcta","ttca","atgcatc"]

Output:

"gctaagttcatgcatc"

Constraints

  • 1 <= words.length <= 12
  • 1 <= words[i].length <= 20
  • words[i] consists of lowercase English letters.
  • All the strings of words are unique.

Approach and Intuition

The task involves constructing the shortest possible string that can contain all the given words as substrings. This can be interpreted as a variant of the shortest common supersequence problem or, more aptly, like assembling a puzzle where each piece (word) must appear once without truncation, aligning them in the best compact form.

Example Analysis

  • For the input ["alex","loves","vultr"], a straightforward solution might be to concatenate all strings since there are no overlaps specified. Thus, "alexlovesvultr" is an acceptable output.

  • For the input ["catg","ctaagt","gcta","ttca","atgcatc"], the optimally built string might leverage overlapping sequences. For example, "gcta" followed by "ctaagt" gives "gctaagt" preserving both strings entirely. This framework continues with the relations: "gctaagt" + "ttca" (sharing 't') ⇒ "gctaagttca" "gctaagttca" + "catg" (sharing 'catg') ⇒ "gctaagttcatg" Lastly, "gctaagttcatg" + "atgcatc" (sharing part of 'atgcatc') ⇒ "gctaagttcatgcatc", which is our required result.

Examples

Example 1

Input:

words = ["alex","loves","vultr"]

Output:

"alexlovesvultr"

Explanation:

All permutations of "alex","loves","vultr" would also be accepted.

Example 2

Input:

words = ["catg","ctaagt","gcta","ttca","atgcatc"]

Output:

"gctaagttcatgcatc"

Constraints

  • 1 <= words.length <= 12
  • 1 <= words[i].length <= 20
  • words[i] consists of lowercase English letters.
  • All the strings of words are unique.

Steps to Approach

  1. Start by constructing relationships between each word to find possible overlaps which can help minimize the total length when words are concatenated.

  2. Utilize dynamic programming or graph-based approaches like Hamiltonian paths to effectively compute the arrangement that minimizes the length. These computational strategies are ideal due to the potentially large permutations of concatenations with the input constraints.

  3. Implement checks for overlap between two strings, then merge as per the maximum overlap to reduce redundancy and hence, the resultant length.

  4. Due to the flexibility of multiple valid outputs, as the constraints allow an output of any minimal-length string, leverage this by cutting down on the exhaustive search where possible. If a sufficiently short string is built earlier on, exploring further permutations may be unnecessary.

The solving method pivots crucially on our ability to find overlaps and efficiently manage combinations which, while computationally intensive (especially towards the upper limits of constraints provided), can be managed with strategic algorithmic planning leveraging overlaps and string positioning.

Solutions

  • Java
java
class Solution {
    public String computeSuperstring(String[] words) {
        int wordCount = words.length;

        // Calculate overlaps for the words
        int[][] overlapMatrix = new int[wordCount][wordCount];
        for (int i = 0; i < wordCount; ++i)
            for (int j = 0; j < wordCount; ++j) if (i != j) {
                int minLength = Math.min(words[i].length(), words[j].length());
                for (int k = minLength; k >= 0; --k)
                    if (words[i].endsWith(words[j].substring(0, k))) {
                        overlapMatrix[i][j] = k;
                        break;
                    }
            }

        // Compute using dynamic programming approach
        int[][] dp = new int[1 << wordCount][wordCount];
        int[][] parent = new int[1 << wordCount][wordCount];
        for (int mask = 0; mask < (1 << wordCount); ++mask) {
            Arrays.fill(parent[mask], -1);

            for (int bit = 0; bit < wordCount; ++bit) if (((mask >> bit) & 1) > 0) {
                int prevMask = mask ^ (1 << bit);
                if (prevMask == 0) continue;
                for (int i = 0; i < wordCount; ++i) if (((prevMask >> i) & 1) > 0) {
                    int val = dp[prevMask][i] + overlapMatrix[i][bit];
                    if (val > dp[mask][bit]) {
                        dp[mask][bit] = val;
                        parent[mask][bit] = i;
                    }
                }
            }
        }

        // Reconstruct permutation and determine solution string
        int[] orderedIndices = new int[wordCount];
        boolean[] added = new boolean[wordCount];
        int currentIndex = 0;
        int currentMask = (1 << wordCount) - 1;
        int lastWordIndex = 0;

        // Start with the word that maximizes the overlap using the dp array
        for (int j = 0; j < wordCount; ++j)
            if (dp[(1 << wordCount) - 1][j] > dp[(1 << wordCount) - 1][lastWordIndex])
                lastWordIndex = j;

        // Build the optimal permutation by tracing back using 'parent'
        while (lastWordIndex != -1) {
            orderedIndices[currentIndex++] = lastWordIndex;
            added[lastWordIndex] = true;
            int nextWord = parent[currentMask][lastWordIndex];
            currentMask ^= 1 << lastWordIndex;
            lastWordIndex = nextWord;
        }

        // Reverse the ordering to match the correct order
        for (int i = 0; i < currentIndex / 2; ++i) {
            int temp = orderedIndices[i];
            orderedIndices[i] = orderedIndices[currentIndex - 1 - i];
            orderedIndices[currentIndex - 1 - i] = temp;
        }

        // Add words not included in dp max path
        for (int i = 0; i < wordCount; ++i) if (!added[i])
            orderedIndices[currentIndex++] = i;

        // Construct the superstring from the optimal permutation
        StringBuilder superstring = new StringBuilder(words[orderedIndices[0]]);
        for (int i = 1; i < wordCount; ++i) {
            int overlap = overlapMatrix[orderedIndices[i - 1]][orderedIndices[i]];
            superstring.append(words[orderedIndices[i]].substring(overlap));
        }

        return superstring.toString();
    }
}

This solution is designed to find the shortest superstring that can be formed from a given list of strings by overlapping them optimally. The code is implemented in Java and utilizes dynamic programming combined with bit manipulation for efficient computation.

  • Begin by initializing crucial data structures:
    • An overlap matrix that stores the maximum overlap between any two words.
    • A dynamic programming table dp to hold the intermediate results and a parent array to keep track of decisions for constructing the final string.
  • Compute the overlaps between all pairs of words to populate the overlap matrix.
  • Use bit masks within the dynamic programming loop to identify which words have been included in the current subset. Update the dp table by trying to add a new word to each existing subset.
  • Trace back from the full set (when all words are used) to construct the permutation that led to the optimal overlap using the parent table.
  • Finally, reconstruct the shortest superstring by concatenating the words according to the calculated permutation, making sure to add unconnected words and maintain the correct overlaps.

The main computational components involve calculating overlaps and updating the DP table effectively, while the result reconstruction ensures that the final string is correctly formed from the optimal path derived from the DP structure. Ensure that edge cases, such as non-overlapping words, are handled by directly appending them in the resulting superstring. This approach ensures the solution is both efficient and accurate for an otherwise computationally intensive problem.

Comments

No comments yet.