433. Minimum Genetic Mutation - Solutions and Explanation

Problem Statement

In computational biology, gene sequences are made up of characters from the set {'A', 'C', 'G', 'T'}. Each character represents a nucleotide, and a gene can be represented as a sequence of these nucleotides. In our problem, we deal with gene sequences that are always 8 characters long.

The goal is to determine the smallest number of mutations required to transform one given gene string (startGene) into another (endGene). A single mutation is defined as changing exactly one character in the gene string to a different character from the set {'A', 'C', 'G', 'T'}.

However, not all potential mutations are valid. Valid mutations are explicitly listed in a given bank array — a mutation to a gene sequence not present in this bank is not allowed. If no sequence of valid mutations can convert the startGene into the endGene, the function should return -1.

An important aspect of the problem is that while the starting gene sequence must be valid to consider, it doesn't necessarily have to be included in the bank. Each sequence (whether startGene or sequences in the bank) is precisely 8 characters long, consisting only of the characters {'A', 'C', 'G', 'T'}.

Examples

Example 1

Input:

startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]

Output:

Example 2

Input:

startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]

Output:

Constraints

0 <= bank.length <= 10
startGene.length == endGene.length == bank[i].length == 8
startGene, endGene, and bank[i] consist of only the characters ['A', 'C', 'G', 'T'].

Approach and Intuition

The mutation problem can be visualized as a graph traversal problem where:

Each gene string in the bank (including potentially the startGene and endGene if they are in the bank) represents a node in the graph.
An edge exists between two gene strings if they differ by exactly one mutation.

The objective is to find the shortest path from the startGene to the endGene within this graph. Here’s a step-by-step breakdown of the approach:

Graph Construction:
- Start by adding startGene and endGene to the graph nodes if they are part of the bank.
- Generate edges between all pairs of nodes where the nodes differ by exactly one mutation.
Breadth-First Search (BFS):
- Use BFS to find the shortest path from the startGene to the endGene. BFS is suitable because it explores nodes layer by layer, ensuring that the first time it reaches endGene, it has found the shortest path.
- Begin the BFS from the startGene, and at each step, attempt all possible valid mutations that are one character different and exist within the bank.
- Track the visited nodes to prevent cycles and redundant work.
Early Exit:
- If you can mutate startGene directly to endGene in the bank, return 1.
- If the BFS completes without reaching the endGene, return -1 as it indicates there's no valid path of mutations.

This method leverages BFS for its natural applicability to shortest-path problems in unweighted graphs. By navigating through the explicitly allowed mutations (edges), the algorithm efficiently finds or refutes the possibility of transforming startGene into endGene through legal mutations.

Solutions

C++
Java
Python

                            cpp
                            
                        
class Solution {
public:
    int leastMutations(string startSequence, string endSequence, vector<string>& geneBank) {
        queue<string> planQueue;
        unordered_set<string> visited;
        planQueue.push(startSequence);
        visited.insert(startSequence);
            
        int mutationCount = 0;
        while (!planQueue.empty()) {
            int levelSize = planQueue.size();
                
            for (int level = 0; level < levelSize; level++) {
                string current = planQueue.front();
                planQueue.pop();
    
                if (current == endSequence) {
                    return mutationCount;
                }
                    
                for (char gene : "ACGT") {
                    for (int pos = 0; pos < current.size(); pos++) {
                        string mutated = current;
                        mutated[pos] = gene;
                        if (!visited.count(mutated) && find(geneBank.begin(), geneBank.end(), mutated) != geneBank.end()) {
                            planQueue.push(mutated);
                            visited.insert(mutated);
                        }
                    }
                }
            }
                
            mutationCount++;
        }
            
        return -1;
    }
};

This solution outlines the approach to find the minimum number of mutations needed to transform a genetic start sequence into a target end sequence using a set of allowed mutations found within a gene bank.

Begin by initializing a queue to manage the sequences that need to be processed and a set to track visited sequences.
Enqueue the start sequence, and also add it to the visited list.
Initiate a count (mutationCount) to keep track of the number of mutation steps taken.
Start processing the queue until it's empty, representing the breadth-first search layers:
- For each sequence in the current level of the queue:
  - Remove it from the queue.
  - If this sequence matches the end sequence, return the mutation count as the result.
  - For each possible gene substitution ('A', 'C', 'G', 'T'):
    - Generate and assess each possible single mutation of the current sequence.
    - If a newly formed sequence is found in the gene bank and hasn't been visited:
      - Add it to the queue and mark it as visited to avoid reprocessing.
After checking all possible mutations for the current sequence, increment the mutation count.
If the queue empties and no solution is found, return -1 indicating the transformation isn't possible with given constraints.

This approach ensures that the solution checks all possible minimal mutations efficiently using BFS, tracking only viable sequences, and terminates when either a solution is found or all options are exhausted.

                            java
                            
                        
class Solution {
    public int minimumMutation(String initial, String target, String[] geneticBank) {
        Queue<String> processingQueue = new LinkedList<>();
        Set<String> visited = new HashSet<>();
        processingQueue.add(initial);
        visited.add(initial);
    
        int mutationSteps = 0;
    
        while (!processingQueue.isEmpty()) {
            int levelSize = processingQueue.size();
            for (int j = 0; j < levelSize; j++) {
                String current = processingQueue.poll();
                if (current.equals(target)) {
                    return mutationSteps;
                }
    
                for (char gene : new char[] {'A', 'C', 'G', 'T'}) {
                    for (int i = 0; i < current.length(); i++) {
                        String mutated = current.substring(0, i) + gene + current.substring(i + 1);
                        if (!visited.contains(mutated) && Arrays.asList(geneticBank).contains(mutated)) {
                            processingQueue.add(mutated);
                            visited.add(mutated);
                        }
                    }
                }
            }
    
            mutationSteps++;
        }
    
        return -1;
    }
}

In solving the "Minimum Genetic Mutation" problem, the Java implementation uses a breadth-first search (BFS) approach to determine the minimum number of mutations needed to transform the initial genetic sequence into the target sequence using a specified set of valid genetic mutations available in a genetic bank.

The function minimumMutation accepts three parameters: the initial sequence (initial), the target sequence (target), and an array of allowable genetic sequences (geneticBank).
Use a Queue to manage the sequences that need processing. Initiate this queue with the starting sequence.
Maintain a Set to record sequences that have been visited to avoid redundant processing.
Begin processing sequences in a loop until no more sequences are left to process:
- If the current sequence matches the target, immediately return the number of mutation steps taken so far.
- For each character position in the sequence, attempt to mutate it to each of the possible genes ('A', 'C', 'G', 'T').
- Construct the mutated sequence and verify if it hasn't been visited yet and is present in the genetic bank.
- If valid, add this mutated sequence to the queue and the visited set.
Increment the mutation steps counter after processing each level of the BFS.
If the queue is exhausted without matching the target sequence, return -1 indicating the transformation isn't possible with the given genetic bank.

This method efficiently checks each possibility while ensuring no sequence is processed more than once, optimizing both time and space complexities.

                            python
                            
                        
class Solution:
    def minimumMutation(self, initial: str, target: str, valid_mutations: List[str]) -> int:
        mutation_queue = deque([(initial, 0)])
        visited = {initial}
            
        while mutation_queue:
            current_strand, mutations_count = mutation_queue.popleft()
            if current_strand == target:
                return mutations_count
    
            for char in "ACGT":
                for index in range(len(current_strand)):
                    mutated = current_strand[:index] + char + current_strand[index + 1:]
                    if mutated not in visited and mutated in valid_mutations:
                        mutation_queue.append((mutated, mutations_count + 1))
                        visited.add(mutated)
    
        return -1

This Python solution addresses the problem of finding the minimum number of mutations needed to transform a genetic sequence (initial) into a target sequence (target) using only valid mutations provided in a list (valid_mutations). The approach utilizes Breadth-First Search (BFS) to ensure the shortest path (minimum mutations) is found.

Follow the detailed breakdown of the process:

Initialize a queue with the initial genetic sequence and a counter for mutations set to 0.
Use a set to record visited sequences to prevent revisiting and redundant calculations.
Employ a while loop to process each item in the queue:
1. Dequeue to get the current genetic string and its mutation count.
2. If the current genetic string matches the target, return the mutation count as the result.
3. Generate all possible mutations of the current string by changing each character to 'A', 'C', 'G', or 'T'.
4. Check if the newly generated mutation is valid and not visited:
  - If valid, enqueue it with the mutation count incremented by one and mark it as visited.
If the queue is exhausted and no solution is found, return -1, indicating that transformation to the target sequence is not possible with the given valid mutations.

This method is efficient in finding the shortest transformation path by exploring all possible mutations level by level until the target sequence is reached or no valid paths are available.

Comments

No comments yet.

Minimum Genetic Mutation

Problem Statement

Examples

Example 1

Example 2

Constraints

Approach and Intuition

Solutions

Comments

Products

Features

Solutions

Marketplace

Resources

Company