
Problem Statement
In computational biology, gene sequences are made up of characters from the set {'A', 'C', 'G', 'T'}
. Each character represents a nucleotide, and a gene can be represented as a sequence of these nucleotides. In our problem, we deal with gene sequences that are always 8 characters long.
The goal is to determine the smallest number of mutations required to transform one given gene string (startGene
) into another (endGene
). A single mutation is defined as changing exactly one character in the gene string to a different character from the set {'A', 'C', 'G', 'T'}
.
However, not all potential mutations are valid. Valid mutations are explicitly listed in a given bank
array — a mutation to a gene sequence not present in this bank
is not allowed. If no sequence of valid mutations can convert the startGene
into the endGene
, the function should return -1
.
An important aspect of the problem is that while the starting gene sequence must be valid to consider, it doesn't necessarily have to be included in the bank
. Each sequence (whether startGene
or sequences in the bank
) is precisely 8 characters long, consisting only of the characters {'A', 'C', 'G', 'T'}
.
Examples
Example 1
Input:
startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output:
1
Example 2
Input:
startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output:
2
Constraints
0 <= bank.length <= 10
startGene.length == endGene.length == bank[i].length == 8
startGene
,endGene
, andbank[i]
consist of only the characters['A', 'C', 'G', 'T']
.
Approach and Intuition
The mutation problem can be visualized as a graph traversal problem where:
- Each gene string in the
bank
(including potentially thestartGene
andendGene
if they are in thebank
) represents a node in the graph. - An edge exists between two gene strings if they differ by exactly one mutation.
The objective is to find the shortest path from the startGene
to the endGene
within this graph. Here’s a step-by-step breakdown of the approach:
Graph Construction:
- Start by adding
startGene
andendGene
to the graph nodes if they are part of thebank
. - Generate edges between all pairs of nodes where the nodes differ by exactly one mutation.
- Start by adding
Breadth-First Search (BFS):
- Use BFS to find the shortest path from the
startGene
to theendGene
. BFS is suitable because it explores nodes layer by layer, ensuring that the first time it reachesendGene
, it has found the shortest path. - Begin the BFS from the
startGene
, and at each step, attempt all possible valid mutations that are one character different and exist within thebank
. - Track the visited nodes to prevent cycles and redundant work.
- Use BFS to find the shortest path from the
Early Exit:
- If you can mutate
startGene
directly toendGene
in the bank, return 1. - If the BFS completes without reaching the
endGene
, return-1
as it indicates there's no valid path of mutations.
- If you can mutate
This method leverages BFS for its natural applicability to shortest-path problems in unweighted graphs. By navigating through the explicitly allowed mutations (edges), the algorithm efficiently finds or refutes the possibility of transforming startGene
into endGene
through legal mutations.
Solutions
- C++
- Java
- Python
class Solution {
public:
int leastMutations(string startSequence, string endSequence, vector<string>& geneBank) {
queue<string> planQueue;
unordered_set<string> visited;
planQueue.push(startSequence);
visited.insert(startSequence);
int mutationCount = 0;
while (!planQueue.empty()) {
int levelSize = planQueue.size();
for (int level = 0; level < levelSize; level++) {
string current = planQueue.front();
planQueue.pop();
if (current == endSequence) {
return mutationCount;
}
for (char gene : "ACGT") {
for (int pos = 0; pos < current.size(); pos++) {
string mutated = current;
mutated[pos] = gene;
if (!visited.count(mutated) && find(geneBank.begin(), geneBank.end(), mutated) != geneBank.end()) {
planQueue.push(mutated);
visited.insert(mutated);
}
}
}
}
mutationCount++;
}
return -1;
}
};
This solution outlines the approach to find the minimum number of mutations needed to transform a genetic start sequence into a target end sequence using a set of allowed mutations found within a gene bank.
- Begin by initializing a queue to manage the sequences that need to be processed and a set to track visited sequences.
- Enqueue the start sequence, and also add it to the visited list.
- Initiate a count (
mutationCount
) to keep track of the number of mutation steps taken. - Start processing the queue until it's empty, representing the breadth-first search layers:
- For each sequence in the current level of the queue:
- Remove it from the queue.
- If this sequence matches the end sequence, return the mutation count as the result.
- For each possible gene substitution ('A', 'C', 'G', 'T'):
- Generate and assess each possible single mutation of the current sequence.
- If a newly formed sequence is found in the gene bank and hasn't been visited:
- Add it to the queue and mark it as visited to avoid reprocessing.
- For each sequence in the current level of the queue:
- After checking all possible mutations for the current sequence, increment the mutation count.
- If the queue empties and no solution is found, return -1 indicating the transformation isn't possible with given constraints.
This approach ensures that the solution checks all possible minimal mutations efficiently using BFS, tracking only viable sequences, and terminates when either a solution is found or all options are exhausted.
class Solution {
public int minimumMutation(String initial, String target, String[] geneticBank) {
Queue<String> processingQueue = new LinkedList<>();
Set<String> visited = new HashSet<>();
processingQueue.add(initial);
visited.add(initial);
int mutationSteps = 0;
while (!processingQueue.isEmpty()) {
int levelSize = processingQueue.size();
for (int j = 0; j < levelSize; j++) {
String current = processingQueue.poll();
if (current.equals(target)) {
return mutationSteps;
}
for (char gene : new char[] {'A', 'C', 'G', 'T'}) {
for (int i = 0; i < current.length(); i++) {
String mutated = current.substring(0, i) + gene + current.substring(i + 1);
if (!visited.contains(mutated) && Arrays.asList(geneticBank).contains(mutated)) {
processingQueue.add(mutated);
visited.add(mutated);
}
}
}
}
mutationSteps++;
}
return -1;
}
}
In solving the "Minimum Genetic Mutation" problem, the Java implementation uses a breadth-first search (BFS) approach to determine the minimum number of mutations needed to transform the initial genetic sequence into the target sequence using a specified set of valid genetic mutations available in a genetic bank.
- The function
minimumMutation
accepts three parameters: the initial sequence (initial
), the target sequence (target
), and an array of allowable genetic sequences (geneticBank
). - Use a
Queue
to manage the sequences that need processing. Initiate this queue with the starting sequence. - Maintain a
Set
to record sequences that have been visited to avoid redundant processing. - Begin processing sequences in a loop until no more sequences are left to process:
- If the current sequence matches the target, immediately return the number of mutation steps taken so far.
- For each character position in the sequence, attempt to mutate it to each of the possible genes ('A', 'C', 'G', 'T').
- Construct the mutated sequence and verify if it hasn't been visited yet and is present in the genetic bank.
- If valid, add this mutated sequence to the queue and the visited set.
- Increment the mutation steps counter after processing each level of the BFS.
- If the queue is exhausted without matching the target sequence, return -1 indicating the transformation isn't possible with the given genetic bank.
This method efficiently checks each possibility while ensuring no sequence is processed more than once, optimizing both time and space complexities.
class Solution:
def minimumMutation(self, initial: str, target: str, valid_mutations: List[str]) -> int:
mutation_queue = deque([(initial, 0)])
visited = {initial}
while mutation_queue:
current_strand, mutations_count = mutation_queue.popleft()
if current_strand == target:
return mutations_count
for char in "ACGT":
for index in range(len(current_strand)):
mutated = current_strand[:index] + char + current_strand[index + 1:]
if mutated not in visited and mutated in valid_mutations:
mutation_queue.append((mutated, mutations_count + 1))
visited.add(mutated)
return -1
This Python solution addresses the problem of finding the minimum number of mutations needed to transform a genetic sequence (initial
) into a target sequence (target
) using only valid mutations provided in a list (valid_mutations
). The approach utilizes Breadth-First Search (BFS) to ensure the shortest path (minimum mutations) is found.
Follow the detailed breakdown of the process:
- Initialize a queue with the initial genetic sequence and a counter for mutations set to 0.
- Use a set to record visited sequences to prevent revisiting and redundant calculations.
- Employ a
while
loop to process each item in the queue:- Dequeue to get the current genetic string and its mutation count.
- If the current genetic string matches the target, return the mutation count as the result.
- Generate all possible mutations of the current string by changing each character to 'A', 'C', 'G', or 'T'.
- Check if the newly generated mutation is valid and not visited:
- If valid, enqueue it with the mutation count incremented by one and mark it as visited.
- If the queue is exhausted and no solution is found, return -1, indicating that transformation to the target sequence is not possible with the given valid mutations.
This method is efficient in finding the shortest transformation path by exploring all possible mutations level by level until the target sequence is reached or no valid paths are available.
No comments yet.