
Problem Statement
The task is to identify and return all duplicate subtrees from a given binary tree. A subtree is considered duplicate if there exists at least another subtree that has the exact same structure and node values as it. For each type of duplicate subtree found, only the root node of one instance should be returned. This allows us to pinpoint locations in the larger tree where repetitions occur, simplifying the identification of recurring structures and data patterns.
Examples
Example 1
Input:
root = [1,2,3,4,null,2,4,null,null,4]
Output:
[[2,4],[4]]
Example 2
Input:
root = [2,1,1]
Output:
[[1]]
Example 3
Input:
root = [2,2,2,3,null,3,null]
Output:
[[2,3],[3]]
Constraints
- The number of the nodes in the tree will be in the range
[1, 5000]
-200 <= Node.val <= 200
Approach and Intuition
The challenge is to efficiently identify structures within the tree that recur in other parts.
By traversing the tree and generating a unique fingerprint for the structure and values of each subtree, we can track which forms reappear.
The serializing of each subtree can be done using a preorder traversal, combining both the structure and the values into a string format. For example, a subtree with root node of value
2
and left child with value1
can be represented as "2,1".These serialized strings can serve as keys in a dictionary to count occurrences. When a serialized form of a subtree appears more than once, it means a duplicate exists.
Distinct from keeping every subtree data in a list, storing potential duplicates in the dictionary allows for constant time checks for duplicates which is efficient given the constraints:
- The smallest tree contains just one node, and the largest up to 5000 nodes.
- Node values range between -200 and 200, which are manageable for direct comparisons.
Example Explorations
- In the first example, the input tree
[1,2,3,4,null,2,4,null,null,4]
results in duplicates of the subtrees[2,4]
and[4]
. The tree contains multiple instances of these structures, leading to their identification as duplicates. - In the second example, the tree
[2,1,1]
simply has two nodes with the value1
as leaf nodes, thus the subtree[1]
is identified as a duplicate. - The third example,
[2,2,2,3,null,3,null]
, has three nodes with the value2
where two of them share the same subtree structure[2,3]
, indicating the duplication alongside the separate leaves[3]
.
This approach efficiently maps tree structures to their occurrences, enabling quick detection of repetitive patterns and facilitating solutions even when working with large binary trees with thousands of nodes.
Solutions
- C++
- Java
- Python
class Solution {
public:
vector<TreeNode*> getDuplicateSubtrees(TreeNode* root) {
unordered_map<string, int> serializedToId;
unordered_map<int, int> subtreeCount;
vector<TreeNode*> duplicates;
function<int(TreeNode*)> encode = [&serializedToId, &subtreeCount, &duplicates,
&encode](TreeNode* node) -> int {
if (node == nullptr) {
return 0;
}
string serialized = to_string(encode(node->left)) + "," + to_string(node->val) + "," +
to_string(encode(node->right));
if (serializedToId.find(serialized) == serializedToId.end()) {
serializedToId[serialized] = serializedToId.size() + 1;
}
int uid = serializedToId[serialized];
subtreeCount[uid]++;
if (subtreeCount[uid] == 2) {
duplicates.push_back(node);
}
return uid;
};
encode(root);
return duplicates;
}
};
This C++ solution addresses the problem of finding all duplicate subtrees in a binary tree. Each subtree is serialized into a string form to allow for comparison. Here's a breakdown of how the code functions:
- The function
getDuplicateSubtrees
receives the root node of a binary tree. - An unordered map,
serializedToId
, maps a serialized subtree to a unique identifier. - Another unordered map,
subtreeCount
, keeps track of how many times each unique serialized subtree appears. - A vector,
duplicates
, is used to store the root nodes of subtrees that are found to be duplicates. - The internal
encode
function defines the mechanism for recursive subtree serialization:- If the node is
nullptr
, it returns 0, indicating an empty subtree. - It recursively serializes the subtree using preorder traversal (node, left, right) and combines these into a string.
- If the serialized string is not already a key in
serializedToId
, it adds it with a new unique identifier. - It then checks if this particular serialized form has been encountered before by using the
subtreeCount
map. - If a serialized subtree is encountered twice, it is added to the
duplicates
vector. - Each call to
encode
returns a unique identifier corresponding to the serialized subtree of the node.
- If the node is
- The
encode
function is initially called with the root of the tree. - Finally, the
duplicates
vector, containing all the duplicate subtrees' root nodes, is returned.
By serializing subtrees and mapping these to unique identifiers, this solution efficiently identifies and collects roots of all duplicate subtrees in a binary tree. It leverages hashing (via unordered maps) to manage and compare subtree structures effectively.
class Solution {
public List<TreeNode> getDuplicateSubtrees(TreeNode root) {
List<TreeNode> duplicates = new LinkedList<>();
searchDuplicates(root, new HashMap<>(), new HashMap<>(), duplicates);
return duplicates;
}
public int searchDuplicates(TreeNode node, Map<String, Integer> subtreeMap,
Map<Integer, Integer> countMap, List<TreeNode> duplicates) {
if (node == null) {
return 0;
}
String serial = searchDuplicates(node.left, subtreeMap, countMap, duplicates) + "," + node.val +
"," + searchDuplicates(node.right, subtreeMap, countMap, duplicates);
if (!subtreeMap.containsKey(serial)) {
subtreeMap.put(serial, subtreeMap.size() + 1);
}
int subtreeId = subtreeMap.get(serial);
countMap.put(subtreeId, countMap.getOrDefault(subtreeId, 0) + 1);
if (countMap.get(subtreeId) == 2) {
duplicates.add(node);
}
return subtreeId;
}
}
In this Java solution, the task is to find all duplicate subtrees in a given binary tree. Each subtree is serialized into a string format and stored alongside its occurrence count using hash maps. This method not only identifies duplicates efficiently but also stores references to the original subtree roots where the duplicates start.
The implementation involves two primary methods:
getDuplicateSubtrees(TreeNode root): This method initializes necessary data structures and triggers the recursive search through the binary tree to locate and collect duplicates.
searchDuplicates(TreeNode node, Map
subtreeMap, Map : This recursive function:countMap, List<TreeNode> duplicates) - Serializes the current subtree rooted at
node
. - Tracks the serialization in
subtreeMap
to ensure each unique subtree gets a unique ID. - Records the frequency of each subtree ID in
countMap
. - Checks the count of each serialization; if a serialization appears exactly twice, it adds the root node of that subtree to the duplicates list. This ensures each duplicate subtree is only added once.
- Finally, returns the unique subtree ID, facilitating serialization of parent trees.
- Serializes the current subtree rooted at
The solution efficiently catalogs and checks for duplicates using serialization and mapping strategies, ensuring that duplicates are identified based on structural equality rather than object references. This approach provides a robust and scalable method to solve the problem of finding duplicate subtrees in binary trees.
class Solution:
def findRepeatedSubtrees(self, root):
def dfs(node):
if not node:
return "#"
sub_tree = (dfs(node.left), node.val, dfs(node.right))
if sub_tree not in tree_map:
tree_map[sub_tree] = len(tree_map) + 1
subtree_id = tree_map[sub_tree]
counter[subtree_id] += 1
if counter[subtree_id] == 2:
result.append(node)
return subtree_id
tree_map = {}
counter = collections.defaultdict(int)
result = []
dfs(root)
return result
The provided Python script defines a method to find duplicate subtrees within a binary tree. Each subtree that appears more than once with the same structure and node values is considered duplicate. The solution employs a depth-first search (DFS) to examine each node's subtree and utilizes a tuple to represent the structure and values of these subtrees. Note the use of serialization techniques where the subtrees are converted into a serialized string format represented by tuples.
The following explains the critical components and steps of the algorithm:
- Define a helper function
dfs()
which is a recursive function to traverse the tree in a depth-first manner. - For each node encountered, it identifies a unique signature (a tuple) that represents the subtree structure rooted at that node. This tuple includes the node's value and the identifiers of the left and right subtrees.
- Use a dictionary
tree_map
to map each unique subtree structure to a unique identifier. - Employ another dictionary
counter
to count the occurrences of each subtree identifier. - If a subtree's count in
counter
reaches two — indicating the subtree has been seen once before — it adds the subtree's root node to the result listresult
. - Finally, return the
result
list, which contains the root nodes of all duplicate subtrees.
This approach ensures that each subtree is only recorded once when it first becomes a duplicate, optimizing the process and avoiding redundancy.
No comments yet.