
Problem Statement
The task is to identify and return all duplicate subtrees from a given binary tree. A subtree is considered duplicate if there exists at least another subtree that has the exact same structure and node values as it. For each type of duplicate subtree found, only the root node of one instance should be returned. This allows us to pinpoint locations in the larger tree where repetitions occur, simplifying the identification of recurring structures and data patterns.
Examples
Example 1
Input:
root = [1,2,3,4,null,2,4,null,null,4]
Output:
[[2,4],[4]]
Example 2
Input:
root = [2,1,1]
Output:
[[1]]
Example 3
Input:
root = [2,2,2,3,null,3,null]
Output:
[[2,3],[3]]
Constraints
- The number of the nodes in the tree will be in the range 
[1, 5000] -200 <= Node.val <= 200
Approach and Intuition
The challenge is to efficiently identify structures within the tree that recur in other parts.
By traversing the tree and generating a unique fingerprint for the structure and values of each subtree, we can track which forms reappear.
The serializing of each subtree can be done using a preorder traversal, combining both the structure and the values into a string format. For example, a subtree with root node of value
2and left child with value1can be represented as "2,1".These serialized strings can serve as keys in a dictionary to count occurrences. When a serialized form of a subtree appears more than once, it means a duplicate exists.
Distinct from keeping every subtree data in a list, storing potential duplicates in the dictionary allows for constant time checks for duplicates which is efficient given the constraints:
- The smallest tree contains just one node, and the largest up to 5000 nodes.
 - Node values range between -200 and 200, which are manageable for direct comparisons.
 
Example Explorations
- In the first example, the input tree 
[1,2,3,4,null,2,4,null,null,4]results in duplicates of the subtrees[2,4]and[4]. The tree contains multiple instances of these structures, leading to their identification as duplicates. - In the second example, the tree 
[2,1,1]simply has two nodes with the value1as leaf nodes, thus the subtree[1]is identified as a duplicate. - The third example, 
[2,2,2,3,null,3,null], has three nodes with the value2where two of them share the same subtree structure[2,3], indicating the duplication alongside the separate leaves[3]. 
This approach efficiently maps tree structures to their occurrences, enabling quick detection of repetitive patterns and facilitating solutions even when working with large binary trees with thousands of nodes.
Solutions
- C++
 - Java
 - Python
 
class Solution {
public:
    vector<TreeNode*> getDuplicateSubtrees(TreeNode* root) {
        unordered_map<string, int> serializedToId;
        unordered_map<int, int> subtreeCount;
        vector<TreeNode*> duplicates;
        function<int(TreeNode*)> encode = [&serializedToId, &subtreeCount, &duplicates,
                                           &encode](TreeNode* node) -> int {
            if (node == nullptr) {
                return 0;
            }
            string serialized = to_string(encode(node->left)) + "," + to_string(node->val) + "," +
                                to_string(encode(node->right));
            if (serializedToId.find(serialized) == serializedToId.end()) {
                serializedToId[serialized] = serializedToId.size() + 1;
            }
            int uid = serializedToId[serialized];
            subtreeCount[uid]++;
            if (subtreeCount[uid] == 2) {
                duplicates.push_back(node);
            }
            return uid;
        };
        encode(root);
        return duplicates;
    }
};
This C++ solution addresses the problem of finding all duplicate subtrees in a binary tree. Each subtree is serialized into a string form to allow for comparison. Here's a breakdown of how the code functions:
- The function 
getDuplicateSubtreesreceives the root node of a binary tree. - An unordered map, 
serializedToId, maps a serialized subtree to a unique identifier. - Another unordered map, 
subtreeCount, keeps track of how many times each unique serialized subtree appears. - A vector, 
duplicates, is used to store the root nodes of subtrees that are found to be duplicates. - The internal 
encodefunction defines the mechanism for recursive subtree serialization:- If the node is 
nullptr, it returns 0, indicating an empty subtree. - It recursively serializes the subtree using preorder traversal (node, left, right) and combines these into a string.
 - If the serialized string is not already a key in 
serializedToId, it adds it with a new unique identifier. - It then checks if this particular serialized form has been encountered before by using the 
subtreeCountmap. - If a serialized subtree is encountered twice, it is added to the 
duplicatesvector. - Each call to 
encodereturns a unique identifier corresponding to the serialized subtree of the node. 
 - If the node is 
 - The 
encodefunction is initially called with the root of the tree. - Finally, the 
duplicatesvector, containing all the duplicate subtrees' root nodes, is returned. 
By serializing subtrees and mapping these to unique identifiers, this solution efficiently identifies and collects roots of all duplicate subtrees in a binary tree. It leverages hashing (via unordered maps) to manage and compare subtree structures effectively.
class Solution {
    public List<TreeNode> getDuplicateSubtrees(TreeNode root) {
        List<TreeNode> duplicates = new LinkedList<>();
        searchDuplicates(root, new HashMap<>(), new HashMap<>(), duplicates);
        return duplicates;
    }
    
    public int searchDuplicates(TreeNode node, Map<String, Integer> subtreeMap,
            Map<Integer, Integer> countMap, List<TreeNode> duplicates) {
        if (node == null) {
            return 0;
        }
        String serial = searchDuplicates(node.left, subtreeMap, countMap, duplicates) + "," + node.val +
                "," + searchDuplicates(node.right, subtreeMap, countMap, duplicates);
        if (!subtreeMap.containsKey(serial)) {
            subtreeMap.put(serial, subtreeMap.size() + 1);
        }
        int subtreeId = subtreeMap.get(serial);
        countMap.put(subtreeId, countMap.getOrDefault(subtreeId, 0) + 1);
        if (countMap.get(subtreeId) == 2) {
            duplicates.add(node);
        }
        return subtreeId;
    }
}
In this Java solution, the task is to find all duplicate subtrees in a given binary tree. Each subtree is serialized into a string format and stored alongside its occurrence count using hash maps. This method not only identifies duplicates efficiently but also stores references to the original subtree roots where the duplicates start.
The implementation involves two primary methods:
getDuplicateSubtrees(TreeNode root): This method initializes necessary data structures and triggers the recursive search through the binary tree to locate and collect duplicates.
searchDuplicates(TreeNode node, Map
subtreeMap, Map : This recursive function:countMap, List<TreeNode> duplicates) - Serializes the current subtree rooted at 
node. - Tracks the serialization in 
subtreeMapto ensure each unique subtree gets a unique ID. - Records the frequency of each subtree ID in 
countMap. - Checks the count of each serialization; if a serialization appears exactly twice, it adds the root node of that subtree to the duplicates list. This ensures each duplicate subtree is only added once.
 - Finally, returns the unique subtree ID, facilitating serialization of parent trees.
 
- Serializes the current subtree rooted at 
 
The solution efficiently catalogs and checks for duplicates using serialization and mapping strategies, ensuring that duplicates are identified based on structural equality rather than object references. This approach provides a robust and scalable method to solve the problem of finding duplicate subtrees in binary trees.
class Solution:
    def findRepeatedSubtrees(self, root):
        def dfs(node):
            if not node:
                return "#"
            sub_tree = (dfs(node.left), node.val, dfs(node.right))
            if sub_tree not in tree_map:
                tree_map[sub_tree] = len(tree_map) + 1
            subtree_id = tree_map[sub_tree]
            counter[subtree_id] += 1
            if counter[subtree_id] == 2:
                result.append(node)
            return subtree_id
    
        tree_map = {}
        counter = collections.defaultdict(int)
        result = []
        dfs(root)
        return result
The provided Python script defines a method to find duplicate subtrees within a binary tree. Each subtree that appears more than once with the same structure and node values is considered duplicate. The solution employs a depth-first search (DFS) to examine each node's subtree and utilizes a tuple to represent the structure and values of these subtrees. Note the use of serialization techniques where the subtrees are converted into a serialized string format represented by tuples.
The following explains the critical components and steps of the algorithm:
- Define a helper function 
dfs()which is a recursive function to traverse the tree in a depth-first manner. - For each node encountered, it identifies a unique signature (a tuple) that represents the subtree structure rooted at that node. This tuple includes the node's value and the identifiers of the left and right subtrees.
 - Use a dictionary 
tree_mapto map each unique subtree structure to a unique identifier. - Employ another dictionary 
counterto count the occurrences of each subtree identifier. - If a subtree's count in 
counterreaches two — indicating the subtree has been seen once before — it adds the subtree's root node to the result listresult. - Finally, return the 
resultlist, which contains the root nodes of all duplicate subtrees. 
This approach ensures that each subtree is only recorded once when it first becomes a duplicate, optimizing the process and avoiding redundancy.