Find Duplicate Subtrees

Updated on 27 May, 2025
Find Duplicate Subtrees header image

Problem Statement

The task is to identify and return all duplicate subtrees from a given binary tree. A subtree is considered duplicate if there exists at least another subtree that has the exact same structure and node values as it. For each type of duplicate subtree found, only the root node of one instance should be returned. This allows us to pinpoint locations in the larger tree where repetitions occur, simplifying the identification of recurring structures and data patterns.

Examples

Example 1

Input:

root = [1,2,3,4,null,2,4,null,null,4]

Output:

[[2,4],[4]]

Example 2

Input:

root = [2,1,1]

Output:

[[1]]

Example 3

Input:

root = [2,2,2,3,null,3,null]

Output:

[[2,3],[3]]

Constraints

  • The number of the nodes in the tree will be in the range [1, 5000]
  • -200 <= Node.val <= 200

Approach and Intuition

  • The challenge is to efficiently identify structures within the tree that recur in other parts.

  • By traversing the tree and generating a unique fingerprint for the structure and values of each subtree, we can track which forms reappear.

  • The serializing of each subtree can be done using a preorder traversal, combining both the structure and the values into a string format. For example, a subtree with root node of value 2 and left child with value 1 can be represented as "2,1".

  • These serialized strings can serve as keys in a dictionary to count occurrences. When a serialized form of a subtree appears more than once, it means a duplicate exists.

  • Distinct from keeping every subtree data in a list, storing potential duplicates in the dictionary allows for constant time checks for duplicates which is efficient given the constraints:

    • The smallest tree contains just one node, and the largest up to 5000 nodes.
    • Node values range between -200 and 200, which are manageable for direct comparisons.

Example Explorations

  • In the first example, the input tree [1,2,3,4,null,2,4,null,null,4] results in duplicates of the subtrees [2,4] and [4]. The tree contains multiple instances of these structures, leading to their identification as duplicates.
  • In the second example, the tree [2,1,1] simply has two nodes with the value 1 as leaf nodes, thus the subtree [1] is identified as a duplicate.
  • The third example, [2,2,2,3,null,3,null], has three nodes with the value 2 where two of them share the same subtree structure [2,3], indicating the duplication alongside the separate leaves [3].

This approach efficiently maps tree structures to their occurrences, enabling quick detection of repetitive patterns and facilitating solutions even when working with large binary trees with thousands of nodes.

Solutions

  • C++
  • Java
  • Python
cpp
class Solution {
public:
    vector<TreeNode*> getDuplicateSubtrees(TreeNode* root) {
        unordered_map<string, int> serializedToId;
        unordered_map<int, int> subtreeCount;
        vector<TreeNode*> duplicates;
        function<int(TreeNode*)> encode = [&serializedToId, &subtreeCount, &duplicates,
                                           &encode](TreeNode* node) -> int {
            if (node == nullptr) {
                return 0;
            }
            string serialized = to_string(encode(node->left)) + "," + to_string(node->val) + "," +
                                to_string(encode(node->right));
            if (serializedToId.find(serialized) == serializedToId.end()) {
                serializedToId[serialized] = serializedToId.size() + 1;
            }
            int uid = serializedToId[serialized];
            subtreeCount[uid]++;
            if (subtreeCount[uid] == 2) {
                duplicates.push_back(node);
            }
            return uid;
        };
        encode(root);
        return duplicates;
    }
};

This C++ solution addresses the problem of finding all duplicate subtrees in a binary tree. Each subtree is serialized into a string form to allow for comparison. Here's a breakdown of how the code functions:

  1. The function getDuplicateSubtrees receives the root node of a binary tree.
  2. An unordered map, serializedToId, maps a serialized subtree to a unique identifier.
  3. Another unordered map, subtreeCount, keeps track of how many times each unique serialized subtree appears.
  4. A vector, duplicates, is used to store the root nodes of subtrees that are found to be duplicates.
  5. The internal encode function defines the mechanism for recursive subtree serialization:
    • If the node is nullptr, it returns 0, indicating an empty subtree.
    • It recursively serializes the subtree using preorder traversal (node, left, right) and combines these into a string.
    • If the serialized string is not already a key in serializedToId, it adds it with a new unique identifier.
    • It then checks if this particular serialized form has been encountered before by using the subtreeCount map.
    • If a serialized subtree is encountered twice, it is added to the duplicates vector.
    • Each call to encode returns a unique identifier corresponding to the serialized subtree of the node.
  6. The encode function is initially called with the root of the tree.
  7. Finally, the duplicates vector, containing all the duplicate subtrees' root nodes, is returned.

By serializing subtrees and mapping these to unique identifiers, this solution efficiently identifies and collects roots of all duplicate subtrees in a binary tree. It leverages hashing (via unordered maps) to manage and compare subtree structures effectively.

java
class Solution {
    public List<TreeNode> getDuplicateSubtrees(TreeNode root) {
        List<TreeNode> duplicates = new LinkedList<>();
        searchDuplicates(root, new HashMap<>(), new HashMap<>(), duplicates);
        return duplicates;
    }

    public int searchDuplicates(TreeNode node, Map<String, Integer> subtreeMap,
            Map<Integer, Integer> countMap, List<TreeNode> duplicates) {
        if (node == null) {
            return 0;
        }
        String serial = searchDuplicates(node.left, subtreeMap, countMap, duplicates) + "," + node.val +
                "," + searchDuplicates(node.right, subtreeMap, countMap, duplicates);
        if (!subtreeMap.containsKey(serial)) {
            subtreeMap.put(serial, subtreeMap.size() + 1);
        }
        int subtreeId = subtreeMap.get(serial);
        countMap.put(subtreeId, countMap.getOrDefault(subtreeId, 0) + 1);
        if (countMap.get(subtreeId) == 2) {
            duplicates.add(node);
        }
        return subtreeId;
    }
}

In this Java solution, the task is to find all duplicate subtrees in a given binary tree. Each subtree is serialized into a string format and stored alongside its occurrence count using hash maps. This method not only identifies duplicates efficiently but also stores references to the original subtree roots where the duplicates start.

The implementation involves two primary methods:

  1. getDuplicateSubtrees(TreeNode root): This method initializes necessary data structures and triggers the recursive search through the binary tree to locate and collect duplicates.

  2. searchDuplicates(TreeNode node, Map subtreeMap, Map countMap, List<TreeNode> duplicates): This recursive function:

    • Serializes the current subtree rooted at node.
    • Tracks the serialization in subtreeMap to ensure each unique subtree gets a unique ID.
    • Records the frequency of each subtree ID in countMap.
    • Checks the count of each serialization; if a serialization appears exactly twice, it adds the root node of that subtree to the duplicates list. This ensures each duplicate subtree is only added once.
    • Finally, returns the unique subtree ID, facilitating serialization of parent trees.

The solution efficiently catalogs and checks for duplicates using serialization and mapping strategies, ensuring that duplicates are identified based on structural equality rather than object references. This approach provides a robust and scalable method to solve the problem of finding duplicate subtrees in binary trees.

python
class Solution:
    def findRepeatedSubtrees(self, root):
        def dfs(node):
            if not node:
                return "#"
            sub_tree = (dfs(node.left), node.val, dfs(node.right))
            if sub_tree not in tree_map:
                tree_map[sub_tree] = len(tree_map) + 1
            subtree_id = tree_map[sub_tree]
            counter[subtree_id] += 1
            if counter[subtree_id] == 2:
                result.append(node)
            return subtree_id

        tree_map = {}
        counter = collections.defaultdict(int)
        result = []
        dfs(root)
        return result

The provided Python script defines a method to find duplicate subtrees within a binary tree. Each subtree that appears more than once with the same structure and node values is considered duplicate. The solution employs a depth-first search (DFS) to examine each node's subtree and utilizes a tuple to represent the structure and values of these subtrees. Note the use of serialization techniques where the subtrees are converted into a serialized string format represented by tuples.

The following explains the critical components and steps of the algorithm:

  • Define a helper function dfs() which is a recursive function to traverse the tree in a depth-first manner.
  • For each node encountered, it identifies a unique signature (a tuple) that represents the subtree structure rooted at that node. This tuple includes the node's value and the identifiers of the left and right subtrees.
  • Use a dictionary tree_map to map each unique subtree structure to a unique identifier.
  • Employ another dictionary counter to count the occurrences of each subtree identifier.
  • If a subtree's count in counter reaches two — indicating the subtree has been seen once before — it adds the subtree's root node to the result list result.
  • Finally, return the result list, which contains the root nodes of all duplicate subtrees.

This approach ensures that each subtree is only recorded once when it first becomes a duplicate, optimizing the process and avoiding redundancy.

Comments

No comments yet.