Accounts Merge

Updated on 14 May, 2025
Accounts Merge header image

Problem Statement

The task is to merge multiple user accounts into unique ones based on common email addresses found among them. We are provided with a list named accounts, where each sublist represents an individual's account that includes the person's name followed by their email addresses. It is important to note that the existence of one or more shared email addresses across these sublists is the indicator that they belong to the same individual.

A critical detail in the merging process is that, even if two accounts share a name, they will be considered separate unless a common email address is found. This is accounted for since people may share names but have different accounts. Once accounts are identified for merging, the consolidated account should have emails sorted in lexicographical order, and each account should continue to start with the user's name. The final output is not required to maintain any specific order for the accounts.

Examples

Example 1

Input:

accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]

Output:

[["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]

Explanation:

The first and second John's are the same person as they have the common email "johnsmith@mail.com".
The third John and Mary are different people as none of their email addresses are used by other accounts.
We could return these lists in any order, for example the answer [['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com'],
['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com']] would still be accepted.

Example 2

Input:

accounts = [["Gabe","Gabe0@m.co","Gabe3@m.co","Gabe1@m.co"],["Kevin","Kevin3@m.co","Kevin5@m.co","Kevin0@m.co"],["Ethan","Ethan5@m.co","Ethan4@m.co","Ethan0@m.co"],["Hanzo","Hanzo3@m.co","Hanzo1@m.co","Hanzo0@m.co"],["Fern","Fern5@m.co","Fern1@m.co","Fern0@m.co"]]

Output:

[["Ethan","Ethan0@m.co","Ethan4@m.co","Ethan5@m.co"],["Gabe","Gabe0@m.co","Gabe1@m.co","Gabe3@m.co"],["Hanzo","Hanzo0@m.co","Hanzo1@m.co","Hanzo3@m.co"],["Kevin","Kevin0@m.co","Kevin3@m.co","Kevin5@m.co"],["Fern","Fern0@m.co","Fern1@m.co","Fern5@m.co"]]

Constraints

  • 1 <= accounts.length <= 1000
  • 2 <= accounts[i].length <= 10
  • 1 <= accounts[i][j].length <= 30
  • accounts[i][0] consists of English letters.
  • accounts[i][j] (for j > 0) is a valid email.

Approach and Intuition

Given the problem requires the merging of accounts based on commonalities in email addresses among potentially fragmented account information, it is analogous to identifying connected components in a graph:

  1. Modeling the Accounts as a Graph:

    • Think of each unique email as a node.
    • Each user can be considered a connection (or an edge) that ties these email nodes together, based on which emails are listed under their name. If two accounts have overlapping emails, they share edges, forming connected components in a graph.
  2. Utilize Union-Find Data Structure:

    • This is apt for dynamically merging sets and finding connected components.
    • At each step, union operations can help merge accounts that share a common email (node).
  3. Mapping and Compaction:

    • From the graph perspective, harvest all nodes (emails) connected to each other and map them back to users.
    • For every component (unique user discovered through the connected component analysis), compile and sort the emails.
  4. Reconstruction of the Final Account List:

    • Post mapping and sorting, reconstruct the accounts in the required format starting with the user's name followed by their sorted list of emails.
    • The sequence of accounts in the output doesn’t matter, providing flexibility in how the list is constructed.

This approach effectively aggregates scattered account information into organized, consolidated account records using graph theory concepts combined with efficient data structures for set management.

Solutions

  • C++
  • Java
cpp
class DisjointSetUnion {
public:
    vector<int> leader;
    vector<int> rank;
    
    DisjointSetUnion(int size) : leader(size), rank(size) {
        for (int i = 0; i < size; ++i) {
            leader[i] = i;
            rank[i] = 1;
        }
    }
    
    int find(int x) {
        if (x == leader[x]) {
            return x;
        }
        return leader[x] = find(leader[x]);
    }
    
    void unionByRank(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        
        if (rootX == rootY) return;
        
        if (rank[rootX] >= rank[rootY]) {
            rank[rootX] += rank[rootY];
            leader[rootY] = rootX;
        } else {
            rank[rootY] += rank[rootX];
            leader[rootX] = rootY;
        }
    }
};

class Solution {
public:
    vector<vector<string>> accountsMerge(vector<vector<string>>& accounts) {
        int n = accounts.size();
        DisjointSetUnion dsu(n);
        
        unordered_map<string, int> emailToIndex;
        
        for (int i = 0; i < n; i++) {
            int accountLen = accounts[i].size();

            for (int j = 1; j < accountLen; j++) {
                string email = accounts[i][j];
                
                if (emailToIndex.find(email) == emailToIndex.end()) {
                    emailToIndex[email] = i;
                } else {
                    dsu.unionByRank(i, emailToIndex[email]);
                }
            }
        }
        
        unordered_map<int, vector<string>> groupedEmails;
        for (const auto& pair : emailToIndex) {
            int root = dsu.find(pair.second);
            groupedEmails[root].push_back(pair.first);
        }

        vector<vector<string>> mergedAccounts;
        for (auto& emails : groupedEmails) {
            int root = emails.first;
            vector<string> mergedAccount = {accounts[root][0]};
            mergedAccount.insert(end(mergedAccount), begin(emails.second), end(emails.second));
            sort(next(begin(mergedAccount)), end(mergedAccount));
            mergedAccounts.push_back(mergedAccount);
        }
        
        return mergedAccounts;
    }
};

The given C++ solution tackles the problem of merging accounts based on shared email addresses using a Disjoint Set Union (DSU) data structure. Let's break down the implementation:

  1. Introduce a DisjointSetUnion class designed to manage the connectivity of account indices. This includes methods for performing find operations to identify the root of a set, and union operations using rank to maximize efficiency.

  2. Within the DSU class:

    • Initialize each account as its own leader with a respective rank in the constructor.
    • Implement path compression in the find method to optimize future operations.
    • Implement union by rank within the unionByRank method to keep the tree's height minimal.
  3. Introduce a main class, Solution, which processes merging tasks utilizing the DSU:

    • Map each email to an index corresponding to its first appearance using an unordered map.
    • For every email in each account, either assign the index if it's the first occurrence, or merge the current index with the stored index using unionByRank.
  4. Post union-find operations:

    • Organize emails into groups based on their root index using another unordered map.
    • For each group, retrieve the account name using the root index, append all associated emails, sort them, and compile them into the final result.
  5. Sort the emails in each merged account to maintain an orderly output.

This design leverages the DSU structure efficiently to handle merging operations, ensuring that linked accounts are accurately and systematically merged based on shared emails. The solution ensures scalability and efficiency suitable for handling a large number of accounts and email connections.

java
class UnionFind {
    int[] leader;
    int[] rank;

    UnionFind(int size) {
        leader = new int[size];
        rank = new int[size];

        for (int i = 0; i < size; ++i) {
            leader[i] = i;
            rank[i] = 1;
        }
    }

    public int find(int element) {
        if (element == leader[element]) {
            return element;
        }

        return leader[element] = find(leader[element]);
    }

    public void union(int element1, int element2) {
        int root1 = find(element1);
        int root2 = find(element2);

        if (root1 != root2) {
            if (rank[root1] >= rank[root2]) {
                rank[root1] += rank[root2];
                leader[root2] = root1;
            } else {
                rank[root2] += rank[root1];
                leader[root1] = root2;
            }
        }
    }
}

class Solution {
    public List<List<String>> accountsMerge(List<List<String>> accounts) {
        int numAccounts = accounts.size();
        UnionFind uf = new UnionFind(numAccounts);

        Map<String, Integer> emailOwner = new HashMap<>();

        for (int i = 0; i < numAccounts; i++) {
            int numEmails = accounts.get(i).size();

            for (int j = 1; j < numEmails; j++) {
                String email = accounts.get(i).get(j);
                String name = accounts.get(i).get(0);

                if (!emailOwner.containsKey(email)) {
                    emailOwner.put(email, i);
                } else {
                    uf.union(i, emailOwner.get(email));
                }
            }
        }

        Map<Integer, TreeSet<String>> components = new HashMap<>();
        for (Map.Entry<String, Integer> entry : emailOwner.entrySet()) {
            String email = entry.getKey();
            int index = entry.getValue();
            int root = uf.find(index);

            if (!components.containsKey(root)) {
                components.put(root, new TreeSet<>());
            }
            components.get(root).add(email);
        }

        List<List<String>> mergedAccounts = new ArrayList<>();
        for (Map.Entry<Integer, TreeSet<String>> entry : components.entrySet()) {
            List<String> emails = new ArrayList<>(entry.getValue());
            emails.add(0, accounts.get(entry.getKey()).get(0));
            mergedAccounts.add(emails);
        }

        return mergedAccounts;
    }
}

The Java implementation for merging accounts tackles the issue by identifying accounts that belong to the same user through shared email addresses. These steps summarize how this operation is strategically handled:

  1. Define a UnionFind class to efficiently manage and union account indices, determining account connectivity through emails. This class manages each element's leader and its rank for merging efficiency.

  2. Implement methods within UnionFind:

    • find: Determine the representative leader of the current element, ensuring that parent pointers are directed correctly using path compression.
    • union: Join two elements under one leader based on their ranks to maintain tree balance.
  3. Within the Solution class, use the accountsMerge method to process account information:

    • Initialize a UnionFind instance with a size representing the total number of accounts.
    • Create a mapping (emailOwner) to keep track of the first account associated with each unique email.
    • Iterate over each account and email, using the union method to link accounts whenever a shared email is found.
    • Aggregate emails into components using another map (components) where the key is the root leader's index from UnionFind, ensuring all associated emails are grouped together.
    • Sort and collect emails for each root leader to construct the final list of merged accounts, including the account name at the first index.

Throughout the process, the UnionFind structure supports efficiently querying and updating account relations, while integration with TreeSet and HashMap ensures that all emails are collected, sorted, and assigned to their respective root account efficiently. This provides a merged view of user accounts based on common email addresses.

Comments

No comments yet.