Unique Email Addresses

Updated on 08 July, 2025

Problem Statement

In the world of email communication, addresses may appear different due to formatting in the local part but still refer to the same recipient. Each email consists of a local name and a domain name, separated by the '@' symbol. The local part can include periods ('.') and a plus symbol ('+'), both of which have specific interpretation rules:

  • Periods ('.') in the local name are ignored. For example, john.doe@example.com is equivalent to johndoe@example.com.

  • The first plus symbol ('+') in the local name, and every character after it, is ignored. For example, jane+filter@example.com is treated as jane@example.com.

Neither rule applies to the domain name, which is kept exactly as written.

These rules let several different-looking addresses reach the same inbox. Given an array emails, the task is to count how many distinct addresses actually receive mail after the rules are applied.
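A minimal Python sketch of the two rules, assuming each address contains exactly one '@' (as the constraints guarantee). The function name `normalize` is hypothetical. Both rules touch only the local name; the domain is copied unchanged, so any dots in it are kept.

```python
def normalize(email: str) -> str:
    """Apply the two local-name rules (hypothetical helper for illustration)."""
    local, _, domain = email.partition("@")
    # Rule 2: drop the first '+' and everything after it.
    local = local.split("+", 1)[0]
    # Rule 1: periods in the local name carry no meaning.
    local = local.replace(".", "")
    # The domain is left exactly as written.
    return local + "@" + domain


print(normalize("john.doe@example.com"))     # johndoe@example.com
print(normalize("jane+filter@example.com"))  # jane@example.com
print(normalize("a.b@mail.example.com"))     # ab@mail.example.com
```

The order of the two rules does not matter here: removing periods never moves or removes a '+', so cutting first and removing dots second gives the same result.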


Examples

Example 1

Input:

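emails = ["test.email+alex@example.com", "test.e.mail+bob.cathy@example.com", "testemail+david@sample.com"]

Output:

2

Explanation:

The first two emails both resolve to testemail@example.com, and the third resolves to testemail@sample.com. The three inputs therefore reach only two distinct addresses:
1. testemail@example.com
2. testemail@sample.com

Thus, two unique addresses receive emails.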
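To make the collapse in Example 1 concrete, the Python sketch below traces the first two inputs through each rule. The helper name `trace` is hypothetical, and it assumes exactly one '@' per address.

```python
from typing import List


def trace(email: str) -> List[str]:
    """Return the address after each normalization stage (hypothetical helper)."""
    local, _, domain = email.partition("@")
    cut = local.split("+", 1)[0]       # stage 1: cut at the first '+'
    undotted = cut.replace(".", "")    # stage 2: remove periods
    return [email, cut + "@" + domain, undotted + "@" + domain]


for address in ["test.email+alex@example.com", "test.e.mail+bob.cathy@example.com"]:
    print(" -> ".join(trace(address)))
# test.email+alex@example.com -> test.email@example.com -> testemail@example.com
# test.e.mail+bob.cathy@example.com -> test.e.mail@example.com -> testemail@example.com
```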

Example 2

Input:

emails = ["a@example.com", "b@example.com", "c@example.com"]

Output:

3

Explanation:

All three emails are already unique. No normalization affects them.

Constraints

  • 1 <= emails.length <= 100

  • 1 <= emails[i].length <= 100

  • Each emails[i] consists only of lowercase English letters and the characters '+', '.', and '@'

  • Exactly one '@' character per email

  • Local and domain parts are non-empty

  • Local names do not start with '+'

  • Domain names end with ".com" and have at least one character before the suffix
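When building test inputs, a small checker for the listed constraints can catch malformed cases early. `satisfies_constraints` is a hypothetical helper, not part of the problem. The constraint that matters most for the solutions is exactly one '@', which makes splitting an address into local and domain parts unambiguous.

```python
import string

ALLOWED = set(string.ascii_lowercase + "+.@")


def satisfies_constraints(emails: list) -> bool:
    """Check the listed constraints (hypothetical helper for illustration)."""
    if not 1 <= len(emails) <= 100:
        return False
    for email in emails:
        if not 1 <= len(email) <= 100:
            return False
        if not set(email) <= ALLOWED:      # only a-z, '+', '.', '@'
            return False
        if email.count("@") != 1:
            return False
        local, _, domain = email.partition("@")
        if not local or not domain or local.startswith("+"):
            return False
        # ".com" suffix with at least one character before it.
        if not domain.endswith(".com") or len(domain) <= len(".com"):
            return False
    return True


print(satisfies_constraints(["a@example.com", "b@example.com"]))  # True
print(satisfies_constraints(["+a@example.com"]))                  # False
```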


Approach and Intuition

The solution involves normalizing each email and tracking the number of unique addresses:

  1. Initialize a Set

    • Use a set to collect normalized email addresses. A set stores each value at most once, so duplicates are discarded automatically.

  2. Normalize Each Email

    • Split the email at '@' into local and domain parts.

    • In the local part:

      • Remove all periods ('.')
      • Discard the first '+' and everything after it
    • Reconstruct the email as <processed_local>@<domain>

  3. Insert into Set

    • Add each normalized email to the set.
  4. Return the Count

    • The number of unique email addresses is simply the size of the set.

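Normalizing an address visits each of its characters a constant number of times, and inserting a string into a hash set costs time proportional to its length on average. The total running time is therefore linear in the total number of characters across all emails (at most 100 × 100 = 10,000 under the constraints), and the set needs at most that much extra space.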
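The four steps above can be sketched in Python as follows. The function name `num_unique_emails` is hypothetical, and `email.split("@")` relies on the constraint that each address contains exactly one '@'.

```python
from typing import List


def num_unique_emails(emails: List[str]) -> int:
    """Count distinct normalized addresses (hypothetical function name)."""
    seen = set()                                          # step 1: the set
    for email in emails:
        local, domain = email.split("@")                  # step 2: exactly one '@'
        local = local.split("+", 1)[0].replace(".", "")   # cut at '+', drop '.'
        seen.add(local + "@" + domain)                    # step 3: insert
    return len(seen)                                      # step 4: count


print(num_unique_emails(["a@example.com", "b@example.com", "c@example.com"]))  # 3
```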

Solutions

  • C++
cpp
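#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    int getUniqueEmailCount(vector<string>& emails) {
        unordered_set<string> uniqueEmailSet;

        for (const string& email : emails) {
            string processedEmail;

            // Local part: skip '.', stop at the first '+' or at '@'.
            for (char c : email) {
                if (c == '+' || c == '@') break;
                if (c == '.') continue;
                processedEmail += c;
            }

            // Domain (including '@'): collect characters backwards from the
            // end until '@' is reached, then reverse them into order.
            string domain;
            for (int idx = static_cast<int>(email.length()) - 1; idx >= 0; --idx) {
                char character = email[idx];
                domain += character;
                if (character == '@') break;
            }

            reverse(domain.begin(), domain.end());
            processedEmail += domain;

            uniqueEmailSet.insert(processedEmail);
        }

        return static_cast<int>(uniqueEmailSet.size());
    }
};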

This solution counts unique email addresses by normalizing each address and inserting the result into a set, which keeps only distinct values. The key steps are:

  1. Initialize an unordered set to store unique standardized email addresses.
  2. Loop through each email:
    • Initialize an empty string to build the processed (standardized) part of the email address.
    • Traverse each character of the email address:
      • Stop at the first '+' or '@': a '+' marks the start of the ignored suffix, and '@' marks the end of the local part.
      • Skip the character if it is '.', since periods in the local part are ignored.
      • Otherwise, append the character to the processed email string.
    • After processing the local part, extract the domain by iterating backwards from the end of the email until '@' is reached. The collected characters (including '@') are in reverse order, so they are reversed and then appended to the processed local part.
    • Insert the resulting processed email into the set.
  3. After processing all emails, the size of the set gives the number of unique email addresses.

Each character is visited at most twice (once by the forward scan and once by the backward scan), so the running time is linear in the total number of characters across all emails, and the unordered_set takes care of deduplication.
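The backward scan followed by a reverse is one way to extract the domain. Because every address has exactly one '@', slicing from the index of '@' yields the same string with a single lookup (in C++, `email.substr(email.find('@'))`). A Python comparison of the two, with hypothetical helper names:

```python
def domain_by_backward_scan(email: str) -> str:
    """Mirror of the C++ loop: collect from the end until '@', then reverse."""
    collected = []
    for ch in reversed(email):
        collected.append(ch)
        if ch == "@":
            break
    return "".join(reversed(collected))


def domain_by_slice(email: str) -> str:
    """Slice from the single '@' to the end of the string."""
    return email[email.index("@"):]


for address in ["test.email+alex@example.com", "x@sub.domain.com"]:
    assert domain_by_backward_scan(address) == domain_by_slice(address)
print(domain_by_slice("test.email+alex@example.com"))  # @example.com
```

Both return the domain with its leading '@', which is why the C++ solution can append the result directly to the processed local part.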

  • Java
java
import java.util.HashSet;
import java.util.Set;

class Solution {
    public int countUniqueEmails(String[] emails) {
        Set<String> emailsSet = new HashSet<>();

        for (String email : emails) {
            StringBuilder refinedEmail = new StringBuilder();

            // Local part: drop '.', stop at the first '+' or at '@'.
            for (int i = 0; i < email.length(); ++i) {
                char ch = email.charAt(i);
                if (ch == '+' || ch == '@') break;
                if (ch != '.') refinedEmail.append(ch);
            }

            // Domain (including '@'): scan backwards until '@', then reverse.
            StringBuilder domain = new StringBuilder();
            for (int i = email.length() - 1; i >= 0; --i) {
                char ch = email.charAt(i);
                domain.append(ch);
                if (ch == '@') break;
            }

            domain.reverse();
            refinedEmail.append(domain);
            emailsSet.add(refinedEmail.toString());
        }

        return emailsSet.size();
    }
}

The task is to compute the number of unique email addresses in a given array by considering email simplification rules, where:

  • Periods ('.') in the local part are ignored.
  • The first '+' in the local part and everything after it are ignored.

In the provided Java solution:

  • A HashSet stores the normalized addresses, so duplicates are discarded.
  • For each email, a forward scan appends characters to refinedEmail, skipping '.' and stopping at the first '+' or '@'.
  • A second StringBuilder, domain, collects characters from the end of the email backwards until '@' is reached. Because these characters arrive in reverse order, domain.reverse() restores them before they are appended to refinedEmail.
  • The completed string is added to the HashSet.
  • The size of the HashSet is returned as the count of unique email addresses.

Each character is visited at most twice (once by the forward scan and once by the backward scan), so the running time is linear in the total number of characters across all emails, plus the expected cost of hashing each normalized string into the HashSet.
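The Java version walks each address twice: forward for the local part and backward for the domain. A single forward pass with a small state variable is an alternative; here is a Python sketch under the same one-'@' assumption, with the hypothetical name `normalize_one_pass`. Local-name characters are kept unless they are '.', characters after a '+' are skipped, and everything from '@' onward is copied verbatim.

```python
def normalize_one_pass(email: str) -> str:
    """Normalize in one left-to-right scan (hypothetical helper)."""
    out = []
    state = "local"          # "local", "ignored" (after '+'), or "domain"
    for ch in email:
        if state == "domain":
            out.append(ch)                    # domain copied verbatim
        elif ch == "@":
            out.append(ch)
            state = "domain"
        elif state == "ignored":
            continue                          # characters between '+' and '@'
        elif ch == "+":
            state = "ignored"
        elif ch != ".":
            out.append(ch)                    # kept local-name character
    return "".join(out)


emails = ["test.email+alex@example.com", "test.e.mail+bob.cathy@example.com"]
print(len({normalize_one_pass(e) for e in emails}))  # 1
```

The '@' check comes before the "ignored" check so that a '+' suffix cannot swallow the domain.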

  • JavaScript
js
const countDistinctEmails = function (emails) {
    const distinctEmailSet = new Set();

    emails.forEach(email => {
        const localName = [];
        const domain = [];
        let i;

        // Process the local part: skip '.', stop at the first '+' or at '@'
        for (i = 0; i < email.length; i++) {
            const char = email[i];
            if (char === '+' || char === '@') break;
            if (char !== '.') localName.push(char);
        }

        // Copy the domain part, including the '@', unchanged
        for (i = email.lastIndexOf('@'); i < email.length; i++) {
            domain.push(email[i]);
        }

        // Combine the normalized local part with the domain
        const processedEmail = localName.join('') + domain.join('');
        distinctEmailSet.add(processedEmail);
    });

    return distinctEmailSet.size;
};

The given JavaScript function countDistinctEmails is designed to count unique email addresses from an array of emails. Each email is normalized by simplifying the local part of the email (the part before the '@' symbol) and combining it with the domain part (the part after '@').

Transforming each email involves:

  • Creating a processed version of the local part by ignoring all periods ('.') and stopping at the first plus sign ('+') or at the '@' symbol.
  • Extracting the domain directly from the '@' to the end of the string.
  • The processed local part is then concatenated with the domain part to form the full email address.
  • To ensure uniqueness, processed emails are added to a Set. Since a Set automatically excludes duplicates, this helps in maintaining only unique email addresses.

At the end of the function:

  • The size of the Set is returned; it equals the number of distinct email addresses.

This handles inputs where different strings must be treated as the same address because some characters are ignored or truncated according to the two rules in the problem statement, so each recipient is counted once.
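If you also want to see which raw inputs collapse into the same inbox, a dictionary keyed by the normalized address can stand in for the Set; the number of keys is the same count. A Python sketch with the hypothetical name `group_by_inbox`, assuming one '@' per address:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_inbox(emails: List[str]) -> Dict[str, List[str]]:
    """Map each normalized address to the raw inputs that produce it (hypothetical)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for email in emails:
        local, _, domain = email.partition("@")
        key = local.split("+", 1)[0].replace(".", "") + "@" + domain
        groups[key].append(email)
    return dict(groups)


groups = group_by_inbox(["a.b@x.com", "ab+news@x.com", "ab@y.com"])
for inbox, sources in groups.items():
    print(inbox, sources)
# ab@x.com ['a.b@x.com', 'ab+news@x.com']
# ab@y.com ['ab@y.com']
print(len(groups))  # 2
```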

  • Python
python
from typing import List


class Solution:
    def numDistinctEmails(self, emailList: List[str]) -> int:
        distinctEmails = set()

        for email in emailList:
            # Local part: skip '.', stop at the first '+' or at '@'.
            localName = []
            for char in email:
                if char == '+' or char == '@':
                    break
                if char != '.':
                    localName.append(char)

            # Domain (including '@'): collect from the end until '@',
            # then reverse the collected characters back into order.
            domain = []
            for char in reversed(email):
                domain.append(char)
                if char == '@':
                    break

            domain = ''.join(domain[::-1])
            localName = ''.join(localName)
            distinctEmails.add(localName + domain)

        return len(distinctEmails)

This Python solution for identifying the number of unique email addresses in a given list processes emails by isolating the local and domain parts. The algorithm works as follows:

  1. Initialize a set distinctEmails to ensure only unique emails are counted.
  2. Loop over each email in the emailList provided:
    • Create an empty list localName to build the processed local part of the email. Iterate through characters in the email:
      • Stop adding characters to localName if a '+' or '@' is encountered since '+' signifies the start of characters to ignore, and '@' symbolizes that the domain part has started.
      • Ignore any '.' characters as they are not considered part of the unique address.
    • Construct the domain portion:
      • Start from the end of the email and collect characters until the '@' is reached. Because they are collected in reverse order, domain[::-1] restores the original order; the result includes the '@'.
    • After processing both parts of the email, combine them and add to the distinctEmails set.
  3. Finally, return the count of unique emails by calculating the length of the distinctEmails set.

This method normalizes each email according to the two rules in the problem statement, so addresses that differ only in ignored characters are counted once.
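One way to gain confidence that a character-loop version and a split-based version agree is a randomized cross-check over inputs generated under the constraints. A Python sketch; the function names, generator, and seed are all hypothetical, and `loop_version` mirrors the local-part loop above but takes the domain with a slice.

```python
import random


def loop_version(emails):
    """Character loop for the local part; domain sliced from the single '@'."""
    seen = set()
    for email in emails:
        local = []
        for ch in email:
            if ch in "+@":
                break
            if ch != ".":
                local.append(ch)
        seen.add("".join(local) + email[email.index("@"):])
    return len(seen)


def split_version(emails):
    """split/replace based normalization."""
    seen = set()
    for email in emails:
        local, domain = email.split("@")
        seen.add(local.split("+", 1)[0].replace(".", "") + "@" + domain)
    return len(seen)


def random_email(rng):
    """Build an address meeting the constraints (local never starts with '+')."""
    local = rng.choice("ab") + "".join(rng.choice("ab.+") for _ in range(rng.randint(0, 6)))
    domain = "".join(rng.choice("ab.") for _ in range(rng.randint(1, 3))) + ".com"
    return local + "@" + domain


rng = random.Random(0)
for _ in range(1000):
    batch = [random_email(rng) for _ in range(rng.randint(1, 8))]
    assert loop_version(batch) == split_version(batch), batch
print("all checks passed")
```

The small alphabet ("a", "b", ".", "+") is a deliberate choice: it makes collisions between different raw addresses frequent, which is exactly the case worth testing.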
