
Problem Statement
In email communication, addresses can look different because of formatting in the local name yet still refer to the same recipient. Each email consists of a local name and a domain name, separated by the '@' symbol. The local name can include periods ('.') and a plus symbol ('+'), both of which have specific interpretation rules:
- Periods ('.') in the local name are ignored. For example, john.doe@example.com is equivalent to johndoe@example.com.
- A plus symbol ('+') in the local name, and any characters following it, are ignored. For example, jane+filter@example.com is treated as jane@example.com.
These rules apply only to the local name; the domain name is left unchanged. They allow addresses that look different to route to the same inbox. The task is to compute how many unique email addresses actually receive emails after applying the transformation rules.
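As a quick illustration of the two rules, here is a minimal Python sketch that applies them to a local name only. The helper name `normalize_local` is hypothetical and not part of the problem; it assumes the input is just the part before '@'.

```python
def normalize_local(local: str) -> str:
    """Apply the two local-name rules (hypothetical helper for illustration)."""
    # Plus rule: drop the first '+' and everything after it.
    local = local.split('+', 1)[0]
    # Period rule: periods in the local name are ignored.
    return local.replace('.', '')


print(normalize_local("john.doe"))     # johndoe
print(normalize_local("jane+filter"))  # jane
```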
Examples
Example 1
Input:
emails = ["test.email+alex@example.com", "test.e.mail+bob.cathy@example.com", "testemail+david@sample.org"]
Output:
2
Explanation:
The first two emails normalize to testemail@example.com and the third normalizes to testemail@sample.org. Thus, two unique addresses receive emails.
Example 2
Input:
emails = ["a@example.com", "b@example.com", "c@example.com"]
Output:
3
Explanation:
All three emails are already unique. No normalization affects them.
Constraints
1 <= emails.length <= 100
1 <= emails[i].length <= 100
Each emails[i] contains only:
- Lowercase English letters
- The characters '+', '.', and '@'
Exactly one '@' character per email
Local and domain parts are non-empty
Local names do not start with '+'
Domain names end with ".com" and have at least one character before the suffix
Approach and Intuition
The solution involves normalizing each email and tracking the number of unique addresses:
Initialize a Set
- Use a set to collect normalized email addresses. Sets ensure uniqueness automatically.
Normalize Each Email
- Split the email at '@' into local and domain parts.
- In the local part:
  - Remove all periods ('.')
  - Discard the first '+' and any characters after it
- Reconstruct the email as <processed_local>@<domain>
Insert into Set
- Add each normalized email to the set.
Return the Count
- The number of unique email addresses is simply the size of the set.
This approach is efficient: each email is scanned a constant number of times, and set insertion and lookup take constant time on average, so the total work is linear in the number of characters across all emails.
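The steps above can be sketched compactly in Python. This is one possible rendering rather than the reference solution: the function name `count_unique_emails` is an assumption, and the sketch relies on the constraint that each email contains exactly one '@'.

```python
from typing import List


def count_unique_emails(emails: List[str]) -> int:
    """Count distinct addresses after normalization (illustrative sketch)."""
    seen = set()
    for email in emails:
        # Exactly one '@' is assumed, so partition yields local and domain.
        local, _, domain = email.partition('@')
        # Drop the '+' suffix first, then remove periods from what remains.
        local = local.split('+', 1)[0].replace('.', '')
        seen.add(local + '@' + domain)
    return len(seen)


example_1 = [
    "test.email+alex@example.com",
    "test.e.mail+bob.cathy@example.com",
    "testemail+david@sample.org",
]
print(count_unique_emails(example_1))  # 2
print(count_unique_emails(["a@example.com", "b@example.com", "c@example.com"]))  # 3
```

Using the built-in string methods keeps the normalization to two lines, at the cost of creating a few short-lived intermediate strings per email.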
Solutions
- C++
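#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    int getUniqueEmailCount(vector<string>& emails) {
        unordered_set<string> uniqueEmailSet;
        for (const string& email : emails) {
            // Build the local name: stop at '+' or '@', skip periods.
            string processedEmail;
            for (char c : email) {
                if (c == '+' || c == '@') break;
                if (c == '.') continue;
                processedEmail += c;
            }
            // Collect the domain (including '@') by scanning from the end.
            string domain;
            for (int idx = static_cast<int>(email.length()) - 1; idx >= 0; --idx) {
                char character = email[idx];
                domain += character;
                if (character == '@') break;
            }
            // The domain was collected backwards, so restore its order.
            reverse(domain.begin(), domain.end());
            processedEmail += domain;
            uniqueEmailSet.insert(processedEmail);
        }
        return uniqueEmailSet.size();
    }
};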
This solution involves determining the count of unique email addresses from a list. Each email address is processed to standardize it before inserting it into a set, which inherently manages uniqueness. The key steps to process each email address are:
- Initialize an unordered set to store unique standardized email addresses.
- Loop through each email:
  - Initialize an empty string to build the processed (standardized) part of the email address.
  - Traverse each character of the email address:
    - Break the loop if the character is '+' or '@', as these denote the end of the local part of the email.
    - Ignore the character if it is '.', since dots are disregarded in the local part.
    - Otherwise, append the character to the processed email string.
  - After processing the local part, extract the domain by iterating from the end of the email until '@' is encountered. This substring is then reversed to restore the correct order and concatenated to the processed local part.
  - Insert the resulting processed email into the set.
- After processing all emails, the size of the set gives the number of unique email addresses.
This approach relies on basic string operations and on unordered_set to handle uniqueness. Every email is cleaned up according to the stated rules and then stored once, so the work grows linearly with the total length of the input.
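The reverse scan used for the domain is equivalent to slicing from the last '@'. The Python sketch below compares the two; both function names are hypothetical and exist only for this illustration.

```python
def domain_by_reverse_scan(email: str) -> str:
    """Collect characters from the end until '@', then restore their order."""
    collected = []
    for ch in reversed(email):
        collected.append(ch)
        if ch == '@':
            break
    return ''.join(reversed(collected))


def domain_by_slice(email: str) -> str:
    """Slice from the last '@' to the end (assumes '@' is present)."""
    return email[email.rfind('@'):]


email = "test.email+alex@example.com"
print(domain_by_reverse_scan(email))  # @example.com
print(domain_by_slice(email))         # @example.com
```

Slicing avoids the explicit reversal, though both variants are linear in the length of the email.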
- Java
import java.util.HashSet;
import java.util.Set;

class Solution {
    public int countUniqueEmails(String[] emails) {
        Set<String> emailsSet = new HashSet<>();
        for (String email : emails) {
            // Build the local name: stop at '+' or '@', skip periods.
            StringBuilder refinedEmail = new StringBuilder();
            for (int i = 0; i < email.length(); ++i) {
                char ch = email.charAt(i);
                if (ch == '+' || ch == '@') break;
                if (ch != '.') refinedEmail.append(ch);
            }
            // Collect the domain (including '@') by scanning from the end.
            StringBuilder domain = new StringBuilder();
            for (int i = email.length() - 1; i >= 0; --i) {
                char ch = email.charAt(i);
                domain.append(ch);
                if (ch == '@') break;
            }
            // The domain was collected backwards, so restore its order.
            domain.reverse();
            refinedEmail.append(domain);
            emailsSet.add(refinedEmail.toString());
        }
        return emailsSet.size();
    }
}
The task is to compute the number of unique email addresses in a given array by considering email simplification rules, where:
- '.' in the local part should not be considered.
- Everything after '+' in the local part should be ignored.
In the provided Java solution:
- A HashSet is used to store unique email addresses.
- Two StringBuilder objects handle the local and domain parts of each email.
- The first loop iterates over the characters of the email from the start:
  - It skips '.' characters.
  - It stops adding characters to refinedEmail upon encountering '+' or '@', and the program moves on to the domain.
- The second loop collects the domain by scanning from the end of the email back to '@', then reverses the collected characters to restore the correct order.
- After processing, the constructed email is added to the HashSet to ensure uniqueness.
- The size of the HashSet is returned, representing the count of unique email addresses.
The approach is efficient due to the HashSet usage and the direct string manipulation, completing in linear time relative to the number of characters across all emails.
- JavaScript
let countDistinctEmails = function (emails) {
    let distinctEmailSet = new Set();
    emails.forEach(email => {
        let localName = [];
        let domain = [];
        let i;
        // Process local part until '+' or '@'
        for (i = 0; i < email.length; i++) {
            let char = email[i];
            if (char === '+' || char === '@') break;
            if (char !== '.') localName.push(char);
        }
        // Extract the domain part
        for (i = email.lastIndexOf('@'); i < email.length; i++) {
            domain.push(email[i]);
        }
        // Combine local and domain parts
        let processedEmail = localName.join('') + domain.join('');
        distinctEmailSet.add(processedEmail);
    });
    return distinctEmailSet.size;
};
The given JavaScript function countDistinctEmails is designed to count unique email addresses from an array of emails. Each email is normalized by simplifying the local part of the email (the part before the '@' symbol) and combining it with the domain part (the part after '@').
Transforming each email involves:
- Creating a processed version of the local part by ignoring all periods ('.') and stopping at the first plus sign ('+') or at the '@' symbol.
- Extracting the domain directly from the '@' to the end of the string.
- The processed local part is then concatenated with the domain part to form the full email address.
- To ensure uniqueness, processed emails are added to a Set. Since a Set automatically excludes duplicates, this helps in maintaining only unique email addresses.
At the end of the function:
- The size of the Set, which carries all the unique emails, is returned. Hence, this represents the count of distinct email addresses.
This function handles cases where different string representations should be interpreted as the same email address because certain characters are ignored or truncated under the rules stated in the problem. It counts unique email addresses in time linear in the total length of the input.
- Python
from typing import List


class Solution:
    def numDistinctEmails(self, emailList: List[str]) -> int:
        distinctEmails = set()
        for email in emailList:
            # Build the local name: stop at '+' or '@', skip periods.
            localName = []
            for char in email:
                if char == '+' or char == '@':
                    break
                if char != '.':
                    localName.append(char)
            # Collect the domain (including '@') by scanning from the end.
            domain = []
            for char in reversed(email):
                domain.append(char)
                if char == '@':
                    break
            # The domain was collected backwards, so restore its order.
            domain = ''.join(domain[::-1])
            localName = ''.join(localName)
            distinctEmails.add(localName + domain)
        return len(distinctEmails)
This Python solution for identifying the number of unique email addresses in a given list processes emails by isolating the local and domain parts. The algorithm works as follows:
- Initialize a set distinctEmails to ensure only unique emails are counted.
- Loop over each email in the emailList provided:
  - Create an empty list localName to build the processed local part of the email. Iterate through characters in the email:
    - Stop adding characters to localName if a '+' or '@' is encountered, since '+' signifies the start of characters to ignore and '@' marks the start of the domain part.
    - Ignore any '.' characters, as they are not considered part of the unique address.
  - Construct the domain portion:
    - Start from the end of the email and accumulate characters until the '@' character is found. The characters are collected in reverse order, so the list is reversed again to obtain the domain.
  - After processing both parts of the email, combine them and add the result to the distinctEmails set.
- Finally, return the count of unique emails by calculating the length of the distinctEmails set.
This method parses and standardizes each email so that addresses that differ only in ignored characters are counted once under the problem's rules.