
Problem Statement
When trying to understand if two sentences are similar, we represent each sentence as an array of words. For example, the sentence "I am happy with dogs"
can be represented as arr = ["I","am","happy","with","dogs"]
. Given such two arrays for sentences sentence1
and sentence2
, and an additional array similarPairs
containing pairs of words, our task is to determine if these two sentences are similar based on specific criteria.
Two sentences qualify as similar if:
- They are of equal length, i.e., contain the same number of words.
- Corresponding words in the position
i
ofsentence1
andsentence2
are deemed similar. Each word is considered similar to itself. - The similarity relationship is transitive: if word
a
is similar tob
, andb
is similar toc
, thena
is similar toc
.
The pairs in similarPairs
are given in the format [xi, yi]
, suggesting that word xi
is similar to word yi
.
Our function should return true
if the sentences are similar under the above conditions or false
otherwise.
Examples
Example 1
Input:
sentence1 = ["great","acting","skills"] sentence2 = ["fine","drama","talent"] similarPairs = [["great","good"],["fine","good"],["drama","acting"],["skills","talent"]]
Output:
true
Explanation:
The two sentences have the same length and each word in sentence1 is similar to the corresponding word in sentence2.
Example 2
Input:
sentence1 = ["I","enjoy","coding"] sentence2 = ["I","enjoy","anime"] similarPairs = [["manga","anime"],["platform","fun"],["coding","platform"],["fun","manga"]]
Output:
true
Explanation:
"coding" → "platform" → "fun" → "manga" → "anime". The first two words are identical, and the third is transitively similar.
Example 3
Input:
sentence1 = ["I","enjoy","coding"] sentence2 = ["I","enjoy","anime"] similarPairs = [["manga","comics"],["platform","fun"],["coding","platform"],["fun","manga"]]
Output:
false
Explanation:
There is no transitive path between "coding" and "anime".
Constraints
1 <= sentence1.length, sentence2.length <= 1000
1 <= sentence1[i].length, sentence2[i].length <= 20
sentence1[i]
andsentence2[i]
consist of lowercase and uppercase English letters.0 <= similarPairs.length <= 2000
similarPairs[i].length == 2
1 <= xi.length, yi.length <= 20
xi
andyi
consist of English letters.
Approach and Intuition
The solution involves building a Union-Find (Disjoint Set Union, DSU) data structure to track similarity groups efficiently. Here's the step-by-step breakdown:
Length Check: Immediately return
false
if the two sentences differ in length. They cannot be similar otherwise.Union-Find Setup:
- Initialize a parent mapping for all words found in
similarPairs
. - Define helper functions
find(word)
andunion(word1, word2)
to locate and merge sets.
- Initialize a parent mapping for all words found in
Construct Similarity Groups:
- Iterate over each pair in
similarPairs
, and merge their groups using theunion
operation.
- Iterate over each pair in
Compare Sentence Words:
- Iterate over each index
i
in the two sentences. - If
sentence1[i]
is equal tosentence2[i]
, continue. - If both words are in the union-find structure, check if they share the same root (i.e., are connected).
- If not, return
false
.
- Iterate over each index
Return Result:
- If all word pairs pass the similarity check, return
true
.
- If all word pairs pass the similarity check, return
This approach ensures performance is nearly linear with path compression and union-by-rank optimizations in Union-Find. It effectively handles the problem within given constraints.
Solutions
- C++
class DisjointSet {
private:
map<string, string> leader;
map<string, int> height;
public:
void insertString(string s) {
if (!leader.count(s)) {
leader[s] = s;
height[s] = 0;
}
}
bool exists(string s) {
return leader.count(s) > 0;
}
string findLeader(string s) {
if (leader[s] != s) {
leader[s] = findLeader(leader[s]);
}
return leader[s];
}
void merge(string s1, string s2) {
string leader1 = findLeader(s1), leader2 = findLeader(s2);
if (leader1 == leader2) {
return;
} else if (height[leader1] < height[leader2]) {
leader[leader1] = leader2;
} else if (height[leader1] > height[leader2]) {
leader[leader2] = leader1;
} else {
leader[leader2] = leader1;
height[leader1]++;
}
}
};
class SentenceSimilarityChecker {
public:
bool areSentencesSimilar(vector<string>& sentence1, vector<string>& sentence2,
vector<vector<string>>& similarPairs) {
if (sentence1.size() != sentence2.size()) {
return false;
}
DisjointSet dset;
for (auto& pair : similarPairs) {
dset.insertString(pair[0]);
dset.insertString(pair[1]);
dset.merge(pair[0], pair[1]);
}
for (size_t i = 0; i < sentence1.size(); i++) {
if (sentence1[i] == sentence2[i]) {
continue;
}
if (dset.exists(sentence1[i]) && dset.exists(sentence2[i]) &&
dset.findLeader(sentence1[i]) == dset.findLeader(sentence2[i])) {
continue;
}
return false;
}
return true;
}
};
The provided C++ solution addresses the problem of determining whether two sentences are similar using a set of predefined similar word pairs. This is achieved through the use of a Disjoint Set (Union-Find) data structure, which efficiently handles and queries the connectivity among elements.
DisjointSet Class:
- The
DisjointSet
class manages a collection of string elements where each string points to a leader (or parent) string, indicating its connected group. The class supports operations to insert a new string, check if a string exists, find the leader of a string, and merge two elements under a common leader. Here's a breakdown of its functions:insertString
: Ensures that each string is initialized as its own leader if it hasn't been already inserted.exists
: Checks if a string is already in the set.findLeader
: Implements path compression during the leader lookup, ensuring that all strings along the path directly point to their current leader, which optimizes future operations.merge
: Unites two strings by attaching one's leader to another's leader. This operation is optimized by height comparison to minimize tree depth, preventing performance degradation.
- The
SentenceSimilarityChecker Class:
- This class uses the
DisjointSet
to determine if two sentences are similar:- First, the similarity checker verifies if the sentences are of the same length; if not, it promptly returns false.
- For each pair of similar words given, it inserts them into the disjoint set and merges their sets.
- Finally, the function traverses through each word pair in the sentences and checks:
- If the words are identical, it moves to the next pair.
- If both words exist in the set and share the same leader, it proceeds; otherwise, it returns false.
- If all pairs pass the check, the sentences are deemed similar, returning true.
- This class uses the
The advantage of this approach is its efficiency in managing and comparing complex connections between words with potentially deep transitive similarities. Moreover, the use of path compression and height-based merging ensures that the union-find operations are performed in nearly constant time, making it scalable for large data sets.
- Java
class DisjointSet {
Map<String, String> leader = new HashMap<>();
Map<String, Integer> size = new HashMap<>();
public void insertElement(String element) {
if (!leader.containsKey(element)) {
leader.put(element, element);
size.put(element, 0);
}
}
public boolean containsElement(String element) {
return leader.containsKey(element);
}
public String getRoot(String element) {
if (leader.get(element) != element)
leader.put(element, getRoot(leader.get(element)));
return leader.get(element);
}
public void mergeSets(String element1, String element2) {
String root1 = getRoot(element1), root2 = getRoot(element2);
if (root1 == root2) {
return;
} else if (size.get(root1) < size.get(root2)) {
leader.put(root1, root2);
} else if (size.get(root1) > size.get(root2)) {
leader.put(root2, root1);
} else {
leader.put(root2, root1);
size.put(root1, size.get(root1) + 1);
}
}
}
class SentenceSimilarityVerifier {
public boolean areSentencesSimilar(String[] words1, String[] words2, List<List<String>> pairs) {
if (words1.length != words2.length) {
return false;
}
DisjointSet disjointSet = new DisjointSet();
for (List<String> pair : pairs) {
disjointSet.insertElement(pair.get(0));
disjointSet.insertElement(pair.get(1));
disjointSet.mergeSets(pair.get(0), pair.get(1));
}
for (int i = 0; i < words1.length; i++) {
if (words1[i].equals(words2[i])) {
continue;
}
if (disjointSet.containsElement(words1[i]) && disjointSet.containsElement(words2[i]) &&
disjointSet.getRoot(words1[i]).equals(disjointSet.getRoot(words2[i]))) {
continue;
}
return false;
}
return true;
}
}
The Java implementation provided addresses the problem of determining if two sentences are similar based on word-to-word similarities, including synonyms, using a Disjoint Set (Union-Find) data structure. Here's how it operates:
Class DisjointSet:
- Maintains a
leader
to identify the root leader of each set and asize
to manage the size of each set. - Includes methods to add elements (
insertElement
), check if an element is present (containsElement
), find the root leader of an element (getRoot
), and merge two sets (mergeSets
).
- Maintains a
Class SentenceSimilarityVerifier:
- Contains the
areSentencesSimilar
method that ultimately usesDisjointSet
. - Firstly, checks if both sentences have the same length, returning false if not.
- Initializes a
DisjointSet
instance and populates it using the given list of synonym pairs. - For each word pair, it checks if they are from the same set or not. If all word pairs from both sentences meet this criterion, it signifies sentence similarity.
- Contains the
Key points in the solution:
- Initializes each unique word as its own set in
DisjointSet
. - Words in the pairs list are united, signifying synonym relationships.
- When comparing two words in the sentences (
words1
andwords2
), checks if they are identical or if they belong to the same set, indicating they're synonyms. - Returns true if all word pairs across the two sentences either are the same or are synonyms (belong to the same set); otherwise, returns false.
This robust method ensures that the overall similarity check accounts for indirect relationships among synonyms and handles varying sentence lengths. This ensures precision in assessing sentence similarity, leveraging union-find's efficient set operations.
No comments yet.