Sentence Similarity II

Updated on 07 July, 2025
Sentence Similarity II header image

Problem Statement

When trying to understand if two sentences are similar, we represent each sentence as an array of words. For example, the sentence "I am happy with dogs" can be represented as arr = ["I","am","happy","with","dogs"]. Given such two arrays for sentences sentence1 and sentence2, and an additional array similarPairs containing pairs of words, our task is to determine if these two sentences are similar based on specific criteria.

Two sentences qualify as similar if:

  • They are of equal length, i.e., contain the same number of words.
  • Corresponding words in the position i of sentence1 and sentence2 are deemed similar. Each word is considered similar to itself.
  • The similarity relationship is transitive: if word a is similar to b, and b is similar to c, then a is similar to c.

The pairs in similarPairs are given in the format [xi, yi], suggesting that word xi is similar to word yi.

Our function should return true if the sentences are similar under the above conditions or false otherwise.

Examples

Example 1

Input:

sentence1 = ["great","acting","skills"]
sentence2 = ["fine","drama","talent"]
similarPairs = [["great","good"],["fine","good"],["drama","acting"],["skills","talent"]]

Output:

true

Explanation:

The two sentences have the same length and each word in sentence1 is similar to the corresponding word in sentence2.

Example 2

Input:

sentence1 = ["I","enjoy","coding"]
sentence2 = ["I","enjoy","anime"]
similarPairs = [["manga","anime"],["platform","fun"],["coding","platform"],["fun","manga"]]

Output:

true

Explanation:

"coding" → "platform" → "fun" → "manga" → "anime".
The first two words are identical, and the third is transitively similar.

Example 3

Input:

sentence1 = ["I","enjoy","coding"]
sentence2 = ["I","enjoy","anime"]
similarPairs = [["manga","comics"],["platform","fun"],["coding","platform"],["fun","manga"]]

Output:

false

Explanation:

There is no transitive path between "coding" and "anime".

Constraints

  • 1 <= sentence1.length, sentence2.length <= 1000
  • 1 <= sentence1[i].length, sentence2[i].length <= 20
  • sentence1[i] and sentence2[i] consist of lowercase and uppercase English letters.
  • 0 <= similarPairs.length <= 2000
  • similarPairs[i].length == 2
  • 1 <= xi.length, yi.length <= 20
  • xi and yi consist of English letters.

Approach and Intuition

The solution involves building a Union-Find (Disjoint Set Union, DSU) data structure to track similarity groups efficiently. Here's the step-by-step breakdown:

  1. Length Check: Immediately return false if the two sentences differ in length. They cannot be similar otherwise.

  2. Union-Find Setup:

    • Initialize a parent mapping for all words found in similarPairs.
    • Define helper functions find(word) and union(word1, word2) to locate and merge sets.
  3. Construct Similarity Groups:

    • Iterate over each pair in similarPairs, and merge their groups using the union operation.
  4. Compare Sentence Words:

    • Iterate over each index i in the two sentences.
    • If sentence1[i] is equal to sentence2[i], continue.
    • If both words are in the union-find structure, check if they share the same root (i.e., are connected).
    • If not, return false.
  5. Return Result:

    • If all word pairs pass the similarity check, return true.

This approach ensures performance is nearly linear with path compression and union-by-rank optimizations in Union-Find. It effectively handles the problem within given constraints.

Solutions

  • C++
cpp
class DisjointSet {
private:
    map<string, string> leader;
    map<string, int> height;
    
public:
    void insertString(string s) {
        if (!leader.count(s)) {
            leader[s] = s;
            height[s] = 0;
        }
    }
    
    bool exists(string s) {
        return leader.count(s) > 0;
    }
    
    string findLeader(string s) {
        if (leader[s] != s) {
            leader[s] = findLeader(leader[s]);
        }
        return leader[s];
    }
    
    void merge(string s1, string s2) {
        string leader1 = findLeader(s1), leader2 = findLeader(s2);
        if (leader1 == leader2) {
            return;
        } else if (height[leader1] < height[leader2]) {
            leader[leader1] = leader2;
        } else if (height[leader1] > height[leader2]) {
            leader[leader2] = leader1;
        } else {
            leader[leader2] = leader1;
            height[leader1]++;
        }
    }
};
    
class SentenceSimilarityChecker {
public:
    bool areSentencesSimilar(vector<string>& sentence1, vector<string>& sentence2,
                             vector<vector<string>>& similarPairs) {
        if (sentence1.size() != sentence2.size()) {
            return false;
        }
            
        DisjointSet dset;
        for (auto& pair : similarPairs) {
            dset.insertString(pair[0]);
            dset.insertString(pair[1]);
            dset.merge(pair[0], pair[1]);
        }
    
        for (size_t i = 0; i < sentence1.size(); i++) {
            if (sentence1[i] == sentence2[i]) {
                continue;
            }
            if (dset.exists(sentence1[i]) && dset.exists(sentence2[i]) &&
                dset.findLeader(sentence1[i]) == dset.findLeader(sentence2[i])) {
                continue;
            }
            return false;
        }
        return true;
    }
};

The provided C++ solution addresses the problem of determining whether two sentences are similar using a set of predefined similar word pairs. This is achieved through the use of a Disjoint Set (Union-Find) data structure, which efficiently handles and queries the connectivity among elements.

  • DisjointSet Class:

    • The DisjointSet class manages a collection of string elements where each string points to a leader (or parent) string, indicating its connected group. The class supports operations to insert a new string, check if a string exists, find the leader of a string, and merge two elements under a common leader. Here's a breakdown of its functions:
      • insertString: Ensures that each string is initialized as its own leader if it hasn't been already inserted.
      • exists: Checks if a string is already in the set.
      • findLeader: Implements path compression during the leader lookup, ensuring that all strings along the path directly point to their current leader, which optimizes future operations.
      • merge: Unites two strings by attaching one's leader to another's leader. This operation is optimized by height comparison to minimize tree depth, preventing performance degradation.
  • SentenceSimilarityChecker Class:

    • This class uses the DisjointSet to determine if two sentences are similar:
      • First, the similarity checker verifies if the sentences are of the same length; if not, it promptly returns false.
      • For each pair of similar words given, it inserts them into the disjoint set and merges their sets.
      • Finally, the function traverses through each word pair in the sentences and checks:
        • If the words are identical, it moves to the next pair.
        • If both words exist in the set and share the same leader, it proceeds; otherwise, it returns false.
      • If all pairs pass the check, the sentences are deemed similar, returning true.

The advantage of this approach is its efficiency in managing and comparing complex connections between words with potentially deep transitive similarities. Moreover, the use of path compression and height-based merging ensures that the union-find operations are performed in nearly constant time, making it scalable for large data sets.

  • Java
java
class DisjointSet {
    Map<String, String> leader = new HashMap<>();
    Map<String, Integer> size = new HashMap<>();
    
    public void insertElement(String element) {
        if (!leader.containsKey(element)) {
            leader.put(element, element);
            size.put(element, 0);
        }
    }
    
    public boolean containsElement(String element) {
        return leader.containsKey(element);
    }
    
    public String getRoot(String element) {
        if (leader.get(element) != element)
            leader.put(element, getRoot(leader.get(element)));
        return leader.get(element);
    }
    
    public void mergeSets(String element1, String element2) {
        String root1 = getRoot(element1), root2 = getRoot(element2);
        if (root1 == root2) {
            return;
        } else if (size.get(root1) < size.get(root2)) {
            leader.put(root1, root2);
        } else if (size.get(root1) > size.get(root2)) {
            leader.put(root2, root1);
        } else {
            leader.put(root2, root1);
            size.put(root1, size.get(root1) + 1);
        }
    }
}
    
class SentenceSimilarityVerifier {
    public boolean areSentencesSimilar(String[] words1, String[] words2, List<List<String>> pairs) {
        if (words1.length != words2.length) {
            return false;
        }
    
        DisjointSet disjointSet = new DisjointSet();
        for (List<String> pair : pairs) {
            disjointSet.insertElement(pair.get(0));
            disjointSet.insertElement(pair.get(1));
            disjointSet.mergeSets(pair.get(0), pair.get(1));
        }
    
        for (int i = 0; i < words1.length; i++) {
            if (words1[i].equals(words2[i])) {
                continue;
            }
            if (disjointSet.containsElement(words1[i]) && disjointSet.containsElement(words2[i]) &&
                    disjointSet.getRoot(words1[i]).equals(disjointSet.getRoot(words2[i]))) {
                continue;
            }
            return false;
        }
        return true;
    }
}

The Java implementation provided addresses the problem of determining if two sentences are similar based on word-to-word similarities, including synonyms, using a Disjoint Set (Union-Find) data structure. Here's how it operates:

  • Class DisjointSet:

    • Maintains a leader to identify the root leader of each set and a size to manage the size of each set.
    • Includes methods to add elements (insertElement), check if an element is present (containsElement), find the root leader of an element (getRoot), and merge two sets (mergeSets).
  • Class SentenceSimilarityVerifier:

    • Contains the areSentencesSimilar method that ultimately uses DisjointSet.
    • Firstly, checks if both sentences have the same length, returning false if not.
    • Initializes a DisjointSet instance and populates it using the given list of synonym pairs.
    • For each word pair, it checks if they are from the same set or not. If all word pairs from both sentences meet this criterion, it signifies sentence similarity.

Key points in the solution:

  • Initializes each unique word as its own set in DisjointSet.
  • Words in the pairs list are united, signifying synonym relationships.
  • When comparing two words in the sentences (words1 and words2), checks if they are identical or if they belong to the same set, indicating they're synonyms.
  • Returns true if all word pairs across the two sentences either are the same or are synonyms (belong to the same set); otherwise, returns false.

This robust method ensures that the overall similarity check accounts for indirect relationships among synonyms and handles varying sentence lengths. This ensures precision in assessing sentence similarity, leveraging union-find's efficient set operations.

Comments

No comments yet.