Sentence Similarity

Updated on 07 July, 2025
Sentence Similarity header image

Problem Statement

In the problem, sentences are represented as arrays of words and we are provided two such arrays, sentence1 and sentence2. Along with these sentences, an array of pairs, similarPairs, is given which denotes word similarities between the elements of these arrays.

The task is to determine if sentence1 and sentence2 are similar based on two criteria:

  1. Both sentences must have the exact same number of words.
  2. Corresponding words in each sentence are considered similar either if they are exactly the same or if they are deemed similar according to the similarPairs list.

It's crucial to note that similarity is not transitive in this context. That means even if word A is similar to word B, and word B is similar to word C, word A does not necessarily have to be similar to word C.

Examples

Example 1

Input:

sentence1 = ["great","acting","skills"], sentence2 = ["fine","drama","talent"], similarPairs = [["great","fine"],["drama","acting"],["skills","talent"]]

Output:

true

Explanation:

The two sentences have the same length and each word i of sentence1 is also similar to the corresponding word in sentence2.

Example 2

Input:

sentence1 = ["great"], sentence2 = ["great"], similarPairs = []

Output:

true

Explanation:

A word is similar to itself.

Example 3

Input:

sentence1 = ["great"], sentence2 = ["doubleplus","good"], similarPairs = [["great","doubleplus"]]

Output:

false

Explanation:

As they don't have the same length, we return false.

Constraints

  • 1 <= sentence1.length, sentence2.length <= 1000
  • 1 <= sentence1[i].length, sentence2[i].length <= 20
  • sentence1[i] and sentence2[i] consist of English letters.
  • 0 <= similarPairs.length <= 1000
  • similarPairs[i].length == 2
  • 1 <= xi.length, yi.length <= 20
  • xi and yi consist of lower-case and upper-case English letters.
  • All the pairs (xi, yi) are distinct.

Approach and Intuition

Understanding the problem requires us to handle string array comparisons and use a mapping strategy for the similarity pairs for efficient lookup. Here’s a step-by-step breakdown of how we can approach this:

  1. Check Sentence Length: First, ensure that both sentence1 and sentence2 have the same number of words. If not, we immediately return false.

  2. Mapping Similarities: Convert the list of similarPairs to a dictionary (or hashmap) for O(1) average time complexity on lookups. Each entry in the dictionary takes the form of {word: set(similar_words)}, thereby grouping each word with its corresponding set of similar words.

  3. Comparing Words: Iterate over each word in both sentences simultaneously:

    • If the current words in both sentences are identical, they are naturally similar.
    • If they are different, look up each word in the similarity dictionary:
      • Check if the word from sentence1 exists in the set of similar words for the corresponding word from sentence2 and vice versa.
    • If any word pair doesn't satisfy the similarity condition, we return false.
  4. Return Result: If all word pairs from both sentences are found similar by the above conditions, the function should return true, concluding that both sentences are similar. If any condition fails at any point, the function should return false.

By structuring the approach with efficient data structures and ensuring early exits on mismatch, the solution is both time and space optimized within the given constraints. This method utilizes computational resources in an effective manner while adhering to the problem's requirements.

Solutions

  • C++
cpp
class Solution {
public:
    bool sentencesCompare(vector<string>& sent1, vector<string>& sent2,
                          vector<vector<string>>& similar) {
        if (sent1.size() != sent2.size()) {
            return false;
        }
        unordered_map<string, unordered_set<string>> similarMap;
        for (auto& pair : similar) {
            similarMap[pair[0]].insert(pair[1]);
            similarMap[pair[1]].insert(pair[0]);
        }
    
        for (int idx = 0; idx < sent1.size(); idx++) {
            // Check if the words are the same
            if (sent1[idx] == sent2[idx]) {
                continue;
            }
            // Check if there exists a similarity in words
            if (similarMap[sent1[idx]].find(sent2[idx]) != similarMap[sent1[idx]].end()) {
                continue;
            }
            return false;
        }
    
        return true;
    }
};

This C++ solution checks if two sentences represented by the vectors sent1 and sent2 are similar based on a given list of pairs of similar words. The core concept is to leverage a hash map to store each word of the similar pairs for quick look-up, ensuring an efficient comparison process. Here's the breakdown of how the solution accomplishes this:

  1. Begin by comparing the sizes of sent1 and sent2. If they differ, the sentences are immediately deemed dissimilar, and the function returns false.
  2. Create an unordered_map named similarMap where each key is a word from the list of similar pairs, and its value is an unordered_set containing the words it is similar to. This setup allows the checking of word similarity in constant time.
  3. Populate similarMap by iterating over the list similar. For each pair of similar words, add each word as a key pointing to the other as part of its set. This ensures mutual mapping, i.e., if "A" is similar to "B", then "B" is also similar to "A".
  4. Iterate through the sentences using an index. For each position, check if the words in sent1 and sent2 are either the same or present as similar in similarMap.
  5. If two corresponding words are neither identical nor listed as similar, return false, indicating that the sentences are not similar.
  6. If all checks throughout the iteration pass, return true, confirming the similarity of sent1 and sent2.

This approach effectively scales with input size due to the constant-time efficiency of hash map and set operations, thus providing a robust solution to sentence similarity checking.

  • Java
java
class Solution {
public
    boolean areWordsSimilar(String[] words1, String[] words2, List<List<String>> pairs) {
        if (words1.length != words2.length) {
            return false;
        }
        Map<String, Set<String>> mapPairs = new HashMap<>();
        for (List<String> pair : pairs) {
            mapPairs.computeIfAbsent(pair.get(0), k -> new HashSet<>()).add(pair.get(1));
            mapPairs.computeIfAbsent(pair.get(1), k -> new HashSet<>()).add(pair.get(0));
        }
    
        for (int index = 0; index < words1.length; index++) {
            if (words1[index].equals(words2[index])) {
                continue;
            }
            if (mapPairs.containsKey(words1[index]) && mapPairs.get(words1[index]).contains(words2[index])) {
                continue;
            }
            return false;
        }
    
        return true;
    }
}

The provided Java solution is designed to determine if two arrays of words, words1 and words2, are similar based on a list of given pairs that define which words are considered similar. Follow this step-by-step explanation of the code to understand its logic and operation:

  1. The method areWordsSimilar checks if the lengths of words1 and words2 are the same. If they differ, it immediately returns false, ensuring that further comparison only occurs between arrays of equal length.

  2. It defines a HashMap named mapPairs which will store each word and its corresponding set of similar words. This facilitates quick lookup to check word similarity.

  3. The method then populates this map using the pairs list. For each pair, it ensures both words reference each other as similar by accessing or creating a new HashSet for each word and adding the counterpart word.

  4. After setting up the mapping of similar words, the function iterates through both word arrays simultaneously by index. For each pair of words:

    • If the words at the current index in words1 and words2 are identical, it skips to the next pair.
    • If the words are different, it checks mapPairs to see if the first word considers the second word similar (i.e., if the second word is contained in the set of similar words of the first). If not, the method returns false.
  5. If all checks pass, the method concludes that the two arrays of words are similar based on the provided pairs and returns true.

This approach prioritizes efficiency in checking word similarity by using a combination of hash maps and sets. By setting up a bidirectional similarity map at the beginning, it ensures quick lookups and minimal iteration thereafter.

  • Python
python
class Solution(object):
    def checkSimilarity(self, s1, s2, pairs):
        if len(s1) != len(s2):
            return False
        synonyms = defaultdict(set)
        for first, second in pairs:
            synonyms[first].add(second)
            synonyms[second].add(first)
        for idx in range(len(s1)):
            if s1[idx] == s2[idx] or s2[idx] in synonyms[s1[idx]]:
                continue
            return False
        return True

This Python solution is designed to check the similarity between two sentences based on given pairs of synonyms. The function checkSimilarity takes three parameters: s1 and s2 for the sentences and pairs containing possible synonym relationships. Follow these guidelines to understand the implementation:

  1. Initially, it checks if both sentences s1 and s2 are of the same length. If not, it immediately returns False, as differing lengths preclude similarity.

  2. It sets up a defaultdict of sets named synonyms to store all provided synonym pairs in a two-way relationship, meaning if a is a synonym of b, then b is considered a synonym of a.

  3. The function then iterates through each word in the sentences by their index. For each pair of corresponding words from s1 and s2, it checks:

    • If the words are exactly the same.
    • If the word from s2 exists in the set of synonyms for the corresponding word from s1.

    If neither condition is true, the function returns False.

  4. If all pairs of words pass the similarity check, it returns True, confirming that the sentences are similar based on the given synonym pairs.

This approach efficiently utilizes set operations for quick look-up and ensures only relevant synonym checks are made by using a dictionary of sets.

Comments

No comments yet.