Most Common Word

Updated on 19 June, 2025
Most Common Word header image

Problem Statement

Given a string named paragraph consisting of words separated by spaces and punctuations, and an array banned containing certain words to be excluded, the task involves determining the most frequently occurring word in the paragraph that is not listed in the banned array. The evaluation of words is case-insensitive, where the returned word should be in lowercase regardless of its appearance in the input. The uniqueness of the result is assured by the promise that there is always a single, most frequent non-banned word in the paragraph.

Examples

Example 1

Input:

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]

Output:

"ball"

Explanation:

"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph.
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"),
and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2

Input:

paragraph = "a.", banned = []

Output:

"a"

Constraints

  • 1 <= paragraph.length <= 1000
  • paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
  • 0 <= banned.length <= 100
  • 1 <= banned[i].length <= 10
  • banned[i] consists of only lowercase English letters.

Approach and Intuition

To solve the problem of identifying the most frequent non-banned word, the following steps outline a logical approach:

  1. Normalize the Input:

    • Convert the entire paragraph to lowercase to handle the case-insensitivity.
    • Cleanse the paragraph by removing or splitting on punctuations to isolate the pure words.
  2. Word Frequency Counting:

    • Utilize a dictionary to count the occurrences of each word that is not part of the banned list.
  3. Handling Banned Words:

    • Store the banned words in a set for O(1) lookup times when filtering the words during the counting process.
  4. Identify the Most Frequent Word:

    • Iterate through the frequency dictionary to find the word with the highest count that is not included in the banned set.

Example 1 Explanation

In the first example, the paragraph input contains multiple occurrences of the words "hit" and "ball" with various capitlizations and adjacent punctuations. After normalization and counting, despite "hit" being the most frequent, it gets excluded due to it being a banned word. The word "ball", being the next frequent and not banned, turns out to be the answer.

Example 2 Explanation

The second example is straightforward with the paragraph consisting of a single letter followed by a punctuation. Due to the absence of any banned words, the direct result after cleanup and counting would be the word "a".

Solutions

  • Java
  • Python
java
class Solution {
    public String findMostCommonWord(String paragraph, String[] exclusions) {

        Set<String> bannedSet = new HashSet<>();
        for (String excluded : exclusions)
            bannedSet.add(excluded);

        String result = "";
        int highestFrequency = 0;
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder currentWord = new StringBuilder();
        char[] paragraphChars = paragraph.toCharArray();

        for (int i = 0; i < paragraphChars.length; ++i) {
            char currentChar = paragraphChars[i];

            if (Character.isLetter(currentChar)) {
                currentWord.append(Character.toLowerCase(currentChar));
                if (i != paragraphChars.length - 1)
                    continue;
            }

            if (currentWord.length() > 0) {
                String word = currentWord.toString();
                if (!bannedSet.contains(word)) {
                    int count = counts.getOrDefault(word, 0) + 1;
                    counts.put(word, count);
                    if (count > highestFrequency) {
                        result = word;
                        highestFrequency = count;
                    }
                }
                currentWord = new StringBuilder();
            }
        }
        return result;
    }
}

The following solution demonstrates how to find the most common word that is not included in a list of exclusions from a given paragraph in Java. The process involves the following steps:

  1. Create a HashSet named bannedSet to store all the words from the exclusions list for quick access and comparison.

  2. Initialize a HashMap called counts to keep track of the occurrence of each word in the paragraph.

  3. Use a StringBuilder called currentWord to efficiently build words as characters are processed.

  4. Convert the entire paragraph into a char array to iterate over every character.

  5. Loop through each character, adding alphabetic characters to currentWord and converting them to lowercase to ensure case insensitivity.

  6. Check for word boundaries and the end of the paragraph to process complete words:

    • If currentWord is not empty and the word is not in bannedSet, update its count in the counts map.
    • If the count of this word exceeds highestFrequency, update highestFrequency and set this word as result.
  7. Finally, after the loop, return result, which now contains the most frequent word that is not excluded.

This solution efficiently handles the counting and tracking of word frequencies while ignoring specified exclusions. The use of a HashSet for the exclusions and a HashMap for frequency counts ensures that the operations are done in optimal time.

python
class Solution:
    def frequentWord(self, text: str, excluded: List[str]) -> str:
        excluded_set = set(excluded)
        result_word = ""
        highest_freq = 0
        frequency = defaultdict(int)
        temp_word = []

        for index, character in enumerate(text):
            # Collect characters to form a valid word
            if character.isalnum():
                temp_word.append(character.lower())
                if index != len(text)-1:
                    continue

            # Process the word formed or at the end of the text
            if len(temp_word) > 0:
                formed_word = "".join(temp_word)
                if formed_word not in excluded_set:
                    frequency[formed_word] += 1
                    if frequency[formed_word] > highest_freq:
                        highest_freq = frequency[formed_word]
                        result_word = formed_word
                # Clear the collection for the next word
                temp_word = []

        return result_word

This Python code defines a method frequentWord in a class Solution to determine the most common word in a given text that is not part of a specified list of excluded words. Here's a breakdown of how the method works:

  • Begins by converting the list of words to exclude into a set for faster lookups (excluded_set).
  • Initializes result_word to store the most frequent non-excluded word and highest_freq to track its frequency.
  • Utilizes a defaultdict to keep a frequency count of all words (frequency).
  • temp_word is used to construct individual words from the input text.

The method processes each character in the input text:

  • If the character is alphanumeric, it appends it to temp_word and lowercases it for uniformity; continues to the next character unless it's the end of the text.
  • Once it encounters a space or a punctuation (or reaches the end of the text), it checks if temp_word forms a valid word, not in excluded_set.
  • If the word is valid and its frequency exceeds the highest_freq, it updates highest_freq and sets result_word to the current word.
  • Resets temp_word for collecting the next word.

Finally, the method returns the most frequent valid word. This solution seamlessly filters out unwanted words, maintains a count of the valid ones, and ensures that the result considers only the most frequently appearing word that isn't excluded.

Comments

No comments yet.