
Problem Statement
Given a string named paragraph
consisting of words separated by spaces and punctuations, and an array banned
containing certain words to be excluded, the task involves determining the most frequently occurring word in the paragraph that is not listed in the banned
array. The evaluation of words is case-insensitive, where the returned word should be in lowercase regardless of its appearance in the input. The uniqueness of the result is assured by the promise that there is always a single, most frequent non-banned word in the paragraph.
Examples
Example 1
Input:
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output:
"ball"
Explanation:
"hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2
Input:
paragraph = "a.", banned = []
Output:
"a"
Constraints
1 <= paragraph.length <= 1000
- paragraph consists of English letters, space
' '
, or one of the symbols:"!?',;."
. 0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i]
consists of only lowercase English letters.
Approach and Intuition
To solve the problem of identifying the most frequent non-banned word, the following steps outline a logical approach:
Normalize the Input:
- Convert the entire
paragraph
to lowercase to handle the case-insensitivity. - Cleanse the paragraph by removing or splitting on punctuations to isolate the pure words.
- Convert the entire
Word Frequency Counting:
- Utilize a dictionary to count the occurrences of each word that is not part of the
banned
list.
- Utilize a dictionary to count the occurrences of each word that is not part of the
Handling Banned Words:
- Store the banned words in a set for O(1) lookup times when filtering the words during the counting process.
Identify the Most Frequent Word:
- Iterate through the frequency dictionary to find the word with the highest count that is not included in the banned set.
Example 1 Explanation
In the first example, the paragraph input contains multiple occurrences of the words "hit" and "ball" with various capitlizations and adjacent punctuations. After normalization and counting, despite "hit" being the most frequent, it gets excluded due to it being a banned word. The word "ball", being the next frequent and not banned, turns out to be the answer.
Example 2 Explanation
The second example is straightforward with the paragraph consisting of a single letter followed by a punctuation. Due to the absence of any banned words, the direct result after cleanup and counting would be the word "a".
Solutions
- Java
- Python
class Solution {
public String findMostCommonWord(String paragraph, String[] exclusions) {
Set<String> bannedSet = new HashSet<>();
for (String excluded : exclusions)
bannedSet.add(excluded);
String result = "";
int highestFrequency = 0;
Map<String, Integer> counts = new HashMap<>();
StringBuilder currentWord = new StringBuilder();
char[] paragraphChars = paragraph.toCharArray();
for (int i = 0; i < paragraphChars.length; ++i) {
char currentChar = paragraphChars[i];
if (Character.isLetter(currentChar)) {
currentWord.append(Character.toLowerCase(currentChar));
if (i != paragraphChars.length - 1)
continue;
}
if (currentWord.length() > 0) {
String word = currentWord.toString();
if (!bannedSet.contains(word)) {
int count = counts.getOrDefault(word, 0) + 1;
counts.put(word, count);
if (count > highestFrequency) {
result = word;
highestFrequency = count;
}
}
currentWord = new StringBuilder();
}
}
return result;
}
}
The following solution demonstrates how to find the most common word that is not included in a list of exclusions from a given paragraph in Java. The process involves the following steps:
Create a
HashSet
namedbannedSet
to store all the words from the exclusions list for quick access and comparison.Initialize a
HashMap
calledcounts
to keep track of the occurrence of each word in the paragraph.Use a
StringBuilder
calledcurrentWord
to efficiently build words as characters are processed.Convert the entire paragraph into a char array to iterate over every character.
Loop through each character, adding alphabetic characters to
currentWord
and converting them to lowercase to ensure case insensitivity.Check for word boundaries and the end of the paragraph to process complete words:
- If
currentWord
is not empty and the word is not inbannedSet
, update its count in thecounts
map. - If the count of this word exceeds
highestFrequency
, updatehighestFrequency
and set this word asresult
.
- If
Finally, after the loop, return
result
, which now contains the most frequent word that is not excluded.
This solution efficiently handles the counting and tracking of word frequencies while ignoring specified exclusions. The use of a HashSet
for the exclusions and a HashMap
for frequency counts ensures that the operations are done in optimal time.
class Solution:
def frequentWord(self, text: str, excluded: List[str]) -> str:
excluded_set = set(excluded)
result_word = ""
highest_freq = 0
frequency = defaultdict(int)
temp_word = []
for index, character in enumerate(text):
# Collect characters to form a valid word
if character.isalnum():
temp_word.append(character.lower())
if index != len(text)-1:
continue
# Process the word formed or at the end of the text
if len(temp_word) > 0:
formed_word = "".join(temp_word)
if formed_word not in excluded_set:
frequency[formed_word] += 1
if frequency[formed_word] > highest_freq:
highest_freq = frequency[formed_word]
result_word = formed_word
# Clear the collection for the next word
temp_word = []
return result_word
This Python code defines a method frequentWord
in a class Solution
to determine the most common word in a given text that is not part of a specified list of excluded words. Here's a breakdown of how the method works:
- Begins by converting the list of words to exclude into a set for faster lookups (
excluded_set
). - Initializes
result_word
to store the most frequent non-excluded word andhighest_freq
to track its frequency. - Utilizes a
defaultdict
to keep a frequency count of all words (frequency
). temp_word
is used to construct individual words from the input text.
The method processes each character in the input text:
- If the character is alphanumeric, it appends it to
temp_word
and lowercases it for uniformity; continues to the next character unless it's the end of the text. - Once it encounters a space or a punctuation (or reaches the end of the text), it checks if
temp_word
forms a valid word, not inexcluded_set
. - If the word is valid and its frequency exceeds the
highest_freq
, it updateshighest_freq
and setsresult_word
to the current word. - Resets
temp_word
for collecting the next word.
Finally, the method returns the most frequent valid word. This solution seamlessly filters out unwanted words, maintains a count of the valid ones, and ensures that the result considers only the most frequently appearing word that isn't excluded.
No comments yet.