Minimum Deletions to Make Character Frequencies Unique

Updated on 16 June, 2025
Minimum Deletions to Make Character Frequencies Unique header image

Problem Statement

The task is to transform any given string into a "good" string, where a "good" string is defined as one in which no two different characters share the same frequency. The frequency of a character is measured by the number of times it appears within the string. For example, in the string "aab", character 'a' repeats twice and character 'b' once. This problem requires determining the minimum number of deletions needed from the string to achieve the "good" string criteria, and returning that minimum number.

Examples

Example 1

Input:

s = "aab"

Output:

0

Explanation:

`s` is already good.

Example 2

Input:

s = "aaabbbcc"

Output:

2

Explanation:

You can delete two 'b's resulting in the good string "aaabcc".
Another way it to delete one 'b' and one 'c' resulting in the good string "aaabbc".

Example 3

Input:

s = "ceabaacb"

Output:

2

Explanation:

You can delete both 'c's resulting in the good string "eabaab".
Note that we only care about characters that are still in the string at the end (i.e. frequency of 0 is ignored).

Constraints

  • 1 <= s.length <= 105
  • s contains only lowercase English letters.

Approach and Intuition

To solve the problem of transforming a given string s into a "good" string:

  1. Calculate Frequencies:
    First, compute how many times each character appears in the string. This can be done using a frequency dictionary where each key is a character and the value is its occurrence count in s.

  2. Identify Conflicts:
    Next, to understand the conflicts, map each frequency to the number of characters that have that frequency. This helps identify which frequencies are shared by multiple characters.

  3. Resolve Conflicts:
    The main part involves adjusting the frequencies to eliminate any duplication:

    • Sort characters by their frequency.
    • For each character, if its frequency conflicts with others (i.e., at least one other character has the same frequency), decrease the frequency of that character (remove instances of that character) until no conflict remains or the frequency becomes zero.
    • Keep a count of how many deletions are made during this adjustment.
  4. Return Deletions Count:
    After resolving all frequency conflicts, the total number of deletions made will be the number required to make the string "good".

Examples

Example 1

Input:

s = "aab"

Output:

0

Explanation:

`s` is already good.

Example 2

Input:

s = "aaabbbcc"

Output:

2

Explanation:

You can delete two 'b's resulting in the good string "aaabcc".
Another way it to delete one 'b' and one 'c' resulting in the good string "aaabbc".

Example 3

Input:

s = "ceabaacb"

Output:

2

Explanation:

You can delete both 'c's resulting in the good string "eabaab".
Note that we only care about characters that are still in the string at the end (i.e. frequency of 0 is ignored).

Constraints

  • 1 <= s.length <= 105
  • s contains only lowercase English letters.

Example Cases Explanation Based on the Approach

  • Example 1: "aab"
    The frequencies are {a: 2, b: 1}. There are no shared frequencies, so the string is already "good". Therefore, no deletions are necessary (0 deletions).

  • Example 2: "aaabbbcc"
    The initial frequencies are {a: 3, b: 3, c: 2}. The characters a and b share the same frequency. By deleting two bs (or alternative deletions yielding the same result), the frequencies update to {a: 3, b: 1, c: 2} which has no conflicts. This requires 2 deletions.

  • Example 3: "ceabaacb"
    Starting with frequencies {c: 2, e: 1, a: 3, b: 2}, to resolve shared frequencies between c and b, and adjust a which conflicts no more after other adjustments, deleting two cs can be one optimal way, resulting in adjusted frequencies {c: 0, e: 1, a: 3, b: 2}. This involves 2 deletions.

The critical insight lies in efficiently handling and resolving frequency conflicts to minimize the number of deletions.

Solutions

  • C++
  • Java
  • Python
cpp
class Solution {
public:
    int minimumDeletions(string str) {
        vector<int> charFreq(26, 0);
        for (char ch : str) {
            charFreq[ch - 'a']++;
        }

        sort(charFreq.begin(), charFreq.end(), greater<int>());

        int removals = 0;
        int allowableFreq = str.length();

        for (int idx = 0; idx < 26 && charFreq[idx] > 0; idx++) {
            if (charFreq[idx] > allowableFreq) {
                removals += charFreq[idx] - allowableFreq;
                charFreq[idx] = allowableFreq;
            }
            allowableFreq = max(0, charFreq[idx] - 1);
        }

        return removals;
    }
};

The solution provided is written in C++ and is designed to address the problem of reducing the frequency of characters in a string so that no two characters have the same frequency, with the goal of achieving this by making the least number of deletions.

The implementation follows these steps:

  1. First, initialize a frequency array charFreq to store the count of each character in the input string str.
  2. Traverse the string and increment the corresponding index in the charFreq based on the character value.
  3. Sort the charFreq array in decreasing order to handle the most frequent characters first.
  4. Set removals to zero to count the number of deletions required and set allowableFreq to the length of the string, which represents the maximum frequency a character can have after sorting.
  5. Using a loop, traverse each element in the sorted frequency array. If the current frequency is greater than allowableFreq, calculate the difference, add this difference to removals, and adjust the charFreq at the current index to allowableFreq.
  6. Decrease allowableFreq by one for the next character, ensuring it does not go below zero.
  7. After the loop, return the count of removals which gives the minimum deletions needed.

This method effectively ensures that the character frequencies are unique by the end of the operation while minimizing the number of deletions required. It handles edge cases such as all characters being the same and characters with already different frequencies efficiently.

java
class Solution {
    public int minimumDeletions(String word) {
        // Array to count the frequency of each alphabet character
        int[] freq = new int[26];
        for (int idx = 0; idx < word.length(); idx++) {
            freq[word.charAt(idx) - 'a']++;
        }
        
        Arrays.sort(freq);
        
        int deletions = 0;
        // Define the highest frequency that is currently allowed
        int allowedMaxFrequency = word.length();
        
        // Loop through the sorted frequencies from highest to lowest
        for (int j = 25; j >= 0 && freq[j] > 0; j--) {
            // Ensure the frequency does not exceed the allowed maximum
            if (freq[j] > allowedMaxFrequency) {
                deletions += freq[j] - allowedMaxFrequency;
                freq[j] = allowedMaxFrequency;
            }
            // Decrease the next allowed maximum frequency
            allowedMaxFrequency = Math.max(0, freq[j] - 1);
        }
        
        return deletions;
    }     
}

This program in Java provides a solution to the problem of making character frequencies in a string unique by determining the minimum number of deletions required. The detailed workings of this solution are as follows:

  • Initialize an array freq to keep track of the frequency of each character from 'a' to 'z' within the given string.
  • Loop through the string, and for each character, update its corresponding frequency in the freq array.
  • Sort the freq array in ascending order. This helps in processing the frequencies from highest to lowest effectively.
  • Establish a variable deletions to count the total deletions required, and another variable allowedMaxFrequency set initially to the length of the string. This variable represents the maximum frequency any character can have without needing deletions at that point.
  • Iterate backwards through the freq array, from highest to lowest frequency:
    • If the current character frequency exceeds allowedMaxFrequency, calculate the excess and increment the deletions counter by this excess. Adjust the frequency of the character to allowedMaxFrequency.
    • Reduce allowedMaxFrequency by one after processing each character's frequency, ensuring that the next character's maximum allowable frequency is one less than the previous, thereby avoiding duplicate frequencies.
  • The method returns the total number of deletions required.

This algorithm ensures efficient frequency management by sorting the array and processing from the most frequent to the least, adjusting the allowable frequencies to ensure all are unique while minimizing the number of deletions.

python
class Solution:
    def minimizeDeletions(self, string: str) -> int:
        # Initialize character frequency array
        char_freq = [0] * 26
        for char in string:
            char_freq[ord(char) - ord('a')] += 1
        char_freq.sort(reverse=True)
        
        deletions_needed = 0
        # Define the upper bound for frequency of any character
        allowable_freq = len(string)
        
        # Process character frequencies 
        for current_freq in char_freq:
            if current_freq > allowable_freq:
                deletions_needed += current_freq - allowable_freq
                current_freq = allowable_freq

            # Decrease the allowable frequency each step
            allowable_freq = max(0, current_freq - 1)
        
        return deletions_needed

The solution provided is a Python function aimed at ensuring each character of a string has a unique frequency by possibly deleting some characters. The function minimizeDeletions counts the frequency of each character, sorts these frequencies in descending order, and then determines the number of deletions required. Here's a breakdown of how the solution works:

  • Initialize Frequency Count: A list char_freq of size 26 (for each letter of the alphabet) is created to keep track of how many times each character appears in the input string.

  • Sort Frequencies: The list char_freq is sorted in reverse order to handle the highest frequencies first, which are more likely to conflict.

  • Calculate Deletions: The function iterates over each frequency in the sorted list. For each frequency, if it's higher than the allowable_freq (initially set to the length of the string), it calculates how many deletions are needed to make this frequency unique and reduces it to the allowable limit. After handling one frequency, it ensures that the next highest allowable frequency is either zero or one less than the current adjusted frequency.

  • Return Results: Finally, the total number of deletions needed to achieve the goal of unique frequencies is returned.

This approach ensures efficient management of character frequencies and minimizes the number of deletions necessary by addressing the most common characters first.

Comments

No comments yet.