Group Anagrams

Updated on 02 June, 2025
Group Anagrams header image

Problem Statement

The task is to take an array of strings named strs, and categorize each string with its anagram counterparts into sub-arrays. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. The resulting array should include these sub-arrays, each containing groups of anagrams. The order in which these sub-arrays or strings within them appear is not specific, allowing any sequence in the output.

Examples

Example 1

Input:

strs = ["eat","tea","tan","ate","nat","bat"]

Output:

[["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation:

* There is no string in strs that can be rearranged to form `"bat"`.
* The strings `"nat"` and `"tan"` are anagrams as they can be rearranged to form each other.
* The strings `"ate"`, `"eat"`, and `"tea"` are anagrams as they can be rearranged to form each other.

Example 2

Input:

strs = [""]

Output:

[[""]]

Example 3

Input:

strs = ["a"]

Output:

[["a"]]

Constraints

  • 1 <= strs.length <= 104
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Approach and Intuition

To solve the problem of grouping anagrams, we follow these steps:

  1. Create a dictionary to hold each group of anagrams.

    • Here, the key will be the sorted version of each string (which acts as the identifier for each group of anagrams), and the value will be a list of strings that are anagrams.
  2. Iterate through each string in the strs array:

    • Sort the characters of the string to form a key.
    • If this sorted key does not exist in the dictionary, initialize it with an empty list.
    • Append the current string to the list associated with the sorted key.
  3. After organizing all the strings into the dictionary, simply retrieve the lists (or values) from the dictionary, as these constitute the groups of anagrams.

Example Insights:

  • For the first example with multiple strings, once we sort the strings, strings like "eat", "tea", and "ate" will all have a sorted key of "aet", hence they are grouped together.
  • In cases where the list includes single or empty strings like the latter examples, each either stands alone or groups only with exact copies. This is because their sorted form is identical to the original string.

These principles follow the constraints that strings consist of lowercase English letters, and can vary in length, but efficiently handles grouping by utilizing dictionary data structures and string sorting.

Solutions

  • C++
  • Java
  • C
  • JavaScript
  • Python
cpp
class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs) {
        if (strs.empty()) return {};
        unordered_map<string, vector<string>> anagramGroups;
        int alphabetCounts[26];
        for (string& word : strs) {
            fill(begin(alphabetCounts), end(alphabetCounts), 0);
            for (char ch : word) alphabetCounts[ch - 'a']++;
            string signature = "";
            for (int i = 0; i < 26; i++) {
                signature += "#";
                signature += to_string(alphabetCounts[i]);
            }
            anagramGroups[signature].push_back(word);
        }
        vector<vector<string>> output;
        for (auto& group : anagramGroups)
            output.push_back(group.second);
        return output;
    }
};

In the "Group Anagrams" problem, the solution entails organizing a list of strings into groups where each group contains strings that are anagrams of each other. Achieve this by implementing a function in C++ to manage the process.

  • Start by checking if the input list, strs, is empty. Return an empty list immediately if true.
  • Utilize a hash map, anagramGroups, to map a string signature to its corresponding group of anagrams.
  • Utilize an array, alphabetCounts, initialized to zero to count occurrences of each alphabet character in words.
  • For every word in the input list:
    • Reset alphabetCounts.
    • Increment counts in alphabetCounts for each character in the word.
    • Construct a unique string signature that encodes the frequency of each alphabet letter.
    • Append the word to its corresponding group in anagramGroups identified by the unique signature.
  • Initialize a vector, output, to hold the final grouped anagrams.
  • Extract and append each group from anagramGroups to output.
  • Return the output vector which now contains the groups of anagrams.

This approach ensures effective grouping using a frequency count method and a hash map to manage groups dynamically, making it both robust and efficient.

java
class Solution {
    public List<List<String>> collectAnagrams(String[] input) {
        if (input.length == 0) return new ArrayList<>();
        Map<String, List<String>> groupAnags = new HashMap<>();
        int[] freq = new int[26];
        for (String str : input) {
            Arrays.fill(freq, 0);
            for (char ch : str.toCharArray()) freq[ch - 'a']++;
            
            StringBuilder keyBuilder = new StringBuilder();
            for (int j = 0; j < 26; j++) {
                keyBuilder.append('#');
                keyBuilder.append(freq[j]);
            }
            String key = keyBuilder.toString();
            if (!groupAnags.containsKey(key)) groupAnags.put(key, new ArrayList<>());
            groupAnags.get(key).add(str);
        }
        return new ArrayList<>(groupAnags.values());
    }
}

The Java solution provided addresses the problem of grouping anagrams from an array of strings. The method collectAnagrams takes an array of strings as input and returns a list of lists, where each list contains strings that are anagrams of each other.

  • Start by checking if the input array is empty. If it is, return an empty list immediately.
  • Utilize a HashMap to map a string key to a list of anagrams. This key is generated based on the frequency of each letter in the string.
  • An integer array of size 26 is used to store the frequency of each alphabet letter in the current string being evaluated.
  • Traverse through each string in the input array:
    • Reset the frequency array to zero before processing each string.
    • Populate the frequency array by iterating through each character in the string.
    • Build a unique string key that represents the character frequencies (e.g., "#2#3" for two 'a's and three 'b's).
    • If this key isn't already in the HashMap, add it with an empty list as the value.
    • Append the current string to the list corresponding to the generated key in the HashMap.
  • Finally, convert and return the values of the HashMap as a new ArrayList.

This method efficiently groups anagrams by using a character frequency count approach, ensuring that strings that are anagrams will have the identical key when their characters' frequencies are represented as a string. Thus, anagrams are grouped together in the same list.

c
// Definition of a node to store strings (anagrams).
typedef struct string_list_node {
    char *string_value;
    struct string_list_node *next_string;
} string_list_node;

// Structure of a hash table.
typedef struct anagram_hash_table {
    char anagram_key[104];              // Key representing anagram groups
    string_list_node *first_node;       // Head of linked list storing anagrams
    int number_of_strings;              // Count of strings in the list
    UT_hash_handle hash_handle;         // Enables the structure to be used in a hash table
} anagram_hash_table;

// Creates a new linked list node for anagrams.
string_list_node *create_new_node(char *input_str) {
    string_list_node *new_node = (string_list_node *)malloc(sizeof(string_list_node));
    new_node->string_value = strdup(input_str);
    new_node->next_string = NULL;
    return new_node;
}

// Adds a string to the hash table.
void add_to_hash(anagram_hash_table **hash, char *anagram_key, char *string_to_add) {
    anagram_hash_table *element;

    HASH_FIND_STR(*hash, anagram_key, element);
    if (element == NULL) {
        element = (anagram_hash_table *)malloc(sizeof(anagram_hash_table));
        strcpy(element->anagram_key, anagram_key);
        element->first_node = create_new_node(string_to_add);
        element->number_of_strings = 1;
        HASH_ADD_STR(*hash, anagram_key, element);
    } else {
        string_list_node *new_list_node = create_new_node(string_to_add);
        new_list_node->next_string = element->first_node;
        element->first_node = new_list_node;
        element->number_of_strings++;
    }
}

// Function to compute the hash key for an anagram group.
void generate_key(char *text, char *output_key) {
    int letter_counts[26] = {0};
    while (*text) {
        letter_counts[*text - 'a']++;
        text++;
    }
    output_key[0] = '\0';
    for (int i = 0; i < 26; i++) {
        char temp[10];
        sprintf(temp, "#%d", letter_counts[i]);
        strcat(output_key, temp);
    }
}

// Main function to group anagrams
char ***groupAnagrams(char **inputs, int size_of_inputs, int *group_count,
                      int **group_sizes) {
    anagram_hash_table *hash_table = NULL, *current_item, *temp_item;
    char ***organized_groups;

    for (int i = 0; i < size_of_inputs; i++) {
        char computed_key[104] = {0};
        generate_key(inputs[i], computed_key);
        add_to_hash(&hash_table, computed_key, inputs[i]);
    }

    *group_count = HASH_COUNT(hash_table);
    organized_groups = (char ***)malloc((*group_count) * sizeof(char **));
    *group_sizes = (int *)malloc((*group_count) * sizeof(int));

    int index = 0;
    HASH_ITER(hh, hash_table, current_item, temp_item) {
        organized_groups[index] = (char **)malloc(current_item->number_of_strings * sizeof(char *));
        (*group_sizes)[index] = current_item->number_of_strings;
        int internal_index = 0;
        for (string_list_node *current_node = current_item->first_node; current_node != NULL; current_node = current_node->next_string) {
            organized_groups[index][internal_index++] = current_node->string_value;
        }
        index++;
    }

    // Clean-up of hash table.
    HASH_ITER(hh, hash_table, current_item, temp_item) {
        HASH_DEL(hash_table, current_item);
        string_list_node *iter_node = current_item->first_node, *next_node;
        while (iter_node != NULL) {
            next_node = iter_node->next_string;
            free(iter_node);  // Free the node but not the string.
            iter_node = next_node;
        }
        free(current_item);
    }

    return organized_groups;
}

The provided C code demonstrates how to group strings into anagrams using a hash table. Here's a breakdown of the process:

  • Data Structures Used: The code employs two main types of data structures:

    • string_list_node: A linked list node to maintain strings that share the same anagram group.
    • anagram_hash_table: A hash table to organize these anagrams. Each entry contains a key representing the sorted characters of an anagram, a linked list of actual strings, a count of these strings, and a hash table handle for integration with the UTHash library.
  • Creating and Adding Nodes:

    • create_new_node: Helper function to create a new node for the linked list.
    • add_to_hash: Inserts a new string to the anagram hash table. If the key already exists, it appends to the linked list associated with that key; otherwise, it creates a new hash table entry.
  • Key Generation:

    • generate_key: Generates a unique key for each string based on the frequency of each character. This key helps identify which strings are anagrams of each other.
  • Grouping Anagrams:

    • In the groupAnagrams function:
      1. Computes a key for each input string and uses add_to_hash to add strings to the correct anagram group.
      2. After organizing, it formats the results into a 3D array (organized_groups) where each group is a 2D array of strings.
      3. Finally, performs cleanup by deallocating used memory.

This approach efficiently groups anagrams by using the frequency of characters as a map through the use of a hash table, allowing for fast look-up and insert operations. This is crucial for effectively handling a potentially large list of strings. Memory management is also handled via node and hash table clean-up to ensure no memory leaks.

js
var categorizeAnagrams = function(strings) {
    if (strings.length === 0) return [];
    let result = {};
    for (let str of strings) {
        let counts = Array(26).fill(0);
        for (let char of str) counts[char.charCodeAt(0) - "a".charCodeAt(0)]++;
        let key = "";
        for (let i = 0; i < 26; i++) {
            key += "#";
            key += counts[i];
        }
        if (!result[key]) result[key] = [];
        result[key].push(str);
    }
    return Object.values(result);
};

The solution provided in JavaScript effectively groups strings that are anagrams of each other. Here's a breakdown of how this function accomplishes that:

  1. Initialize a check for an empty input array. If the array is empty, return an empty array immediately.

  2. Create a dictionary named result to hold the groups of anagrams.

  3. Loop through each string in the input array:

    • Initialize a frequency count for each letter in the alphabet, using an array of 26 zeros.
    • Loop through each character in the string, updating the frequency count based on the alphabetical position of each character.
    • Create a unique key for the string from the frequency count array. The key consists of a series of numbers separated by hash symbols, where each number represents the frequency of the corresponding letter in the string.
    • If this key hasn't been used in the dictionary, add it with an initial array containing the current string.
    • If the key exists, append the current string to the array associated with that key.
  4. Once all strings are processed, extract the values from the result dictionary. Each value is an array of strings that are anagrams of each other.

This approach ensures efficient categorization of the strings, leveraging character counts as hashable keys to group anagrams without needing to sort. This largely optimizes the function given that frequency counts are a constant-time computation relative to the length of the alphabet.

python
class Solution:
    def findAnagrams(self, words: List[str]) -> List[List[str]]:
        grouped = collections.defaultdict(list)
        for word in words:
            letter_count = [0] * 26
            for letter in word:
                letter_count[ord(letter) - ord('a')] += 1
            grouped[tuple(letter_count)].append(word)
        return list(grouped.values())

This solution effectively groups anagrams from a list of words using Python3. Utilizing the collections.defaultdict, it categorizes each word by its character count where characters are represented in a fixed array of length 26, representing the alphabet ('a' to 'z'). Each character's count in a word is updated by iterating through the word and adjusting the relevant index in the array. This array is then converted into a tuple to serve as a unique key for the defaultdict, ensuring that all anagrams share the same key and are thus grouped together. The function returns the values of the dictionary, which are lists of groups of anagrams.

Comments

No comments yet.