Number of Atoms

Updated on 07 July, 2025
Number of Atoms header image

Problem Statement

Chemical formulas are a concise way of expressing information about the atoms that constitute a particular chemical compound. In this task, you are given a string formula that represents a chemical formula. Using this input, the goal is to compute and return the count of each atom contained in the formula.

The atomic elements in the formula begin with an uppercase letter followed by zero or more lowercase letters. If multiple atoms of the same element are present, they are indicated by a following number greater than 1. If only one atom of a particular type is present, this number is implicitly 1 and is not included in the string.

Additionally, chemical formulas can be complex, involving concatenation of multiple formulas or sub-formulas enclosed in parentheses potentially followed by a number indicating multiple occurrences of that sub-formula.

The output should be a formatted string where each atom is followed by its count (if the count is more than 1), and atoms are listed in sorted order by their names.

Examples

Example 1

Input:

formula = "H2O"

Output:

"H2O"

Explanation:

The count of elements are {'H': 2, 'O': 1}.

Example 2

Input:

formula = "Mg(OH)2"

Output:

"H2MgO2"

Explanation:

The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.

Example 3

Input:

formula = "K4(ON(SO3)2)2"

Output:

"K4N2O14S4"

Explanation:

The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.

Constraints

  • 1 <= formula.length <= 1000
  • formula consists of English letters, digits, '(', and ')'.
  • formula is always valid.

Approach and Intuition

The process of deciphering and counting the elements in the formula string involves parsing the string while managing nested structures (parentheses) and concatenations. Here's a strategic approach:

  1. Initialization:

    • Use a stack to manage nested chemical formulas, particularly for formulas in parentheses. This stack will help in accumulating counts for each atomic type.
  2. Parsing the Formula:

    • Traverse the string character by character.
    • When an uppercase letter is encountered, it signifies the start of an element. Capture any succeeding lowercase letters to complete the element name.
    • Following the element name, check for any digits which indicate the number of atoms for that element.
    • If an element doesn't have a succeeding number, default its count to 1.
  3. Handling Parentheses:

    • On encountering an opening parenthesis (, push a marker (or an empty dictionary) onto the stack to indicate the start of a new sub-formula.
    • On encountering a closing parenthesis ), pop the stack until the sub-formula marker is reached, summing up counts of elements in this sub-formula. If there's a number right after ), multiply all counts by this number to handle multiple sub-formula occurrences.
  4. Building the Output:

    • Upon the completion of the string traversal, you should have a complete count of all elements.
    • Sort the elements by name and format them into the result string, appending the count only if it is more than 1.

This approach ensures that even complex nested formulas are parsed and counted correctly, leveraging stack-based processing to handle the hierarchical nature of chemical formulas with nested structures.

Solutions

  • C++
cpp
class Solution {
public:
    string atomCounter(string expression) {
        // Regular expression to match components of the chemical formula
        regex expr("([A-Z][a-z]*)(\\d*)|(\\()|(\\))(\\d*)");
        sregex_iterator current(expression.begin(), expression.end(), expr);
        sregex_iterator end_iter;
        vector<tuple<string, string, string, string, string>> elements;
        while (current != end_iter) {
            elements.push_back(
                {(*current)[1], (*current)[2], (*current)[3], (*current)[4], (*current)[5]});
            current++;
        }
        reverse(elements.begin(), elements.end());
    
        // Hash map to count occurrences of each atom
        unordered_map<string, int> atomMap;
    
        // Stack to manage multiplication due to nesting
        stack<int> multipliers;
        multipliers.push(1);
    
        // Current multiplication factor
        int currentMult = 1;
    
        // Processing gathered elements
        for (auto& element : elements) {
            string atom = get<0>(element);
            string num = get<1>(element);
            string openBrace = get<2>(element);
            string closeBrace = get<3>(element);
            string factor = get<4>(element);
    
            // Process atoms
            if (!atom.empty()) {
                int atomicCount = num.empty() ? 1 : stoi(num);
                atomMap[atom] += atomicCount * currentMult;
            }
    
            // Adjust multipliers on encountering a close parenthesis
            else if (!closeBrace.empty()) {
                int multFactor = factor.empty() ? 1 : stoi(factor);
                currentMult *= multFactor;
                multipliers.push(multFactor);
            }
    
            // Reset multipliers on encountering an open parenthesis
            else if (!openBrace.empty()) {
                currentMult /= multipliers.top();
                multipliers.pop();
            }
        }
    
        // Sorting map entries to generate result
        map<string, int> sortedAtoms(atomMap.begin(), atomMap.end());
    
        // Composing the result string
        string result;
        for (auto& [atom, count] : sortedAtoms) {
            result += atom;
            if (count > 1) {
                result += to_string(count);
            }
        }
    
        return result;
    }
};

In the given problem, the objective is to parse and count the occurrences of each distinct atom in a complex chemical formula. The solution utilizes the C++ programming language and includes several vital components:

  • Regular Expressions: A regular expression matches components of the input chemical expression, distinguishing atoms, numbers repeating those atoms, and parenthesis for hierarchical structures.
  • Data Structures: Vectors and stacks are employed to manage atoms and their multipliers especially when dealing with nested expressions due to parentheses.
  • Logic for Handling Parentheses: A stack is used to manage the scope and influence of multipliers derived from numbers succeeding closing parentheses. This stack helps in appropriately multiplying the count of atoms enclosed within the parentheses.
  • Unordered Maps: These are used to keep a count of atoms as they are parsed. It enables efficient storage and retrieval which is vital for handling non-sequential and multiple occurrences of atoms.
  • Sorting: Once the unordered map is populated, the data is transferred to a sorted map to arrange the atoms alphabetically as typically required in chemical formula representations.
  • Result Construction: The final step involves constructing a string representing the chemical formula with the correctly counted and sorted atoms.

To utilize the solution, input the chemical formula as a string to the atomCounter function, which will output the formatted result string showing each atom followed by its respective count (if greater than one). As the solution is crafted in C++, ensure the proper C++ environment and libraries for compilation and execution.

  • Java
java
import java.util.regex.*;
    
class Solution {
    
    public String atomCounts(String formula) {
        // Creating a regex Matcher for formula processing.
        Matcher matcher = Pattern.compile(
            "([A-Z][a-z]*)(\\d*)|(\\()|(\\))(\\d*)"
        ).matcher(formula);
        List<String[]> elements = new ArrayList<>();
        while (matcher.find()) {
            elements.add(
                new String[] {
                    matcher.group(1),
                    matcher.group(2),
                    matcher.group(3),
                    matcher.group(4),
                    matcher.group(5),
                }
            );
        }
        Collections.reverse(elements);
    
        // Dictionary to maintain element counts
        Map<String, Integer> elementCounts = new HashMap<>();
    
        // Stack to manage multipliers for nested structures
        Stack<Integer> stack = new Stack<>();
        stack.push(1);
    
        // Variable for current multiplier
        int currentMultiplier = 1;
    
        // Iterating over reversed element list
        for (String[] element : elements) {
            String el = element[0];
            String num = element[1];
            String openBrace = element[2];
            String closeBrace = element[3];
            String braceMultiplier = element[4];
    
            // Process each atom and compute counts
            if (el != null) {
                int qty = num.isEmpty() ? 1 : Integer.parseInt(num);
                elementCounts.put(
                    el,
                    elementCounts.getOrDefault(el, 0) + qty * currentMultiplier
                );
            }
            // Process closing parenthesis and adjust the multiplier
            else if (closeBrace != null) {
                int multiplier = braceMultiplier.isEmpty()
                    ? 1
                    : Integer.parseInt(braceMultiplier);
                currentMultiplier *= multiplier;
                stack.push(multiplier);
            }
            // Process opening parenthesis to reset current multiplier
            else if (openBrace != null) {
                currentMultiplier /= stack.pop();
            }
        }
    
        // Sorting the map to create the final string.
        TreeMap<String, Integer> sortedElements = new TreeMap<>(elementCounts);
    
        // Build the final atom count string.
        StringBuilder result = new StringBuilder();
        for (String el : sortedElements.keySet()) {
            result.append(el);
            if (sortedElements.get(el) > 1) {
                result.append(sortedElements.get(el));
            }
        }
    
        return result.toString();
    }
}

The provided Java program involves parsing a chemical formula string to count the occurrences of each type of atom and output them in a standardized format. The code follows a systematic approach:

  • Utilize regular expressions (java.util.regex.Matcher and java.util.regex.Pattern) to identify and extract individual components of the formula, such as atom types, numbers, and groupings indicated by parentheses.

  • Reverse the list of elements to facilitate easier multiplication of nested atomic groups from innermost to outermost.

  • Use a Stack to keep track of multipliers that arise from numbers following parentheses, which increase the atom counts inside these parentheses.

  • Apply a Map (specifically a HashMap) to store and count atoms as the formula is processed element by element. This processing involves adjusting counts based on the current multiplier, which changes as parentheses open or close.

  • To output the results in lexicographical order, the elements of the map are transferred to a TreeMap which automatically sorts by the atom names as keys.

  • Construct the final output string by iterating through the TreeMap. For each element, if the count is greater than 1, the count is appended to the atom name in the result string.

This efficient parsing and computational methodology ensure that the program can deal with nested chemical formulae and output a correctly formatted atomic composition, accommodating potentially complex molecular structures.

  • Python
python
class Solution:
    def atomCount(self, formula_input: str) -> str:
        # Find all matching atoms, numbers, and parentheses
        extraction = re.findall(r"([A-Z][a-z]*)(\d*)|(\()|(\))(\d*)", formula_input)
        extraction.reverse()
    
        # Dictionary to store element counts
        atom_dict = defaultdict(int)
    
        # Stack for managing multipliers from nested parentheses
        mul_stack = [1]
    
        # Current multiplicative factor
        current_factor = 1
    
        # Processing each extracted element or group
        for element, num, open_br, close_br, group_multiplier in extraction:
            if element:
                num = int(num) if num else 1
                atom_dict[element] += num * current_factor
            elif close_br:
                group_multiplier = int(group_multiplier) if group_multiplier else 1
                current_factor *= group_multiplier
                mul_stack.append(group_multiplier)
            elif open_br:
                current_factor //= mul_stack.pop()
    
        # Sort dictionary by element names
        atom_dict = dict(sorted(atom_dict.items()))
    
        # Create output string
        result = ""
        for key in atom_dict:
            result += key
            if atom_dict[key] > 1:
                result += str(atom_dict[key])
    
        return result

The solution in Python for counting the number of atoms in a chemical formula involves parsing the formula to identify and calculate the count of each type of atom. The implementation uses regular expressions, a stack, and a dictionary to manage this task.

Key Elements of the Solution:

  • Regular expressions are used to extract atoms and their counts, as well as parentheses and their associated multipliers.
  • The extracted elements are stored in a reversed list to facilitate processing by the LIFO method, leveraging the stack's nature for handling nested structures in the formula.
  • A dictionary (atom_dict) tracks the count of each type of atom. A defaultdict from the collections module is used, which simplifies the code by automatically handling missing keys.
  • A multiplication stack (mul_stack) helps manage multiplication factors introduced by the parentheses. It maintains the current multiplicative context as the formula is processed.
  • Multiplicative factors are adjusted dynamically when an open or close parenthesis is encountered, adhering to the correct formula structure.
  • After processing, the dictionary is sorted by element names, and the final string is constructed by iterating over this sorted dictionary. For an element count greater than one, the count is appended to the element symbol.

The final result string represents a correctly formatted and sorted count of each atom type in the given formula. This approach ensures efficient parsing and count calculation even for complex formulas with multiple levels of nested parentheses.

Comments

No comments yet.