
Problem Statement
Chemical formulas are a concise way of expressing information about the atoms that constitute a particular chemical compound. In this task, you are given a string formula
that represents a chemical formula. Using this input, the goal is to compute and return the count of each atom contained in the formula.
The atomic elements in the formula begin with an uppercase letter followed by zero or more lowercase letters. If multiple atoms of the same element are present, they are indicated by a following number greater than 1
. If only one atom of a particular type is present, this number is implicitly 1
and is not included in the string.
Additionally, chemical formulas can be complex, involving concatenation of multiple formulas or sub-formulas enclosed in parentheses potentially followed by a number indicating multiple occurrences of that sub-formula.
The output should be a formatted string where each atom is followed by its count (if the count is more than 1
), and atoms are listed in sorted order by their names.
Examples
Example 1
Input:
formula = "H2O"
Output:
"H2O"
Explanation:
The count of elements are {'H': 2, 'O': 1}.
Example 2
Input:
formula = "Mg(OH)2"
Output:
"H2MgO2"
Explanation:
The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.
Example 3
Input:
formula = "K4(ON(SO3)2)2"
Output:
"K4N2O14S4"
Explanation:
The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.
Constraints
1 <= formula.length <= 1000
formula
consists of English letters, digits,'('
, and')'
.formula
is always valid.
Approach and Intuition
The process of deciphering and counting the elements in the formula
string involves parsing the string while managing nested structures (parentheses) and concatenations. Here's a strategic approach:
Initialization:
- Use a stack to manage nested chemical formulas, particularly for formulas in parentheses. This stack will help in accumulating counts for each atomic type.
Parsing the Formula:
- Traverse the string character by character.
- When an uppercase letter is encountered, it signifies the start of an element. Capture any succeeding lowercase letters to complete the element name.
- Following the element name, check for any digits which indicate the number of atoms for that element.
- If an element doesn't have a succeeding number, default its count to
1
.
Handling Parentheses:
- On encountering an opening parenthesis
(
, push a marker (or an empty dictionary) onto the stack to indicate the start of a new sub-formula. - On encountering a closing parenthesis
)
, pop the stack until the sub-formula marker is reached, summing up counts of elements in this sub-formula. If there's a number right after)
, multiply all counts by this number to handle multiple sub-formula occurrences.
- On encountering an opening parenthesis
Building the Output:
- Upon the completion of the string traversal, you should have a complete count of all elements.
- Sort the elements by name and format them into the result string, appending the count only if it is more than
1
.
This approach ensures that even complex nested formulas are parsed and counted correctly, leveraging stack-based processing to handle the hierarchical nature of chemical formulas with nested structures.
Solutions
- C++
class Solution {
public:
string atomCounter(string expression) {
// Regular expression to match components of the chemical formula
regex expr("([A-Z][a-z]*)(\\d*)|(\\()|(\\))(\\d*)");
sregex_iterator current(expression.begin(), expression.end(), expr);
sregex_iterator end_iter;
vector<tuple<string, string, string, string, string>> elements;
while (current != end_iter) {
elements.push_back(
{(*current)[1], (*current)[2], (*current)[3], (*current)[4], (*current)[5]});
current++;
}
reverse(elements.begin(), elements.end());
// Hash map to count occurrences of each atom
unordered_map<string, int> atomMap;
// Stack to manage multiplication due to nesting
stack<int> multipliers;
multipliers.push(1);
// Current multiplication factor
int currentMult = 1;
// Processing gathered elements
for (auto& element : elements) {
string atom = get<0>(element);
string num = get<1>(element);
string openBrace = get<2>(element);
string closeBrace = get<3>(element);
string factor = get<4>(element);
// Process atoms
if (!atom.empty()) {
int atomicCount = num.empty() ? 1 : stoi(num);
atomMap[atom] += atomicCount * currentMult;
}
// Adjust multipliers on encountering a close parenthesis
else if (!closeBrace.empty()) {
int multFactor = factor.empty() ? 1 : stoi(factor);
currentMult *= multFactor;
multipliers.push(multFactor);
}
// Reset multipliers on encountering an open parenthesis
else if (!openBrace.empty()) {
currentMult /= multipliers.top();
multipliers.pop();
}
}
// Sorting map entries to generate result
map<string, int> sortedAtoms(atomMap.begin(), atomMap.end());
// Composing the result string
string result;
for (auto& [atom, count] : sortedAtoms) {
result += atom;
if (count > 1) {
result += to_string(count);
}
}
return result;
}
};
In the given problem, the objective is to parse and count the occurrences of each distinct atom in a complex chemical formula. The solution utilizes the C++ programming language and includes several vital components:
- Regular Expressions: A regular expression matches components of the input chemical expression, distinguishing atoms, numbers repeating those atoms, and parenthesis for hierarchical structures.
- Data Structures: Vectors and stacks are employed to manage atoms and their multipliers especially when dealing with nested expressions due to parentheses.
- Logic for Handling Parentheses: A stack is used to manage the scope and influence of multipliers derived from numbers succeeding closing parentheses. This stack helps in appropriately multiplying the count of atoms enclosed within the parentheses.
- Unordered Maps: These are used to keep a count of atoms as they are parsed. It enables efficient storage and retrieval which is vital for handling non-sequential and multiple occurrences of atoms.
- Sorting: Once the unordered map is populated, the data is transferred to a sorted map to arrange the atoms alphabetically as typically required in chemical formula representations.
- Result Construction: The final step involves constructing a string representing the chemical formula with the correctly counted and sorted atoms.
To utilize the solution, input the chemical formula as a string to the atomCounter
function, which will output the formatted result string showing each atom followed by its respective count (if greater than one). As the solution is crafted in C++, ensure the proper C++ environment and libraries for compilation and execution.
- Java
import java.util.regex.*;
class Solution {
public String atomCounts(String formula) {
// Creating a regex Matcher for formula processing.
Matcher matcher = Pattern.compile(
"([A-Z][a-z]*)(\\d*)|(\\()|(\\))(\\d*)"
).matcher(formula);
List<String[]> elements = new ArrayList<>();
while (matcher.find()) {
elements.add(
new String[] {
matcher.group(1),
matcher.group(2),
matcher.group(3),
matcher.group(4),
matcher.group(5),
}
);
}
Collections.reverse(elements);
// Dictionary to maintain element counts
Map<String, Integer> elementCounts = new HashMap<>();
// Stack to manage multipliers for nested structures
Stack<Integer> stack = new Stack<>();
stack.push(1);
// Variable for current multiplier
int currentMultiplier = 1;
// Iterating over reversed element list
for (String[] element : elements) {
String el = element[0];
String num = element[1];
String openBrace = element[2];
String closeBrace = element[3];
String braceMultiplier = element[4];
// Process each atom and compute counts
if (el != null) {
int qty = num.isEmpty() ? 1 : Integer.parseInt(num);
elementCounts.put(
el,
elementCounts.getOrDefault(el, 0) + qty * currentMultiplier
);
}
// Process closing parenthesis and adjust the multiplier
else if (closeBrace != null) {
int multiplier = braceMultiplier.isEmpty()
? 1
: Integer.parseInt(braceMultiplier);
currentMultiplier *= multiplier;
stack.push(multiplier);
}
// Process opening parenthesis to reset current multiplier
else if (openBrace != null) {
currentMultiplier /= stack.pop();
}
}
// Sorting the map to create the final string.
TreeMap<String, Integer> sortedElements = new TreeMap<>(elementCounts);
// Build the final atom count string.
StringBuilder result = new StringBuilder();
for (String el : sortedElements.keySet()) {
result.append(el);
if (sortedElements.get(el) > 1) {
result.append(sortedElements.get(el));
}
}
return result.toString();
}
}
The provided Java program involves parsing a chemical formula string to count the occurrences of each type of atom and output them in a standardized format. The code follows a systematic approach:
Utilize regular expressions (
java.util.regex.Matcher
andjava.util.regex.Pattern
) to identify and extract individual components of the formula, such as atom types, numbers, and groupings indicated by parentheses.Reverse the list of elements to facilitate easier multiplication of nested atomic groups from innermost to outermost.
Use a
Stack
to keep track of multipliers that arise from numbers following parentheses, which increase the atom counts inside these parentheses.Apply a
Map
(specifically aHashMap
) to store and count atoms as the formula is processed element by element. This processing involves adjusting counts based on the current multiplier, which changes as parentheses open or close.To output the results in lexicographical order, the elements of the map are transferred to a
TreeMap
which automatically sorts by the atom names as keys.Construct the final output string by iterating through the
TreeMap
. For each element, if the count is greater than 1, the count is appended to the atom name in the result string.
This efficient parsing and computational methodology ensure that the program can deal with nested chemical formulae and output a correctly formatted atomic composition, accommodating potentially complex molecular structures.
- Python
class Solution:
def atomCount(self, formula_input: str) -> str:
# Find all matching atoms, numbers, and parentheses
extraction = re.findall(r"([A-Z][a-z]*)(\d*)|(\()|(\))(\d*)", formula_input)
extraction.reverse()
# Dictionary to store element counts
atom_dict = defaultdict(int)
# Stack for managing multipliers from nested parentheses
mul_stack = [1]
# Current multiplicative factor
current_factor = 1
# Processing each extracted element or group
for element, num, open_br, close_br, group_multiplier in extraction:
if element:
num = int(num) if num else 1
atom_dict[element] += num * current_factor
elif close_br:
group_multiplier = int(group_multiplier) if group_multiplier else 1
current_factor *= group_multiplier
mul_stack.append(group_multiplier)
elif open_br:
current_factor //= mul_stack.pop()
# Sort dictionary by element names
atom_dict = dict(sorted(atom_dict.items()))
# Create output string
result = ""
for key in atom_dict:
result += key
if atom_dict[key] > 1:
result += str(atom_dict[key])
return result
The solution in Python for counting the number of atoms in a chemical formula involves parsing the formula to identify and calculate the count of each type of atom. The implementation uses regular expressions, a stack, and a dictionary to manage this task.
Key Elements of the Solution:
- Regular expressions are used to extract atoms and their counts, as well as parentheses and their associated multipliers.
- The extracted elements are stored in a reversed list to facilitate processing by the LIFO method, leveraging the stack's nature for handling nested structures in the formula.
- A dictionary (
atom_dict
) tracks the count of each type of atom. A defaultdict from thecollections
module is used, which simplifies the code by automatically handling missing keys. - A multiplication stack (
mul_stack
) helps manage multiplication factors introduced by the parentheses. It maintains the current multiplicative context as the formula is processed. - Multiplicative factors are adjusted dynamically when an open or close parenthesis is encountered, adhering to the correct formula structure.
- After processing, the dictionary is sorted by element names, and the final string is constructed by iterating over this sorted dictionary. For an element count greater than one, the count is appended to the element symbol.
The final result string represents a correctly formatted and sorted count of each atom type in the given formula. This approach ensures efficient parsing and count calculation even for complex formulas with multiple levels of nested parentheses.
No comments yet.