The iterrows() method in Python's Pandas library returns a generator that iterates over DataFrame rows, yielding each row's index and a Series holding the row's data. It is useful where row-wise operations are necessary, such as conditional checks, aggregations, and transformations based on specific row values. Although it is not the fastest option, because it builds a Series for every row and runs a Python-level loop, iterrows() remains popular for its ease of use and readability on moderate-sized data.
In this article, you will learn how to use the iterrows() method to manipulate and extract information from DataFrame rows. Practical examples demonstrate row-wise operations and show how to make the most of this method for data analysis tasks.
Import the Pandas library and create a DataFrame.
Iterate over the rows using iterrows().
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charles'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)

# iterrows() yields (index, row) pairs; each row is a Series keyed by column name
for index, row in df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}, City: {row['City']}")
Each iteration returns the index of the row and the row data as a Pandas Series, enabling straightforward access to each cell by column name, such as row['Name'].
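To make the shape of each yielded pair concrete, the following minimal sketch inspects a single row. The `sample` DataFrame and its values are assumptions made up for this illustration, not part of the article's data.

```python
import pandas as pd

# Illustrative data; the name `sample` is an assumption for this sketch.
sample = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Take only the first (index, row) pair from the generator.
first_index, first_row = next(sample.iterrows())

is_series = isinstance(first_row, pd.Series)  # each row is a Series
row_label = first_row.name                    # the Series name is the row's index label
columns_as_index = list(first_row.index)      # column names become the Series index

print(is_series, row_label, columns_as_index)
```

Because each row is assembled into one Series, a row that mixes column types is generally stored with a single common dtype (often `object`), so values may not keep the dtype of their original column.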
Use iterrows() to modify data inline (although this method isn't recommended for large-scale modifications).
Assign modified data to a new DataFrame column or update existing columns.
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 10  # Increasing age by 10

print(df)
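Here, df.at[index, 'ColumnName'] is used to update the DataFrame directly. Assigning to row itself may have no effect on the DataFrame, because the yielded Series can be a copy rather than a view. While modifying data using iterrows() can work, vectorized operations or apply() are preferable for performance.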
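For comparison, the same update can be sketched as a single vectorized expression. This is a minimal sketch rather than a benchmark; the DataFrame is rebuilt here under the name `people` (chosen for this sketch) so the block runs on its own.

```python
import pandas as pd

# Rebuilt sample data; `people`, `looped`, and `vectorized` are names assumed for this sketch.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                       'Age': [25, 30, 35]})

# Row-by-row version, mirroring the loop shown above.
looped = people.copy()
for index, row in looped.iterrows():
    looped.at[index, 'Age'] = row['Age'] + 10

# Vectorized version: one operation applied to the whole column.
vectorized = people.copy()
vectorized['Age'] = vectorized['Age'] + 10

print(looped['Age'].tolist())
print(vectorized['Age'].tolist())
```

Both versions produce the same ages; the vectorized form avoids the Python-level loop and the per-row Series construction.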
Use iterrows() to filter data based on complex conditions that might be cumbersome with standard filtering methods.
Append matching rows to a new DataFrame.
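matching_rows = []  # collect qualifying rows here

for index, row in df.iterrows():
    if row['Age'] > 30 and 'New' in row['City']:
        matching_rows.append(row)

# Build the result once; DataFrame.append() was removed in pandas 2.0
filtered_df = pd.DataFrame(matching_rows, columns=df.columns)
print(filtered_df)
This snippet keeps rows where the age is over 30 and the city contains the substring "New". Qualifying rows are collected in a list and converted to filtered_df in a single step, because DataFrame.append() was removed in pandas 2.0 and growing a DataFrame one row at a time is slow. If the ages were increased by 10 in the previous example, only Alice's row (age 35, New York) qualifies; with the original ages, no row does.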
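A condition this simple can also be written as a boolean mask, which is the standard filtering approach the loop is being compared against. The sketch below is illustrative only: it rebuilds the data as `people` (a name assumed here) and assumes the ages have already been increased by 10, as in the modification example.

```python
import pandas as pd

# Assumed state of the data after the +10 age update shown earlier.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                       'Age': [35, 40, 45],
                       'City': ['New York', 'Paris', 'London']})

# Build one boolean Series per condition, then combine them with &.
is_over_30 = people['Age'] > 30
city_has_new = people['City'].str.contains('New', regex=False)

filtered = people[is_over_30 & city_has_new]
print(filtered)
```

The loop remains the more natural choice when a condition depends on logic that is hard to express as column-wise operations.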
Implement iterrows() to perform custom aggregations.
Accumulate results from individual rows to compute summaries.
total_age = 0
count = 0

for index, row in df.iterrows():
    total_age += row['Age']
    count += 1

average_age = total_age / count if count > 0 else None
print(f"Average Age: {average_age}")
This calculates the average age by summing the ages and counting the rows, then dividing, with a guard that returns None for an empty DataFrame. The approach gives flexibility over the aggregation logic, which might be useful in more complex scenarios.
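For a plain average, the loop can be checked against the built-in column aggregation. The following is a minimal sketch using the article's original ages, with the DataFrame rebuilt as `people` (a name assumed for this sketch) so it runs independently.

```python
import pandas as pd

# Rebuilt sample data with the original ages; `people` is a name assumed for this sketch.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                       'Age': [25, 30, 35]})

# Loop-based average, mirroring the example above.
total_age = 0
count = 0
for index, row in people.iterrows():
    total_age += row['Age']
    count += 1
loop_average = total_age / count if count > 0 else None

# Built-in aggregation over the whole column.
builtin_average = people['Age'].mean()

print(loop_average, builtin_average)
```

One difference worth keeping in mind: on an empty column, mean() returns NaN, whereas the loop above returns None.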
iterrows() in Pandas is a versatile tool for iterating over DataFrame rows, suitable for a range of row-wise operations. Although it is not the most performant method for large datasets, its simplicity and clear syntax make it a valuable technique for data manipulations where vectorized operations are not feasible. Use the examples provided to refine your data handling and gain finer control over row-wise analysis tasks in Python. By now, you should be able to apply iterrows() effectively in your data projects.