Python Pandas DataFrame iterrows() - Iterate Over Rows

Introduction

The iterrows() method of a Pandas DataFrame is a generator that iterates over the DataFrame's rows, yielding each row's index and a Series holding the row's data. It is useful when an operation is easiest to express row by row, such as conditional checks, aggregations, and transformations based on specific row values. It is slower than vectorized operations because it builds a new Series for every row and runs in a Python-level loop, but iterrows() remains popular for its ease of use and readability on moderate-sized data.

In this article, you will learn how to use iterrows() to read and modify DataFrame rows. The practical examples demonstrate row-wise operations and show where this method fits in data analysis tasks.

Understanding iterrows() Basics

A Simple Row Iteration

Import the Pandas library and create a DataFrame.

Iterate over the rows using iterrows().

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charles'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)

for index, row in df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}, City: {row['City']}")
```

Each iteration returns the index of the row and the row data as a Pandas Series, enabling straightforward access to each cell by column names like row['Name'].
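
Because each row arrives as a single Series, and a Series has one dtype, iterrows() does not preserve column dtypes across a row. The following is a minimal sketch of that effect; the frame `dtype_df` and its column names are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
dtype_df = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})

# Take only the first (index, row) pair from the generator.
first_index, first_row = next(dtype_df.iterrows())

print(dtype_df['qty'].dtype)  # integer dtype in the DataFrame
print(first_row.dtype)        # the row Series uses one common dtype
print(first_row['qty'])       # the integer value comes back as a float
```

If the original dtypes matter, itertuples() is generally the better fit, since it returns each row as a namedtuple instead of a Series.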

Modifying Data Within a Loop

Use iterrows() to modify data inline (although this method isn't recommended for large-scale modifications).
Assign modified data to a new DataFrame column or update existing columns.

```python
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 10  # Increasing age by 10

print(df)
```
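Here, df.at[index, 'ColumnName'] updates the DataFrame directly. Assigning to row itself is not reliable, because the row may be a copy rather than a view, in which case the change never reaches the DataFrame. While modifying data inside an iterrows() loop can work, vectorized operations or apply() are preferable for performance.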
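
For comparison, the same update can be written as a single column operation. This is a sketch on a small stand-in frame (`ages` is a hypothetical name), with the loop version kept alongside to show that both produce the same result.

```python
import pandas as pd

# Stand-in data matching the article's Name and Age columns.
ages = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                     'Age': [25, 30, 35]})

# Loop version: one scalar write per row.
looped = ages.copy()
for index, row in looped.iterrows():
    looped.at[index, 'Age'] = row['Age'] + 10

# Vectorized version: one operation on the whole column.
vectorized = ages.copy()
vectorized['Age'] = vectorized['Age'] + 10

print(vectorized)
print(looped['Age'].tolist() == vectorized['Age'].tolist())
```

The vectorized form avoids creating a Series per row, which is where most of the cost of the loop comes from.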

Advanced Usage of iterrows()

Filtering Rows

Use iterrows() to filter data based on complex conditions that might be cumbersome with standard filtering methods.

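Collect matching rows in a list and build a new DataFrame from them.

```python
matching_rows = []

for index, row in df.iterrows():
    if row['Age'] > 30 and 'New' in row['City']:
        matching_rows.append(row)  # list.append, not DataFrame.append

# Build the result once, after the loop, keeping the original column order.
filtered_df = pd.DataFrame(matching_rows, columns=df.columns)

print(filtered_df)
```

This snippet filters rows where the age is over 30 and the city contains the word "New". Qualifying rows are collected in a list and converted to a DataFrame once after the loop, because DataFrame.append() was removed in pandas 2.0 and growing a DataFrame one row at a time is slow.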
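
When the condition can be expressed column-wise, a boolean mask gives the same result without a loop. The sketch below assumes a stand-in frame named `people` whose ages already include the +10 update from the earlier example; both the name and the values are illustrative.

```python
import pandas as pd

# Stand-in data: ages as they would be after the +10 update.
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                       'Age': [35, 40, 45],
                       'City': ['New York', 'Paris', 'London']})

# Combine the two conditions element-wise; regex=False treats 'New' as plain text.
mask = (people['Age'] > 30) & people['City'].str.contains('New', regex=False)
matches = people[mask]

print(matches)
```

A loop with iterrows() is still reasonable when the test depends on logic that is awkward to vectorize, such as calls to external code for each row.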

Calculating Aggregate Values

Implement iterrows() to perform custom aggregations.
Accumulate results from individual rows to compute summaries.

```python
total_age = 0
count = 0

for index, row in df.iterrows():
    total_age += row['Age']
    count += 1

average_age = total_age / count if count > 0 else None
print(f"Average Age: {average_age}")
```
This loop calculates the average age by summing the ages, counting the rows, and dividing the total by the count; the guard on count avoids a division by zero for an empty DataFrame. Writing the aggregation by hand gives full control over the logic, which can be useful in more complex scenarios.
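
For a plain average, the built-in Series.mean() does the same work in one call. A short sketch on a hypothetical one-column frame (`ages`), including what happens when there is no data:

```python
import pandas as pd

# Hypothetical stand-in for the Age column.
ages = pd.DataFrame({'Age': [35, 40, 45]})

# Built-in aggregation over the whole column.
average_age = ages['Age'].mean()
print(f"Average Age: {average_age}")

# On an empty Series, mean() returns NaN instead of raising an error.
empty_average = pd.Series([], dtype='float64').mean()
print(pd.isna(empty_average))
```

Note the difference from the loop above: the manual version yields None for empty input, while mean() yields NaN.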

Conclusion

iterrows() in Pandas is a versatile tool for iterating over DataFrame rows, suitable for a range of row-wise operations. Although it's not the most performant method for large datasets, its simplicity and clear syntax make it a valuable technique for data manipulations where vectorized operations are not feasible. Use the examples above as starting points for basic iteration, in-place updates, conditional filtering, and custom aggregations in your own data projects.
