Python Pandas DataFrame iterrows() - Iterate Over Rows

Updated on December 25, 2024

Introduction

The iterrows() method of a Pandas DataFrame returns a generator that iterates over the DataFrame's rows, yielding each row's index label together with a Series holding that row's data. It is useful when an operation genuinely has to look at one row at a time, such as conditional checks, custom aggregations, and transformations based on specific row values. It is not the fastest option: the loop runs in Python rather than in vectorized code, and a new Series is built for every row. Even so, iterrows() remains popular because it is easy to use and reads clearly on moderate-sized data.
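As a quick sketch of that behavior (the two-row DataFrame here is illustrative only), calling iterrows() gives back a generator, and each item it produces is an (index, Series) tuple that can be unpacked directly:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

rows = df.iterrows()          # a generator: rows are produced lazily, one per next()
index, row = next(rows)       # each item is an (index label, Series) tuple

print(index)                  # 0
print(type(row).__name__)     # Series
print(row['Name'], row['Age'])  # Alice 25
```

The Series also carries its index label as row.name, which is why index and row.name always agree inside the loop.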

In this article, you will learn how to use the iterrows() method to read, update, filter, and summarize DataFrame rows. The practical examples walk through common row-wise operations and show where iterrows() fits in a data analysis workflow.

Understanding iterrows() Basics

A Simple Row Iteration

  1. Import the Pandas library and create a DataFrame.

  2. Iterate over the rows using iterrows().

    ```python
    import pandas as pd
    
    data = {'Name': ['Alice', 'Bob', 'Charles'],
            'Age': [25, 30, 35],
            'City': ['New York', 'Paris', 'London']}
    df = pd.DataFrame(data)
    
    for index, row in df.iterrows():
        print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}, City: {row['City']}")
    ```

    Each iteration returns the index of the row and the row data as a Pandas Series, enabling straightforward access to each cell by column names like row['Name'].
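Packing each row into a single Series has one consequence worth knowing: a Series holds only one dtype, so a row that mixes types is upcast. With text and integer columns, as in df above, each row becomes object dtype; with integer and float columns, the integers come back as floats. A minimal sketch, using a small hypothetical DataFrame with one integer and one float column:

```python
import pandas as pd

# One integer column and one float column (hypothetical example data)
df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

index, row = next(df.iterrows())

print(df['int_col'].dtype)   # int64: the column itself keeps its integer dtype
print(row.dtype)             # float64: the whole row shares a single dtype
print(row['int_col'])        # 1.0: the integer was upcast to a float
```

This matters when a loop compares or formats values and expects the column's original type.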

Modifying Data Within a Loop

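  1. Use iterrows() to compute new values row by row (this approach is not recommended for large-scale modifications).

  2. Write each result back through the DataFrame, for example with df.at, to update an existing column or fill a new one. Assigning to row itself is not reliable, because row is often a copy of the data.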

    ```python
    for index, row in df.iterrows():
        df.at[index, 'Age'] = row['Age'] + 10  # Increasing age by 10
    
    print(df)
    ```

    Here, df.at[index, 'ColumnName'] is used to update the DataFrame directly. While modifying data using iterrows() can work, vectorized operations or apply() are preferable for performance.
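A short sketch of both points, assuming a fresh copy of the sample data with the City column dropped for brevity: assigning to row inside the loop leaves df unchanged, because here each row Series is a copy, while a single vectorized expression performs the same +10 update as the df.at loop. The pandas documentation itself warns that, depending on the data types, iterrows() may return a copy rather than a view, so writes to row are not guaranteed to reach the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'], 'Age': [25, 30, 35]})

# Assigning to row changes only that row Series; with these dtypes it is a copy
for index, row in df.iterrows():
    row['Age'] = row['Age'] + 10

ages_after_loop = df['Age'].tolist()
print(ages_after_loop)      # [25, 30, 35]: df is unchanged

# Vectorized equivalent of the df.at loop: one operation on the whole column
df['Age'] = df['Age'] + 10
print(df['Age'].tolist())   # [35, 40, 45]
```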

Advanced Usage of iterrows()

Filtering Rows

  1. Use iterrows() to filter data based on complex conditions that might be cumbersome with standard filtering methods.

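  2. Collect the labels of the matching rows, then select those rows from df in a single step.

    ```python
    matching_labels = []  # index labels of rows that pass the test
    
    for index, row in df.iterrows():
        if row['Age'] > 30 and 'New' in row['City']:
            matching_labels.append(index)
    
    # Build the filtered DataFrame once; df.loc keeps the original column dtypes
    filtered_df = df.loc[matching_labels]
    print(filtered_df)
    ```

    This snippet keeps the rows where the age is over 30 and the city contains the substring "New". Selecting them once with df.loc replaces the row-by-row DataFrame.append() calls that older pandas code relied on: append() was deprecated in pandas 1.4 and removed in pandas 2.0, and growing a DataFrame one row at a time is slow in any case.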
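For a condition like this one, a boolean mask usually does the same job without a Python loop. The sketch below assumes the ages have already been raised by 10, as in the earlier modification step, so that exactly one row matches; str.contains('New', regex=False) matches "New" as a literal substring, mirroring the in check inside the loop.

```python
import pandas as pd

# Ages mirror df after the +10 update earlier in the article (assumed state)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                   'Age': [35, 40, 45],
                   'City': ['New York', 'Paris', 'London']})

# Same condition as the loop, evaluated over whole columns at once
mask = (df['Age'] > 30) & df['City'].str.contains('New', regex=False)
filtered_df = df[mask]

print(filtered_df)   # only the Alice / New York row remains
```

Keep the iterrows() version for conditions that cannot be written as column-wise expressions, such as logic that calls an external function per row.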

Calculating Aggregate Values

  1. Use iterrows() to perform custom aggregations.

  2. Accumulate results from individual rows to compute summaries.

    ```python
    total_age = 0
    count = 0
    
    for index, row in df.iterrows():
        total_age += row['Age']
        count += 1
    
    average_age = total_age / count if count > 0 else None
    print(f"Average Age: {average_age}")
    ```

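    The loop sums the ages, counts the rows, and then divides; the count > 0 check avoids a division by zero on an empty DataFrame. For a plain average on non-empty data, df['Age'].mean() gives the same result in one call, so an explicit loop like this pays off only when the aggregation logic is too irregular to express with built-in methods.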
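When a loop really is needed, DataFrame.itertuples() is a common alternative to iterrows(): it yields lightweight namedtuples instead of building a Series per row, and the pandas documentation describes it as generally faster and dtype-preserving. A sketch of the same average computed with itertuples(), alongside the built-in mean for comparison, using the original (unmodified) ages:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charles'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Paris', 'London']})

# itertuples() yields one namedtuple per row; columns are read as attributes
total_age = 0
count = 0
for row in df.itertuples(index=False):
    total_age += row.Age
    count += 1

loop_average = total_age / count if count > 0 else None

# Built-in equivalent for a plain mean
builtin_average = df['Age'].mean()

print(loop_average, builtin_average)   # 30.0 30.0
```

Attribute access such as row.Age only works for column names that are valid Python identifiers; other columns are renamed to positional names.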

Conclusion

iterrows() in Pandas is a versatile tool for iterating over DataFrame rows and suits a range of row-wise operations. It is not the most performant choice for large datasets, but its simple, readable syntax makes it a practical option when a vectorized operation is not feasible. Use the examples in this article as starting points for row-wise checks, updates, filters, and aggregations in your own data projects.