Python Pandas DataFrame pivot() - Create Pivot Table

Updated on December 25, 2024

Introduction

Pandas is a widely used Python library for data manipulation and analysis, favored in data science and finance for its ease of use and flexibility. Among its reshaping tools, the pivot() function turns a long-format DataFrame into a wide one by spreading the unique values of one column across new columns. The reshaped table is often easier to read and compare, which speeds up analysis. pivot() only rearranges data; when values must be summarized, the related pivot_table() function adds aggregation.

In this article, you learn how to use pivot() to create pivot tables from a DataFrame, and when to switch to pivot_table(). The examples cover reshaping data for a clearer view, handling duplicate entries through aggregation, and building tables with a multi-level index.

Understanding Data Transformation with pivot()

Basics of the pivot() Function

The pandas pivot() function builds a new, reshaped table from an existing DataFrame. It takes three arguments: index (the column whose values become row labels), columns (the column whose unique values become column headers), and values (the column that fills the cells). Here’s how to start using it:

  1. Import the pandas library.

  2. Create or load an example DataFrame.

    python
    import pandas as pd
    
    data = {
        'Date': ['2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
        'Category': ['Tech', 'Finance', 'Tech', 'Sports'],
        'Value': [200, 220, 240, 100]
    }
    df = pd.DataFrame(data)
    print(df)
    

    The DataFrame df consists of dates, categories, and values, simulating a typical data log scenario.

Creating a Simple Pivot Table

Once you have a DataFrame, creating a pivot table to see values distributed across categories per date becomes straightforward:

  1. Specify index, columns, and values for the pivot function.

    python
    pivot_df = df.pivot(index='Date', columns='Category', values='Value')
    print(pivot_df)
    

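    This pivots the table so that each date becomes a row and each category becomes a column, with the matching value in each cell. Date and category combinations that do not appear in the source data are filled with NaN, which also converts the integer values to floats.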
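To make the shape of the result concrete, the following minimal sketch rebuilds the same example DataFrame, inspects the pivoted table, and then replaces the NaN cells with 0. The names missing and filled are introduced here for illustration only, and filling with 0 is an assumption that suits this data; it may not be appropriate where 0 and "no record" mean different things.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'Category': ['Tech', 'Finance', 'Tech', 'Sports'],
    'Value': [200, 220, 240, 100],
})

pivot_df = df.pivot(index='Date', columns='Category', values='Value')

# Unique categories become the columns; unique dates become the rows
print(list(pivot_df.columns))  # ['Finance', 'Sports', 'Tech']
print(pivot_df.shape)          # (3, 3)

# Count the cells that had no matching (Date, Category) row in the source
missing = int(pivot_df.isna().sum().sum())
print(missing)

# Illustrative choice: treat missing combinations as 0 and restore integers
filled = pivot_df.fillna(0).astype(int)
print(filled)
```

Whether to fill missing cells at all depends on the analysis; leaving them as NaN keeps "no data" distinct from a genuine zero.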

Aggregating Data with Pivot Tables

  1. Recognize the need for aggregation when the data contains duplicate index/columns pairs. pivot() raises a ValueError in that case because it cannot decide which value to keep.

  2. Use pivot_table() with an aggregation function to combine multiple values for a single index/columns pair.

    python
    pivot_table_df = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='sum')
    print(pivot_table_df)
    

    Here, pivot_table() is used because it supports aggregation, unlike pivot(). aggfunc='sum' adds up the values whenever a date and category pair occurs more than once.
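The example DataFrame above has no duplicate date and category pairs, so the difference between the two functions is not visible there. The sketch below uses a small hypothetical DataFrame (dup_df, invented for this illustration) with two 'Tech' rows on the same date to show pivot() failing and pivot_table() summing the duplicates. The fill_value argument is an optional extra that replaces missing cells in the result.

```python
import pandas as pd

# Hypothetical data: 'Tech' appears twice on 2023-01-01
dup_df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
    'Category': ['Tech', 'Tech', 'Finance'],
    'Value': [200, 50, 220],
})

# pivot() cannot reshape when an index/columns pair is repeated
try:
    dup_df.pivot(index='Date', columns='Category', values='Value')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot() failed: {exc}")

# pivot_table() resolves the duplicates with the chosen aggregation
summed = dup_df.pivot_table(
    index='Date', columns='Category', values='Value', aggfunc='sum'
)
print(summed)

# Optionally fill combinations that have no data with 0
summed_filled = dup_df.pivot_table(
    index='Date', columns='Category', values='Value',
    aggfunc='sum', fill_value=0
)
print(summed_filled)
```

Catching the ValueError is only for demonstration; in practice it is usually simpler to call pivot_table() directly whenever duplicates are possible.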

Advanced Usage of pivot() and pivot_table()

Multi-level Indexing in a Pivot Table

Multi-level indexing can enhance data presentation and allow more complex data summaries:

  1. Include an additional category or identifier as part of the index.

    python
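    # Add a finer-grained label for each row
    df['Subcategory'] = ['Software', 'Banking', 'Hardware', 'Team Sports']
    # Passing a list to index produces a two-level row index (MultiIndex)
    pivot_multi_index_df = df.pivot_table(index=['Date', 'Subcategory'], columns='Category', values='Value')
    print(pivot_multi_index_df)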
    

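    This configuration breaks the data down further, so each row is identified by both a date and a subcategory. Because no aggfunc is given, pivot_table() applies its default, the mean; each date and subcategory pair is unique in this example, so the cells show the original values.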
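Once the rows carry a two-level index, a common next step is selecting part of the table. The following sketch rebuilds the example with the Subcategory column and shows three standard pandas options: .loc on the outer level, xs() on the inner level, and reset_index() to flatten the index back into columns. The variable names (multi, one_day, hardware, flat) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
    'Category': ['Tech', 'Finance', 'Tech', 'Sports'],
    'Subcategory': ['Software', 'Banking', 'Hardware', 'Team Sports'],
    'Value': [200, 220, 240, 100],
})

multi = df.pivot_table(
    index=['Date', 'Subcategory'], columns='Category', values='Value'
)

# The row index now has two named levels
print(list(multi.index.names))  # ['Date', 'Subcategory']

# Select every subcategory row for one date (outer level)
one_day = multi.loc['2023-01-02']
print(one_day)

# Select one subcategory across all dates (inner level)
hardware = multi.xs('Hardware', level='Subcategory')
print(hardware)

# Turn the index levels back into ordinary columns
flat = multi.reset_index()
print(flat)
```

Flattening with reset_index() is often convenient before exporting the table or merging it with other DataFrames.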

Conclusion

Mastering pivot() and its companion pivot_table() makes reshaping and summarizing data in pandas considerably easier. pivot() handles straightforward restructuring when each index/columns pair is unique, while pivot_table() covers duplicates, aggregation, and multi-level indexes. Together they turn raw records into tables where patterns, trends, and anomalies are easier to spot. Use these techniques to reshape data effectively in your own projects and data explorations.