Python Pandas DataFrame hist() - Plot Histogram

Updated on January 1, 2025

Introduction

The hist() method in Python's Pandas library creates histograms, a standard tool for visually exploring data distributions. A histogram shows the shape of a dataset's frequency distribution (for example, whether it looks roughly normal) and exposes features such as outliers and skewness. Because hist() is available directly on DataFrame and Series objects, you can plot a histogram of a column in a single call.

In this article, you will learn how to use hist() in Pandas to plot histograms. You will adjust the number of bins, customize colors, titles, and axis labels, and plot histograms for several columns in one figure.

Basics of Using hist()

Plotting a Simple Histogram

  1. Import the necessary libraries.

  2. Create a DataFrame.

  3. Call the hist() function on the DataFrame column.

    python
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Create a DataFrame
    df = pd.DataFrame({
        'data': np.random.randn(1000)
    })
    
    # Plot histogram
    df['data'].hist()
    plt.show()
    

    In this example, the DataFrame holds 1,000 random values drawn from a standard normal distribution. Calling hist() on the 'data' column plots their histogram.
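To see what the plot encodes, the following sketch repeats the example with a fixed seed and reads the bar heights back from the returned Matplotlib Axes. The names rng, df_demo, and bar_heights are illustrative, and the Agg backend line is only an assumption for running without a display; remove it in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fixed seed so the sample is the same on every run
rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({"data": rng.standard_normal(1000)})

# Draw on an explicit Axes so an earlier plot is not reused by accident
fig, ax = plt.subplots()
df_demo["data"].hist(ax=ax)

# Each bar is a rectangle whose height is the number of values in that bin
bar_heights = [patch.get_height() for patch in ax.patches]
print(len(bar_heights), "bars covering", round(sum(bar_heights)), "values")
```

With the default settings the bins span the minimum to the maximum of the data, so the bar heights add up to the number of rows.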

Adjusting the Number of Bins

  1. The bins parameter sets how many equal-width intervals (bars) the data range is divided into. The default is 10.

  2. Use the bins parameter to modify it.

    python
    df['data'].hist(bins=30)
    plt.show()
    

    More bins reveal finer detail but make the plot noisier, while fewer bins give a smoother, coarser view of the distribution. Here, the data range is divided into 30 bins.
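Besides an integer, bins can also be given as a sequence of bin edges, which Pandas hands on to Matplotlib. The sketch below compares both forms side by side; the edge values and the names df_demo, ax_int, and ax_edges are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({"data": rng.standard_normal(1000)})

fig, (ax_int, ax_edges) = plt.subplots(1, 2, figsize=(10, 4))

# Integer: 30 equal-width bins between the data minimum and maximum
df_demo["data"].hist(bins=30, ax=ax_int)

# Sequence: 17 explicit edges from -4 to 4, giving 16 bins of width 0.5
edges = np.linspace(-4, 4, 17)
df_demo["data"].hist(bins=edges, ax=ax_edges)

# The bar heights agree with NumPy's own histogram counts for the same edges
counts, _ = np.histogram(df_demo["data"], bins=edges)
heights = [patch.get_height() for patch in ax_edges.patches]
print(len(ax_int.patches), len(ax_edges.patches))
```

Explicit edges are useful when several plots must share the same bins. Values that fall outside the first and last edge are not counted.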

Customizing Histograms

Changing Plot Style and Color

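  1. hist() accepts its own styling parameters, such as grid, and passes any additional keyword arguments, such as color, on to Matplotlib.

  2. Set these parameters in the call to change the look of the plot.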

    python
    df['data'].hist(color='blue', grid=False)
    plt.show()
    

    This code snippet styles the histogram with a blue color and disables the grid.
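Other Matplotlib bar properties can be passed the same way. As a sketch, the call below adds edgecolor and alpha, two standard Matplotlib keyword arguments; the specific color names and the alpha value are arbitrary choices, not recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({"data": rng.standard_normal(1000)})

fig, ax = plt.subplots()
df_demo["data"].hist(
    ax=ax,
    bins=20,
    color="steelblue",    # fill color of the bars
    edgecolor="black",    # outline that separates neighboring bars
    alpha=0.7,            # 0 is fully transparent, 1 is fully opaque
    grid=False,           # Pandas-level option: hide the grid lines
)

first_bar = ax.patches[0]
print(first_bar.get_alpha())
```

An outline color tends to help when many narrow bars sit next to each other, and a lower alpha helps when several histograms are drawn on the same axes.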

Adding Titles and Labels

  1. Enhance readability with titles and axis labels.

    python
    ax = df['data'].hist(bins=20)
    ax.set_title('Data Distribution')
    ax.set_xlabel('Values')
    ax.set_ylabel('Frequency')
    plt.show()
    

    Titles and labels make a histogram self-explanatory. Calling hist() on a column returns a Matplotlib Axes object, so you can call set_title(), set_xlabel(), and set_ylabel() on it to add a title and labels for both axes.
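As a more compact alternative, Matplotlib's Axes.set() accepts the same properties as keyword arguments in one call. This is a sketch of that variant, using the same label text as above and an illustrative df_demo DataFrame.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({"data": rng.standard_normal(1000)})

fig, ax = plt.subplots()
df_demo["data"].hist(ax=ax, bins=20)

# One call instead of separate set_title / set_xlabel / set_ylabel calls
ax.set(title="Data Distribution", xlabel="Values", ylabel="Frequency")

print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
```

Both forms produce the same plot; the separate setter methods are convenient when a label also needs options such as a font size.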

Plotting Multiple Histograms Together

Histograms for Multiple Columns

  1. Create a DataFrame with multiple numeric columns.

  2. Use hist() to plot histograms for all columns together.

    python
    df = pd.DataFrame({
        'normal': np.random.randn(1000),
        'gamma': np.random.gamma(2, size=1000),
        'poisson': np.random.poisson(size=1000)
    })
    
    df.hist(layout=(3,1), figsize=(10,15))
    plt.show()
    

    Here, the DataFrame holds samples from three different distributions. Each column gets its own subplot, and all subplots share one figure, arranged in a grid of 3 rows and 1 column.
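When called on a whole DataFrame, hist() returns an array of Axes objects, one per subplot, and titles each subplot with its column name. The sketch below loops over that array to label every subplot; df_multi and the label text are illustrative names, and np.asarray is used defensively in case a single Axes is returned.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df_multi = pd.DataFrame({
    "normal": rng.standard_normal(1000),
    "gamma": rng.gamma(2, size=1000),
    "poisson": rng.poisson(size=1000),
})

# One subplot per numeric column, stacked in 3 rows and 1 column
axes = df_multi.hist(layout=(3, 1), figsize=(10, 15), bins=20)
axes_flat = np.asarray(axes).ravel()

# Label every subplot; the titles are already set to the column names
for ax in axes_flat:
    ax.set_ylabel("Frequency")

# Reduce overlap between subplot titles and tick labels
axes_flat[0].figure.tight_layout()

titles = sorted(ax.get_title() for ax in axes_flat)
print(titles)
```

Working with the returned array keeps per-subplot adjustments explicit, which is easier to follow than relying on Matplotlib's notion of the current axes.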

Conclusion

The hist() method in Pandas is a straightforward way to create histograms directly from DataFrame columns. You can start with a one-line plot and then refine it by changing the number of bins, styling the bars, adding titles and labels, or plotting several columns in one figure. Together, these options let you quickly assess the shape of your data and communicate it clearly.