Python Pandas DataFrame hist() - Plot Histogram

Introduction

The hist() method in Python's Pandas library creates histograms, which are essential for visually exploring data distributions. A histogram shows the shape of a dataset's frequency distribution (for example, whether it is roughly normal), along with its skewness and any outliers. Because hist() is available on both DataFrame and Series objects, you can plot directly from your columns for quick analysis.

In this article, you will learn how to use hist() in Pandas to plot histograms. You will adjust the number of bins, customize colors, titles, and axis labels, and plot histograms for several columns in a single figure.

Basics of Using hist()

Plotting a Simple Histogram

Import the necessary libraries.
Create a DataFrame.
Call the hist() function on the DataFrame column.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a DataFrame
df = pd.DataFrame({
    'data': np.random.randn(1000)
})

# Plot histogram
df['data'].hist()
plt.show()
```

In this example, a DataFrame is created with 1000 random values drawn from a standard normal distribution. The hist() method plots the histogram of the 'data' column using the default of 10 bins.
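Because np.random.randn() produces different values on every run, the plot above changes each time. The following is a minimal sketch of a reproducible variant, assuming a seeded NumPy generator and the non-interactive Agg backend (both are choices made for this illustration, not part of the original example). It also shows that hist() on a column returns the matplotlib Axes it drew on, which you can inspect.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook or desktop session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded generator so the sample (and the plot) is identical on every run
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"data": rng.standard_normal(1000)})

# Draw on an explicit Axes so nothing from an earlier plot is reused
fig, ax = plt.subplots()
df["data"].hist(ax=ax)

# Each bar is a Rectangle patch; its height is the count of values in that bin
bar_heights = [patch.get_height() for patch in ax.patches]
print("number of bars:", len(bar_heights))
print("total count:", int(sum(bar_heights)))

plt.close(fig)
```

The bar heights sum to the number of rows, which is a quick sanity check that no values were dropped.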

Adjusting the Number of Bins

Bins specify how many intervals (bars) the data range is divided into.
Use the bins parameter to modify it.
```python
df['data'].hist(bins=30)
plt.show()
```
More bins reveal finer detail in the distribution but make the plot noisier, while fewer bins give a smoother, coarser view. Here, the data range is divided into 30 equal-width bins instead of the default 10.
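The bins parameter also accepts a sequence of bin edges instead of a count, which is useful when you want fixed, comparable intervals across plots. The sketch below assumes edges from -4 to 4 in steps of 0.5 (an arbitrary choice for standard normal data) and uses np.histogram with the same edges to show the counts behind each bar.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"data": rng.standard_normal(1000)})

# 17 edges from -4 to 4 define 16 bins, each 0.5 wide
edges = np.linspace(-4, 4, 17)

fig, ax = plt.subplots()
df["data"].hist(bins=edges, ax=ax)

# np.histogram with the same edges returns the count for each bin
counts, _ = np.histogram(df["data"], bins=edges)
bar_heights = [patch.get_height() for patch in ax.patches]
print(counts)

plt.close(fig)
```

Values that fall outside the outermost edges are not counted, so choose edges that cover the range you care about.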

Customizing Histograms

Changing Plot Style and Color

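The hist() function provides flexibility in styling: its own parameters, such as grid, control the axes, and extra keyword arguments, such as color, are passed on to matplotlib.
Modify parameters like color and grid.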
```python
df['data'].hist(color='blue', grid=False)
plt.show()
```
This code snippet styles the histogram with a blue color and disables the grid.
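Other matplotlib bar options can be passed the same way. As a hedged sketch, the example below adds edgecolor and alpha, which are standard matplotlib arguments rather than pandas-specific ones; the particular color names and values are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"data": rng.standard_normal(1000)})

fig, ax = plt.subplots()
df["data"].hist(
    ax=ax,
    bins=30,
    color="steelblue",   # fill color of the bars
    edgecolor="black",   # outline each bar so adjacent bins are easy to tell apart
    alpha=0.7,           # partial transparency
    grid=False,          # pandas draws a grid by default; turn it off
)

first_bar = ax.patches[0]
print("bars:", len(ax.patches), "alpha:", first_bar.get_alpha())

plt.close(fig)
```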

Adding Titles and Labels

Enhance readability with titles and axis labels.
```python
ax = df['data'].hist(bins=20)
ax.set_title('Data Distribution')
ax.set_xlabel('Values')
ax.set_ylabel('Frequency')
plt.show()
```
Titles and labels make a histogram self-explanatory. Because hist() returns a matplotlib Axes object when called on a single column, you can use the standard Axes methods to add a title and labels for the x-axis and y-axis.
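The three setter calls can also be combined into one Axes.set() call. This is only a stylistic alternative, sketched here with the same title and label text as above; the seeded data and Agg backend are assumptions for the sake of a self-contained example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"data": rng.standard_normal(1000)})

fig, ax = plt.subplots()
df["data"].hist(bins=20, ax=ax)

# One call sets the title and both axis labels
ax.set(title="Data Distribution", xlabel="Values", ylabel="Frequency")

print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())

plt.close(fig)
```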

Plot Multiple Histograms Together

Histograms for Multiple Columns

Create a DataFrame with multiple numeric columns.
Use hist() to plot histograms for all columns together.
```python
df = pd.DataFrame({
    'normal': np.random.randn(1000),
    'gamma': np.random.gamma(2, size=1000),
    'poisson': np.random.poisson(size=1000)
})

df.hist(layout=(3,1), figsize=(10,15))
plt.show()
```
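A DataFrame with three different distributions is created here. Calling hist() on the whole DataFrame draws one histogram per numeric column, each in its own subplot within a single figure. The layout=(3,1) argument arranges the subplots in three rows and one column, and figsize sets the figure size in inches.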
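When called on a DataFrame, hist() returns an array of Axes rather than a single one, so per-subplot customization needs a loop. The sketch below assumes seeded data and an arbitrary figure size; it labels the y-axis of every subplot and relies on pandas having already used each column name as the subplot title.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in interactive sessions
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "normal": rng.standard_normal(1000),
    "gamma": rng.gamma(shape=2.0, size=1000),
    "poisson": rng.poisson(lam=1.0, size=1000),
})

# DataFrame.hist() returns a NumPy array holding one Axes per subplot
axes = df.hist(bins=20, layout=(3, 1), figsize=(8, 10))

# Flatten the array so the loop works for any layout
for ax in axes.ravel():
    ax.set_ylabel("Frequency")
    print(ax.get_title())

fig = axes.ravel()[0].get_figure()
fig.tight_layout()  # reduce overlap between subplot titles and tick labels
plt.close(fig)
```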

Conclusion

The hist() function in Pandas is a straightforward way to create histograms directly from DataFrame columns. By adjusting the number of bins, styling the bars, adding titles and labels, and plotting several columns in one figure, you can quickly assess and communicate the shape of your data.
