Python NumPy histogram() - Generate Histogram

Updated on November 18, 2024

Introduction

NumPy, a core library for scientific and numerical computing in Python, includes a function called histogram() that computes histograms from data sets. The function counts how many observations fall into each of a set of bins and returns those counts together with the bin edges, which makes it a basic tool for understanding how data points are distributed.

In this article, you will learn how to use the NumPy histogram() function to create histograms from different types of data sets. Explore various parameters that modify the behavior of this function, including bin sizes and ranges, and learn how to interpret the results.

Using the histogram() Function

Basic Histogram Generation

  1. Import the NumPy package.

  2. Prepare a dataset, either synthetic or real-world data.

  3. Call the histogram() function with this dataset to compute the histogram.

    python
    import numpy as np
    
    data = np.random.randn(1000)  # 1000 samples from the standard normal distribution
    histogram, bin_edges = np.histogram(data, bins=10)  # 10 equal-width bins (10 is also the default)
    print("Histogram:", histogram)
    print("Bin edges:", bin_edges)
    

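    This code draws 1000 samples from a standard normal distribution and sorts them into 10 equal-width bins spanning the minimum to the maximum of the data. The function returns two arrays: the count of samples in each bin (10 values) and the bin edges (11 values, one more than the number of bins).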
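To help interpret the two returned arrays, the following minimal sketch checks a few relationships between them. The seeded generator and the names counts, widths, and density are choices made for this illustration only; they are not part of the example above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=10)

# N bins are delimited by N + 1 edges.
print(len(counts), len(bin_edges))  # 10 11

# With the default range (data.min(), data.max()), every sample lands in a bin.
print(counts.sum())  # 1000

# An integer bins argument produces equal-width bins.
widths = np.diff(bin_edges)
print(np.allclose(widths, widths[0]))  # True

# density=True rescales the counts so the area under the histogram is 1.
density, _ = np.histogram(data, bins=10, density=True)
print(np.isclose((density * widths).sum(), 1.0))  # True
```

The density values are not probabilities per bin; each must be multiplied by its bin width to get the fraction of samples in that bin.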

Adjusting Bin Sizes and Ranges

  1. Decide on the number of bins or manually specify the bin edges.

  2. Use the bins parameter to adjust the size and coverage of the histogram bins.

  3. Optionally, specify the range parameter to focus the histogram on a specific interval.

    python
    bins = 20  # More bins for finer granularity
    value_range = (-3, 3)  # Only count values within 3 standard deviations of the mean
    
    histogram, bin_edges = np.histogram(data, bins=bins, range=value_range)  # Avoids shadowing the built-in range()
    print("Histogram with 20 bins:", histogram)
    print("Bin edges within range:", bin_edges)
    

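    With 20 bins over the interval from -3 to 3, each bin is 0.3 wide, so the histogram resolves the data more finely than before. Values outside the specified range are ignored, which means the counts may sum to fewer than 1000.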
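The steps above also mention specifying bin edges manually, which the snippet does not show. The sketch below uses a small hand-written sample (an assumption made so the counts can be checked by eye) to illustrate explicit, non-uniform edges and how values outside the covered interval are dropped.

```python
import numpy as np

# Small hand-written sample so the counts are easy to verify.
data = np.array([-4.0, -2.5, -0.5, 0.0, 0.3, 0.9, 1.0, 2.0, 3.0, 5.0])

# Explicit, non-uniform edges define the bins [-3, -1), [-1, 0), [0, 1), [1, 3].
# Every bin is half-open except the last, which includes its right edge.
edges = [-3, -1, 0, 1, 3]
counts, bin_edges = np.histogram(data, bins=edges)
print(counts)  # [1 1 3 3]

# The same interval split into 3 equal-width bins via the range parameter.
ranged_counts, ranged_edges = np.histogram(data, bins=3, range=(-3, 3))
print(ranged_counts)  # [1 4 3]

# -4.0 and 5.0 lie outside the interval and are not counted in either case.
print(data.size - counts.sum())  # 2
```

Passing a sequence to bins fixes both the number of bins and their widths, so the range parameter is not needed in that case.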

Histogram Visualization using Matplotlib

  1. Import the matplotlib.pyplot module for plotting.

  2. Plot the histogram with plt.hist(), passing the raw data and the bin edges returned by the np.histogram() function.

  3. Customize the plot with titles and labels.

    python
    import matplotlib.pyplot as plt
    
    plt.hist(data, bins=bin_edges, alpha=0.75, color='blue', edgecolor='black')
    plt.title('Data Distribution')
    plt.xlabel('Values')
    plt.ylabel('Frequency')
    plt.show()
    

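    This snippet uses Matplotlib to plot the histogram of the data. Note that plt.hist() bins the raw data itself; the bin_edges returned by the NumPy histogram() function only tell it where the bins start and end, so the plot matches the counts computed earlier.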
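If the counts have already been computed, they can also be drawn directly instead of letting Matplotlib bin the raw data again. The following is one possible sketch using plt.bar(); the helper names bar_geometry and plot_precomputed are hypothetical and exist only for this illustration.

```python
import numpy as np


def bar_geometry(bin_edges):
    """Return the left edge and width of each bin (hypothetical helper)."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    return bin_edges[:-1], np.diff(bin_edges)


def plot_precomputed(counts, bin_edges):
    """Draw counts from np.histogram() as bars (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so bar_geometry needs only NumPy

    lefts, widths = bar_geometry(bin_edges)
    # align='edge' places each bar's left side on the bin's left edge.
    plt.bar(lefts, counts, width=widths, align='edge',
            alpha=0.75, color='blue', edgecolor='black')
    plt.title('Data Distribution')
    plt.xlabel('Values')
    plt.ylabel('Frequency')
    plt.show()


rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
data = rng.standard_normal(1000)
counts, bin_edges = np.histogram(data, bins=20, range=(-3, 3))
lefts, widths = bar_geometry(bin_edges)
```

Calling plot_precomputed(counts, bin_edges) then draws the same bars that plt.hist() would produce for these edges, without touching the raw data a second time. This can matter when the data set is large or the counts come from another source.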

Conclusion

The histogram() function in NumPy is a compact way to summarize the distribution of numerical data sets in Python. By adjusting parameters like the number of bins and the range, you can control the detail and focus of the histogram and see the structure of your data more clearly. Combining it with a visualization tool such as Matplotlib lets you communicate those characteristics to others. Use these techniques to explore and present data distributions clearly.