Python NumPy mean() - Calculate Average Value

Updated on November 15, 2024

Introduction

The mean() function in the NumPy library calculates the average (arithmetic mean) of the values in an array. It is a basic building block of statistical analysis, used to find the central tendency of large datasets in fields such as finance, science, and machine learning, where fast and accurate averages matter.

In this article, you will learn how to use the mean() function to compute averages of single-dimensional and multi-dimensional arrays, and how to handle floating-point data and missing (NaN) values.

Calculating Mean in Basic Arrays

Calculate the Mean of a Single-Dimensional Array

  1. Import the NumPy library.

  2. Create a basic single-dimensional array.

  3. Calculate and print the mean of the array using np.mean().

    python
    import numpy as np
    
    data = np.array([1, 2, 3, 4, 5])
    average = np.mean(data)
    print(average)
    

    This script calculates the average of the numbers 1 through 5, resulting in 3.0.
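As a quick sanity check, the sketch below compares np.mean() with the textbook definition (the sum of the elements divided by their count) and with the equivalent array method data.mean(). The variable names are illustrative only.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# The mean is the sum of the elements divided by how many there are.
manual_average = data.sum() / data.size

# np.mean(data) and the array method data.mean() compute the same value.
function_average = np.mean(data)
method_average = data.mean()

print(manual_average, function_average, method_average)
```

All three values are 3.0 for this array, so either call style can be used interchangeably.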

Calculate Mean for Multi-Dimensional Arrays

  1. Initiate a multi-dimensional array using NumPy.

  2. Apply the mean() function with the appropriate axis argument to compute means along different dimensions.

    python
    import numpy as np
    
    matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    mean_all = np.mean(matrix)
    column_means = np.mean(matrix, axis=0)  # Collapse the rows: one mean per column
    row_means = np.mean(matrix, axis=1)  # Collapse the columns: one mean per row
    
    print("Mean of entire matrix: ", mean_all)
    print("Mean of each column (axis=0): ", column_means)
    print("Mean of each row (axis=1): ", row_means)
    

    Here, mean_all is the overall mean (5.0), column_means holds the mean of each column ([4. 5. 6.]), and row_means holds the mean of each row ([2. 5. 8.]). The axis argument names the dimension that is collapsed, which is why axis=0 yields per-column results.
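The effect of the axis argument on the shape of the result can be sketched as follows. The example also uses the keepdims parameter of np.mean(), which the text above does not cover, to keep the reduced axis so the result broadcasts against the original matrix. Centering each row is just one illustrative use, and the variable names are assumptions.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, leaving one mean per column.
column_means = np.mean(matrix, axis=0)   # [4. 5. 6.]

# axis=1 collapses the columns, leaving one mean per row.
row_means = np.mean(matrix, axis=1)      # [2. 5. 8.]

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts against the original matrix, e.g. to center each row.
row_means_2d = np.mean(matrix, axis=1, keepdims=True)   # shape (3, 1)
centered = matrix - row_means_2d

print(column_means.shape, row_means_2d.shape)
print(centered)
```

After centering, every row of this particular matrix becomes [-1, 0, 1], since each row is its mean plus or minus one.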

Working with Different Data Types

Calculate Average in Arrays with Floating-Point Numbers

  1. Note that the mean() function accepts floating-point arrays directly; no conversion is needed.

  2. Construct an array with floating-point numbers and compute the mean.

    python
    import numpy as np
    
    float_data = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
    float_average = np.mean(float_data)
    print(float_average)
    

    The output is a floating-point value of approximately 3.3. For floating-point input, np.mean() returns the result in the same dtype as the input, while integer input is averaged in float64 by default.
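How the input dtype affects the result can be checked with a short sketch. It assumes small illustrative arrays (not taken from the article) and uses the dtype parameter of np.mean() to request a wider accumulator for float32 data.

```python
import numpy as np

int_data = np.array([1, 2, 3, 4], dtype=np.int32)
float32_data = np.array([1.5, 2.5, 3.5], dtype=np.float32)

# Integer input is averaged in float64, so the fractional part is kept.
int_mean = np.mean(int_data)

# Floating-point input keeps its own dtype by default...
float32_mean = np.mean(float32_data)

# ...unless a wider accumulator is requested with the dtype argument.
float64_mean = np.mean(float32_data, dtype=np.float64)

print(int_mean, int_mean.dtype)
print(float32_mean.dtype, float64_mean.dtype)
```

Requesting dtype=np.float64 can reduce accumulated rounding error when averaging very large float32 arrays, at the cost of some extra computation.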

Managing NaN Values in Data

  1. Recognize that NaN (Not a Number) values propagate: if an array contains even one NaN, np.mean() returns nan.

  2. Use np.nanmean() to correctly compute the mean by ignoring NaN values.

    python
    import numpy as np
    
    data_with_nan = np.array([1, 2, np.nan, 4, 5])
    average_without_nan = np.nanmean(data_with_nan)
    print(average_without_nan)
    

    np.nanmean() returns the mean of the array while ignoring np.nan values, here 3.0 (the average of 1, 2, 4, and 5). It is useful for datasets where some data points are missing or undefined.
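To make the difference concrete, the sketch below runs both functions on the same array and adds a manual alternative that filters NaN values with a boolean mask. The mask-based variant is an illustration, not something the article prescribes.

```python
import numpy as np

data_with_nan = np.array([1, 2, np.nan, 4, 5])

# A single NaN propagates through the ordinary mean.
plain_mean = np.mean(data_with_nan)

# np.nanmean() skips NaN entries and averages the remaining four values.
nan_aware_mean = np.nanmean(data_with_nan)

# Equivalent manual approach: filter out NaN with a boolean mask first.
masked_mean = data_with_nan[~np.isnan(data_with_nan)].mean()

print(plain_mean, nan_aware_mean, masked_mean)
```

The plain mean is nan, while both NaN-aware approaches give 3.0.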

Conclusion

The mean() function from NumPy provides a reliable way to calculate averages across arrays of different shapes and data types. With the axis argument for multi-dimensional arrays and the np.nanmean() variant for data containing NaN values, you can apply these techniques to both simple and complex datasets in Python.