Python Numpy split() - Divide Array

Updated on January 1, 2025

Introduction

The split() function in the NumPy library divides an array into multiple sub-arrays. Whether you are working with large datasets or preparing data for parallel computation, it lets you segment an array either into a given number of equal sections or at specified indices along a chosen axis. This is particularly useful when a large volume of data needs to be processed in manageable parts.

In this article, you will learn how to use the split() function to divide arrays into sub-arrays of equal or defined sizes. You will apply it to one-dimensional and multi-dimensional data and see what happens when an array cannot be split evenly.

Basic Usage of split()

Splitting One-Dimensional Arrays

  1. Import the numpy library under the alias np.

  2. Create a one-dimensional array.

  3. Use the split() function to divide the array into equal parts.

    python
    import numpy as np
    
    # Create a one-dimensional array
    data = np.arange(10)  # Generates an array [0, 1, 2, ..., 9]
    
    # Split the array into 5 equal parts
    sub_arrays = np.split(data, 5)
    print(sub_arrays)
    

    This code generates a list of arrays, each containing two consecutive numbers from the original array. Here, np.arange(10) creates an array with integers from 0 to 9, and np.split(data, 5) divides it into five sub-arrays.
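
To make the return value concrete, the following minimal sketch inspects the result of the same call. The variable name parts is illustrative. It also shows that the sub-arrays are views into the original array, so writing to one of them changes the source data.

```python
import numpy as np

data = np.arange(10)
parts = np.split(data, 5)

# np.split returns a plain Python list of NumPy arrays.
print(type(parts).__name__)         # list
print([p.tolist() for p in parts])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# Each piece is a view, so modifying a piece also modifies the original array.
parts[0][0] = 100
print(data[0])                      # 100
```

If independent pieces are needed, calling .copy() on each sub-array avoids this aliasing.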

Handling Uneven Splits

  1. Try splitting an array into parts that do not evenly distribute the elements.

  2. Examine how NumPy addresses this situation.

    python
    import numpy as np
    
    data = np.arange(10)  # One-dimensional array of length 10
    
    # Attempt to split into 3 equal parts (10 is not divisible by 3)
    try:
        sub_arrays = np.split(data, 3)
        print(sub_arrays)
    except ValueError as e:
        print("Error:", e)
    

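    Because 10 is not divisible by 3, np.split() raises a ValueError reporting that the array split does not result in an equal division. When you pass an integer, it must evenly divide the array length along the chosen axis. If unequal sections are acceptable, np.array_split() takes the same arguments and allows them.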
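
When sections of unequal size are acceptable, one option is np.array_split(), which is part of NumPy and accepts a section count that does not divide the length. The sketch below reuses the same ten-element array; the names parts and sizes are illustrative.

```python
import numpy as np

data = np.arange(10)

# array_split tolerates a section count that does not divide the length:
# the first (length % sections) pieces receive one extra element.
parts = np.array_split(data, 3)
sizes = [len(p) for p in parts]

print(sizes)                        # [4, 3, 3]
print([p.tolist() for p in parts])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```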

Advanced Usage with Higher-Dimensional Arrays

Splitting Two-Dimensional Arrays

  1. Create a two-dimensional array.

  2. Use the split() function to divide the array along a specific axis.

    python
    import numpy as np
    
    # Create a two-dimensional array
    matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    
    # Split the array into three parts along rows
    sub_matrices = np.split(matrix, 3, axis=0)
    print(sub_matrices)
    

    This snippet splits the two-dimensional array into three sub-arrays along the first axis (rows). Each sub-array holds one row of the original array and is still two-dimensional, with shape (1, 3).
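
For comparison, here is a small sketch that splits the same matrix along both axes and checks the shapes of the pieces. It also uses np.vsplit() and np.hsplit(), which for two-dimensional input behave like np.split() with axis=0 and axis=1 respectively. The variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rows = np.split(matrix, 3, axis=0)  # one row per piece
cols = np.split(matrix, 3, axis=1)  # one column per piece

print(rows[0].shape)             # (1, 3)
print(cols[0].shape)             # (3, 1)
print(cols[0].ravel().tolist())  # [1, 4, 7]

# vsplit and hsplit give the same pieces for a 2-D array.
same_rows = all(np.array_equal(a, b) for a, b in zip(rows, np.vsplit(matrix, 3)))
same_cols = all(np.array_equal(a, b) for a, b in zip(cols, np.hsplit(matrix, 3)))
print(same_rows, same_cols)      # True True
```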

Custom Section Splits

  1. Specify the points where the array should be split.

  2. Pass a list of indices as the indices_or_sections argument to define the exact split points.

    python
    import numpy as np
    
    # Create a 4x4 matrix
    matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
    
    # Split the array at specific indices along columns
    sub_matrices = np.split(matrix, [1, 3], axis=1)
    print(sub_matrices)
    

    The example splits the matrix into three sub-arrays by cutting it just before columns 1 and 3. The result is column 0, then columns 1 and 2, then column 3.
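
One way to read a list of split indices is as slice boundaries. The sketch below, with illustrative variable names, checks that splitting at [1, 3] along axis 1 matches the slices [:, :1], [:, 1:3] and [:, 3:]. It also shows that an index past the end of the axis yields an empty sub-array rather than an error.

```python
import numpy as np

matrix = np.arange(1, 17).reshape(4, 4)  # same values as the 4x4 matrix above

left, middle, right = np.split(matrix, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)  # (4, 1) (4, 2) (4, 1)

# The pieces are the same as explicit slices at the given boundaries.
matches = (
    np.array_equal(left, matrix[:, :1])
    and np.array_equal(middle, matrix[:, 1:3])
    and np.array_equal(right, matrix[:, 3:])
)
print(matches)  # True

# An index beyond the axis length produces an empty trailing piece.
pieces = np.split(np.arange(5), [2, 7])
print([p.tolist() for p in pieces])  # [[0, 1], [2, 3, 4], []]
```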

Conclusion

The split() function from NumPy divides arrays into smaller sub-arrays, either into a number of equal sections or at explicit indices along any axis. This makes it easier to manage large datasets or to hand separate portions of the data to different processes or threads. Use these techniques in your next project to simplify data management and keep processing in manageable pieces.