Python Pandas DataFrame info() - Display Information

Updated on November 25, 2024

Introduction

The info() method in Python's pandas library prints a concise summary of a DataFrame: the index type and range, each column's name, non-null count, and dtype, and the DataFrame's approximate memory usage. It serves as a quick diagnostic for understanding the structure and contents of a DataFrame without viewing the entire dataset.

In this article, you will learn how to use the info() method effectively. Discover how to retrieve essential details about your DataFrame, modify its output to suit your needs, and interpret the information it provides.

Understanding the info() Method

Basic Usage of info()

  1. Import the pandas library and create a DataFrame.

  2. Call the info() method to view the summary of the DataFrame.

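    ```python
    import pandas as pd

    # Small example DataFrame: two text columns and one integer column
    data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
            'Age': [28, 22, 34, 42],
            'City': ['New York', 'Paris', 'Berlin', 'London']}
    df = pd.DataFrame(data)
    df.info()  # prints the summary and returns None
    ```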

    Executing this code prints the DataFrame's class, the index range (4 entries, 0 to 3), the total number of columns, a table with each column's name, non-null count, and dtype, a tally of columns per dtype, and the memory usage. The method prints this text rather than returning it.
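Because info() prints its summary instead of returning it, storing or logging the text takes one extra step. The following is a minimal sketch, assuming the same example data as above and using the method's buf parameter with an in-memory io.StringIO buffer. The variable names buffer, result, and summary are illustrative only.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# info() writes to sys.stdout by default and returns None,
# so pass a writable buffer to capture the text instead.
buffer = io.StringIO()
result = df.info(buf=buffer)
summary = buffer.getvalue()

print(result is None)           # info() has no return value
print(summary.splitlines()[1])  # the index line of the summary
```

The captured string can then be written to a log file or searched like any other text.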

Exploring Parameters of info()

Verbose Parameter

  1. Use the verbose parameter to control the display of information.

  2. Set verbose=False to show a simpler output, especially useful when dealing with a large number of columns.

    ```python
    df.info(verbose=False)
    ```

    This setting prints the short summary: the index range, a one-line column count with the first and last column names, the dtype tally, and the memory usage. The per-column table is omitted.
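To see the difference between the two modes side by side, the outputs can be captured and compared. This is a sketch under stated assumptions: info_text is a hypothetical helper written for this example (it is not part of pandas), and it relies on the buf parameter of info() to collect the text.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})


def info_text(frame, **kwargs):
    """Hypothetical helper: return the text frame.info(**kwargs) would print."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


full = info_text(df, verbose=True)    # includes the per-column table
short = info_text(df, verbose=False)  # summary lines only

print(short)
print(len(full.splitlines()), 'lines vs', len(short.splitlines()), 'lines')
```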

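max_cols Parameter

  1. Use the max_cols parameter to set the column-count threshold at which info() switches from the detailed per-column table to the truncated summary.

  2. Pass a value below the DataFrame's column count to force the truncated summary, or a value at or above it to keep the detailed table.

    ```python
    df.info(max_cols=2)
    ```

    Because df has three columns, more than max_cols=2, this call prints the truncated summary rather than details for two columns. When max_cols is omitted, the threshold comes from the pandas.options.display.max_info_columns setting.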
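The threshold behavior of max_cols is easiest to check by trying a value just below and a value equal to the column count. The sketch below assumes the same three-column example data. The info_text helper is hypothetical and exists only to capture the printed text through the buf parameter.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 22, 34, 42],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})


def info_text(frame, **kwargs):
    """Hypothetical helper: return the text frame.info(**kwargs) would print."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


below = info_text(df, max_cols=2)  # 3 columns > 2: truncated summary
equal = info_text(df, max_cols=3)  # 3 columns <= 3: detailed table

print('per-column table with max_cols=2:', 'Non-Null Count' in below)
print('per-column table with max_cols=3:', 'Non-Null Count' in equal)
```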

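show_counts Parameter

  1. Specify the show_counts parameter to control the display of non-null counts. Older pandas releases called this parameter null_counts; that name was deprecated in pandas 1.2 and removed in pandas 2.0.

  2. Setting show_counts=True ensures that you see the count of non-null values for each column, even for a DataFrame that exceeds the default size limits.

    ```python
    df.info(show_counts=True)
    ```

    With this setting, the output explicitly displays non-null counts. By default, counts are shown only when the DataFrame is smaller than the pandas.options.display.max_info_rows and pandas.options.display.max_info_columns limits.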
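Non-null counts are most informative when a column actually contains missing values. The following sketch assumes a variant of the example data with one missing Age entry (this variant is an addition for illustration, not part of the original example) and compares the output with counts switched on and off. The info_text helper is hypothetical.

```python
import io

import pandas as pd

# Variant of the example data with one missing age (illustrative assumption)
df_missing = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 34, 42],
})


def info_text(frame, **kwargs):
    """Hypothetical helper: return the text frame.info(**kwargs) would print."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


with_counts = info_text(df_missing, show_counts=True)
without_counts = info_text(df_missing, show_counts=False)

print(with_counts)     # the Age row reports fewer non-null values than rows
print(without_counts)  # same table without the non-null count column
```

On pandas versions older than 1.2, the same calls would need the earlier null_counts name.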

Using info() with Large DataFrames

Practical Example with a Large DataFrame

  1. Simulate a larger DataFrame using NumPy to understand the extended use of info().

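  2. Observe that info() still lists every column, because this DataFrame stays below pandas' default column limit.

    ```python
    import numpy as np

    # 1000 rows x 50 columns of random floats, named col0 to col49
    columns = [f'col{i}' for i in range(50)]
    large_data = pd.DataFrame(np.random.rand(1000, 50), columns=columns)
    large_data.info()
    ```

    With 1000 rows and 50 columns, this DataFrame is below the default pandas.options.display.max_info_columns limit of 100, so info() prints the full per-column table. A DataFrame with more columns than that limit gets the truncated summary unless verbose=True is passed.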
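To see info() actually change its output for wide data, the column count has to exceed the display.max_info_columns option. The sketch below assumes a 150-column DataFrame of zeros (chosen here for illustration) and pins the option to 100 with pd.option_context so the result does not depend on local settings. The info_text helper is hypothetical.

```python
import io

import numpy as np
import pandas as pd


def info_text(frame, **kwargs):
    """Hypothetical helper: return the text frame.info(**kwargs) would print."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


# 10 rows x 150 columns, more columns than the limit set below
wide = pd.DataFrame(np.zeros((10, 150)),
                    columns=[f'col{i}' for i in range(150)])

with pd.option_context('display.max_info_columns', 100):
    default_text = info_text(wide)              # truncated summary
    forced_text = info_text(wide, verbose=True)  # one line per column

print(default_text)
print(len(forced_text.splitlines()), 'lines when verbose=True')
```

Passing verbose=True restores the per-column listing. To also include non-null counts for a DataFrame above the limits, show_counts=True can be added.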

Conclusion

The info() method in pandas gives a quick view of a DataFrame's structure: its index, columns, dtypes, non-null counts, and memory usage. The verbose, max_cols, and show_counts (formerly null_counts) parameters let you tailor the output to the size of your data. Make info() a regular part of your data inspection routine so that missing values and unexpected dtypes surface before preprocessing and analysis.