Working with Missing Data in Pandas


In this blog post, we look at how to handle missing data with the Pandas library in Python. Specifically, we focus on removing rows or columns that contain NaN or None values, using the dropna method and its how, subset, axis, and thresh parameters.

Setup

We begin by importing the required libraries and reading the CSV file containing the movie data.


import pandas as pd
import numpy as np
loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv'
m = pd.read_csv(loc)
    

Creating a Subset and Introducing Missing Values

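We create a subset of the data, keeping only three columns and the first five rows. We then set selected cells to NaN so that each dropna option has something to act on: rows 0 through 2 become partially missing, and row 4 becomes entirely missing.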


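# Take an explicit copy so m1 is independent of m.
m1 = m[['movie_title','director_name','actor_1_name']][0:5].copy()
# .loc slices are label-based and include both endpoints.
m1.loc[0:1,'director_name'] = np.nan   # rows 0 and 1
m1.loc[0:2,'actor_1_name'] = np.nan    # rows 0, 1 and 2
# Make row 4 entirely NaN.
m1.loc[4,'movie_title'] = np.nan
m1.loc[4,'director_name'] = np.nan
m1.loc[4,'actor_1_name'] = np.nan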
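
Before dropping anything, it can help to confirm where the missing values ended up. The following is a minimal sketch that does not download movie.csv: it rebuilds a five-row stand-in for m1 with placeholder titles and names (these are assumptions for illustration, not values from the real file) and counts the NaN values per column and per row with isna().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first five rows of the movie data;
# the titles and names are placeholders, not values from movie.csv.
m1 = pd.DataFrame({
    'movie_title': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
    'director_name': ['Director A', 'Director B', 'Director C',
                      'Director D', 'Director E'],
    'actor_1_name': ['Actor A', 'Actor B', 'Actor C', 'Actor D', 'Actor E'],
})

# Same pattern of missing values as in the post.
m1.loc[0:1, 'director_name'] = np.nan
m1.loc[0:2, 'actor_1_name'] = np.nan
m1.loc[4, ['movie_title', 'director_name', 'actor_1_name']] = np.nan

# isna() marks each missing cell as True; summing counts them.
na_per_column = m1.isna().sum()        # one count per column
na_per_row = m1.isna().sum(axis=1)     # one count per row

print(na_per_column)
print(na_per_row)
```

With this pattern, movie_title has 1 missing value, director_name has 3, and actor_1_name has 4; row 3 is the only complete row and row 4 is entirely missing.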
    

Dropping Rows and Columns with Missing Values

  • Dropping rows with any NA values:
    m1.dropna()
  • Dropping rows with all NA values:
    m1.dropna(how='all')
  • Dropping rows with NA values in a subset of columns:
    m1.dropna(subset=['director_name','actor_1_name'])
  • Dropping columns where all values are NA:
    m1.dropna(axis='columns',how='all')
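  • Dropping columns with fewer than a given number of non-NA values (thresh is the minimum count of non-NA values a column needs to be kept, so thresh=1 keeps every column that has at least one non-NA value):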
    m1.dropna(axis='columns',thresh=1)
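
The options listed above can be compared side by side. This is a sketch on a small stand-in DataFrame with the same missing-value pattern as m1; the titles and names are placeholders assumed for illustration, not values from movie.csv. The thresh=2 call is an extra case added here to show a threshold that actually removes a column.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical stand-in for m1: rows 0-2 partially missing, row 4 all missing.
m1 = pd.DataFrame({
    'movie_title':   ['Movie A', 'Movie B', 'Movie C', 'Movie D', nan],
    'director_name': [nan, nan, 'Director C', 'Director D', nan],
    'actor_1_name':  [nan, nan, nan, 'Actor D', nan],
})

# Rows: drop if ANY value is missing -> only row 3 survives.
any_na = m1.dropna()
# Rows: drop only if ALL values are missing -> row 4 is removed.
all_na = m1.dropna(how='all')
# Rows: only look at the listed columns when deciding.
subset_na = m1.dropna(subset=['director_name', 'actor_1_name'])
# Columns: drop only if the whole column is missing -> nothing is removed.
cols_all_na = m1.dropna(axis='columns', how='all')
# Columns: keep those with at least `thresh` non-NA values.
cols_thresh_1 = m1.dropna(axis='columns', thresh=1)  # all three kept
cols_thresh_2 = m1.dropna(axis='columns', thresh=2)  # actor_1_name dropped

print(any_na.index.tolist())         # [3]
print(all_na.index.tolist())         # [0, 1, 2, 3]
print(subset_na.index.tolist())      # [3]
print(cols_all_na.columns.tolist())  # all three columns
print(cols_thresh_1.columns.tolist())  # all three columns
print(cols_thresh_2.columns.tolist())  # ['movie_title', 'director_name']
```

Each call returns a new DataFrame and leaves m1 unchanged, so assign the result to a variable (for example m1 = m1.dropna(how='all')) to keep it.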

Conclusion

In this tutorial, we used dropna to remove rows and columns with missing values from a Pandas DataFrame. The how, subset, axis, and thresh parameters control which rows or columns are dropped, so we can match the criterion to the analysis at hand and avoid discarding more data than necessary.
