dropna

August 17, 2023

Working with Missing Data in Pandas

In this blog post, we will explore how to handle missing data with the Pandas library in Python. Specifically, we will focus on removing rows or columns that contain NaN or None values using dropna and its how, subset, axis, and thresh parameters.

Setup

We begin by importing the required libraries and reading the CSV file containing the movie data.


import pandas as pd
import numpy as np
loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv'
m = pd.read_csv(loc)

Creating a Subset and Introducing Missing Values

We create a subset of the data, keeping only the columns we want to work with. Additionally, we introduce NaN values in specific rows and columns.


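# .copy() makes m1 an independent DataFrame, so the assignments below
# do not raise SettingWithCopyWarning on older Pandas versions.
m1 = m[['movie_title','director_name','actor_1_name']][0:5].copy()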
m1.loc[0:1,'director_name'] = np.nan
m1.loc[0:2,'actor_1_name'] = np.nan
m1.loc[4,'movie_title'] = np.nan
m1.loc[4,'director_name'] = np.nan
m1.loc[4,'actor_1_name'] = np.nan
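
To make the resulting pattern of missing values easier to see without downloading the CSV, here is a minimal sketch on a stand-in frame. The name demo and the placeholder values ("Film A", "Dir A", and so on) are hypothetical and not taken from movie.csv; only the column names and the NaN assignments mirror the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; the real values come from movie.csv.
demo = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "director_name": ["Dir A", "Dir B", "Dir C", "Dir D", "Dir E"],
    "actor_1_name": ["Actor A", "Actor B", "Actor C", "Actor D", "Actor E"],
})

# Same assignments as in the post. .loc slices are label-based and
# include the end label, so 0:1 covers rows 0 and 1.
demo.loc[0:1, "director_name"] = np.nan
demo.loc[0:2, "actor_1_name"] = np.nan
demo.loc[4, ["movie_title", "director_name", "actor_1_name"]] = np.nan

# Count missing values per column and per row.
missing_per_column = demo.isna().sum()
missing_per_row = demo.isna().sum(axis=1)
print(missing_per_column)
print(missing_per_row)
```

On this stand-in frame, row 3 is the only complete row and row 4 is entirely missing, which is the pattern the dropna calls in the next section act on.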

Dropping Rows and Columns with Missing Values

Dropping rows with any NA values:
```
m1.dropna()
```
Dropping rows with all NA values:
```
m1.dropna(how='all')
```
Dropping rows with NA values in a subset of columns:
```
m1.dropna(subset=['director_name','actor_1_name'])
```
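
As a rough check of what the three row-wise calls keep, the sketch below applies them to a small stand-in frame with the same missing-value pattern as m1. The frame demo and its placeholder values are hypothetical; the dropna arguments are the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for m1: same NaN pattern, placeholder values.
demo = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C", "Film D", np.nan],
    "director_name": [np.nan, np.nan, "Dir C", "Dir D", np.nan],
    "actor_1_name": [np.nan, np.nan, np.nan, "Actor D", np.nan],
})

# Default how='any': keep only rows with no missing value at all.
any_missing = demo.dropna()

# how='all': drop a row only when every value in it is missing.
all_missing = demo.dropna(how="all")

# subset: only the listed columns are checked for missing values.
by_subset = demo.dropna(subset=["director_name", "actor_1_name"])

print(list(any_missing.index))  # [3]
print(list(all_missing.index))  # [0, 1, 2, 3]
print(list(by_subset.index))    # [3]
```

Each call returns a new DataFrame and leaves demo unchanged, so assign the result to a name if you want to keep it.
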
Dropping columns where all values are NA:
```
m1.dropna(axis='columns',how='all')
```
Dropping columns that have fewer than a given number of non-NA values (thresh is the minimum number of non-NA values a column needs in order to be kept):
```
m1.dropna(axis='columns',thresh=1)
```
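
The column-wise calls are easier to follow with the non-NA counts in view. The following is a minimal sketch on a hypothetical stand-in frame (demo, with placeholder values) that has the same missing-value pattern as m1. The thresh values 2 and 3 are added here for illustration and do not appear in the post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for m1: same NaN pattern, placeholder values.
demo = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C", "Film D", np.nan],
    "director_name": [np.nan, np.nan, "Dir C", "Dir D", np.nan],
    "actor_1_name": [np.nan, np.nan, np.nan, "Actor D", np.nan],
})

# Non-NA values per column: movie_title 4, director_name 2, actor_1_name 1.
non_na_counts = demo.notna().sum()

# how='all' drops a column only if every value in it is missing.
no_empty_cols = demo.dropna(axis="columns", how="all")

# thresh=N keeps a column only if it has at least N non-NA values.
at_least_1 = demo.dropna(axis="columns", thresh=1)
at_least_2 = demo.dropna(axis="columns", thresh=2)
at_least_3 = demo.dropna(axis="columns", thresh=3)

print(list(no_empty_cols.columns))  # all three columns
print(list(at_least_1.columns))     # all three columns
print(list(at_least_2.columns))     # ['movie_title', 'director_name']
print(list(at_least_3.columns))     # ['movie_title']
```

With this pattern, thresh=1 removes nothing because every column still has at least one non-NA value; raising the threshold is what starts to remove the sparser columns.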

Conclusion

In this tutorial, we used dropna to remove rows and columns with missing values from a Pandas DataFrame, choosing what gets dropped with the how, subset, axis, and thresh parameters. Picking the criterion deliberately matters, because every row or column that is dropped is data the rest of the analysis will not see.
