dropna
Working with Missing Data in Pandas
In this post, we look at how to handle missing data with the Pandas library in Python. Specifically, we focus on the dropna method, which removes rows or columns that contain NaN or None values, and on the parameters that control what gets dropped.
Setup
We begin by importing the required libraries and reading the CSV file containing the movie data.
import pandas as pd
import numpy as np
loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv'
m = pd.read_csv(loc)
Creating a Subset and Introducing Missing Values
We create a subset of the data, keeping only the columns we want to work with. Additionally, we introduce NaN values in specific rows and columns.
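# Keep three columns and the first five rows; .copy() avoids a SettingWithCopyWarning on the assignments below
m1 = m[['movie_title','director_name','actor_1_name']][0:5].copy()
# .loc slices are label-based and include both endpoints
m1.loc[0:1,'director_name'] = np.nan  # rows 0 and 1
m1.loc[0:2,'actor_1_name'] = np.nan  # rows 0 through 2
# Make row 4 missing in every column
m1.loc[4,'movie_title'] = np.nan
m1.loc[4,'director_name'] = np.nan
m1.loc[4,'actor_1_name'] = np.nan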
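To confirm where the missing values ended up, we can count them per column and per row. The sketch below is only an illustration: it assumes a small hand-built frame named demo in place of the downloaded CSV, and its titles and names are hypothetical placeholders. Only the pattern of NaN values mirrors the steps above.

```python
import numpy as np
import pandas as pd

cols = ['movie_title', 'director_name', 'actor_1_name']

# Hypothetical stand-in for the five-row subset (placeholder values, not the real data)
demo = pd.DataFrame({
    'movie_title': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
    'director_name': ['Director A', 'Director B', 'Director C', 'Director D', 'Director E'],
    'actor_1_name': ['Actor A', 'Actor B', 'Actor C', 'Actor D', 'Actor E'],
})

# Same missing-value pattern as above; .loc slices include both endpoints
demo.loc[0:1, 'director_name'] = np.nan
demo.loc[0:2, 'actor_1_name'] = np.nan
demo.loc[4, cols] = np.nan

# isna() marks missing cells as True; summing counts them
na_per_column = demo.isna().sum()      # one count per column
na_per_row = demo.isna().sum(axis=1)   # one count per row

print(na_per_column)
print(na_per_row)
```

With this pattern, movie_title has 1 missing value, director_name has 3, and actor_1_name has 4. Row 3 is the only complete row, and row 4 is missing in every column.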
Dropping Rows and Columns with Missing Values
- Dropping rows with any NA values:
m1.dropna()
- Dropping rows where every value is NA:
m1.dropna(how='all')
- Dropping rows with an NA value in any of the listed columns:
m1.dropna(subset=['director_name','actor_1_name'])
- Dropping columns where all values are NA:
m1.dropna(axis='columns',how='all')
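- Keeping only columns with at least a given number of non-NA values (thresh is the minimum count of non-NA values required, so thresh=1 drops only columns that are entirely NA):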
m1.dropna(axis='columns',thresh=1)
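To make the effect of each call concrete, the sketch below runs the calls listed above on a small hand-built frame with the same missing-value pattern. The frame name demo and its placeholder titles and names are assumptions made for illustration, not the real movie data. It also adds one extra call with thresh=2 to show how the threshold behaves.

```python
import numpy as np
import pandas as pd

cols = ['movie_title', 'director_name', 'actor_1_name']

# Hypothetical stand-in for the five-row subset (placeholder values)
demo = pd.DataFrame({
    'movie_title': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E'],
    'director_name': ['Director A', 'Director B', 'Director C', 'Director D', 'Director E'],
    'actor_1_name': ['Actor A', 'Actor B', 'Actor C', 'Actor D', 'Actor E'],
})
demo.loc[0:1, 'director_name'] = np.nan
demo.loc[0:2, 'actor_1_name'] = np.nan
demo.loc[4, cols] = np.nan

# Rows: drop if ANY value is NA (the default) -> only row 3 survives
any_rows = demo.dropna()

# Rows: drop only if ALL values are NA -> row 4 is removed
all_rows = demo.dropna(how='all')

# Rows: check only the listed columns -> again only row 3 survives
subset_rows = demo.dropna(subset=['director_name', 'actor_1_name'])

# Columns: drop only if ALL values are NA -> every column has some data, none dropped
all_cols = demo.dropna(axis='columns', how='all')

# Columns: keep those with at least `thresh` non-NA values
thresh1 = demo.dropna(axis='columns', thresh=1)  # all three columns kept
thresh2 = demo.dropna(axis='columns', thresh=2)  # actor_1_name (1 non-NA value) dropped

print(any_rows.index.tolist())
print(all_rows.index.tolist())
print(subset_rows.index.tolist())
print(list(thresh2.columns))
```

Note that dropna returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name if we want to keep it.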
Conclusion
In this post, we used dropna to remove rows and columns with missing values from a Pandas DataFrame under different criteria: any NA value, all NA values, NA values within a subset of columns, and a minimum number of non-NA values. Choosing the criterion deliberately lets us clean the data without discarding more of it than the analysis requires.