Selecting Columns in Pandas
Column Selection Techniques in Pandas
This post explores different methods for selecting columns in Pandas DataFrames, inspired by Matt Harrison's book, Effective Pandas. Column selection is one of the most common tasks in data wrangling, and Pandas offers several ways to do it: by name, by pattern in the name, and by data type. Let's examine each technique in turn.
Initial Setup
import pandas as pd
loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv'
m = pd.read_csv(loc)
m.head()
Basic Column Selection
The most direct approach is to pass a list of column names inside square brackets, which returns a new DataFrame containing only those columns.
ma = m[['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name', 'movie_title','gross']]
ma.head()
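A quick sketch of why the double brackets matter. The toy DataFrame below is a made-up stand-in for the movie data (the values are invented for illustration); it shows that a single label gives back a Series, while a list of labels gives back a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the movie data; the values are made up.
toy = pd.DataFrame({
    "movie_title": ["A", "B"],
    "director_name": ["X", "Y"],
    "gross": [1.0, 2.0],
})

one_col = toy["gross"]                     # single label -> Series
two_cols = toy[["movie_title", "gross"]]   # list of labels -> DataFrame

print(type(one_col).__name__)     # Series
print(type(two_cols).__name__)    # DataFrame
print(list(two_cols.columns))     # ['movie_title', 'gross']
```

The columns come back in the order you list them, so this is also a simple way to reorder a DataFrame.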
Using Regex for Column Selection
Regular expressions (regex) are powerful tools for pattern matching. The filter method accepts a pattern through its regex argument and keeps the columns whose names match it.
m.filter(regex='^actor|director')
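One detail worth checking for yourself: in '^actor|director' the ^ anchor applies only to actor, so director can match anywhere in a column name. The sketch below uses made-up column names (lead_actor_name and co_director_fb are hypothetical, not from the movie data) to compare that pattern with a grouped version that anchors both alternatives.

```python
import pandas as pd

# Hypothetical column names chosen to expose the difference; one dummy row.
toy = pd.DataFrame(
    [[0, 0, 0, 0, 0]],
    columns=["actor_1_name", "lead_actor_name", "director_name",
             "co_director_fb", "gross"],
)

# '^' binds only to 'actor'; 'director' may appear anywhere in the name.
loose = toy.filter(regex="^actor|director")

# Grouping the alternatives anchors both of them to the start of the name.
strict = toy.filter(regex="^(actor|director)")

print(list(loose.columns))   # ['actor_1_name', 'director_name', 'co_director_fb']
print(list(strict.columns))  # ['actor_1_name', 'director_name']
```

On the movie data the two patterns may well select the same columns; the difference only shows up when a name contains director somewhere other than the start.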
Rename Columns
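To make your DataFrame more manageable, you can rename its columns. The rename method accepts a function for its columns argument and applies it to every column name. Here, a function named shorten abbreviates or strips common substrings.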
def shorten(col):
    # Abbreviate 'facebook_likes' and drop the '_for_reviews' and '_name' suffixes.
    return (
        col.replace('facebook_likes', 'fb')
        .replace('_for_reviews', '')
        .replace('_name', '')
    )
ma = m.rename(columns=shorten)
ma.head()
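Before renaming a wide DataFrame, it can help to preview what the function will do to each name. This is a small sketch under assumptions: the four column names are my guess at the kind of names the three replacements target, and the collision check at the end is my own addition rather than something from the book.

```python
import pandas as pd

def shorten(col):
    # Same replacements as in the post.
    return (
        col.replace("facebook_likes", "fb")
        .replace("_for_reviews", "")
        .replace("_name", "")
    )

# Assumed example column names; one dummy row of data.
toy = pd.DataFrame(
    [[0, 0, 0, 0]],
    columns=["director_name", "actor_1_facebook_likes",
             "num_critic_for_reviews", "gross"],
)

# Preview the old -> new mapping without touching the DataFrame.
preview = {col: shorten(col) for col in toy.columns}
print(preview)

renamed = toy.rename(columns=shorten)
print(list(renamed.columns))  # ['director', 'actor_1_fb', 'num_critic', 'gross']

# Guard against two columns collapsing to the same short name.
assert renamed.columns.is_unique
```

Names that contain none of the substrings, like gross, pass through unchanged.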
Selection by Data Types
You can also select columns based on their data types using the select_dtypes method. Its include and exclude arguments each take a single dtype or a list of dtypes.
m.select_dtypes(include='number').head()
m.select_dtypes(include=['object','int']).head()
m.select_dtypes(exclude='float').head()
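To see what include and exclude actually pick up, here is a minimal sketch on a made-up three-column DataFrame (the column names echo the movie data, but the values are invented). It assumes pandas' default inference, which stores the whole numbers as int64 and the decimals as float64.

```python
import pandas as pd

# Made-up values: one text column, one integer column, one float column.
toy = pd.DataFrame({
    "movie_title": ["A", "B"],
    "duration": [100, 120],
    "gross": [1.5, 2.5],
})

numeric = toy.select_dtypes(include="number")   # integers and floats
no_float = toy.select_dtypes(exclude="float")   # everything except floats

print(list(numeric.columns))    # ['duration', 'gross']
print(list(no_float.columns))   # ['movie_title', 'duration']
```

Checking m.dtypes first is a cheap way to confirm which type each column really has before you filter on it.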
Substring Matching
The filter method's like argument keeps the columns whose names contain a given substring.
m.filter(like='num').head()
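Because like is a plain substring test, it matches in the middle of a name as well as at the start. The sketch below uses assumed column names to show the difference between like='num' and an anchored regex.

```python
import pandas as pd

# Assumed column names; one dummy row of data.
toy = pd.DataFrame(
    [[0, 0, 0, 0]],
    columns=["num_voted_users", "num_critic_for_reviews",
             "facenumber_in_poster", "gross"],
)

anywhere = toy.filter(like="num")      # 'num' anywhere in the name
prefix = toy.filter(regex="^num")      # 'num' only at the start

print(list(anywhere.columns))
# ['num_voted_users', 'num_critic_for_reviews', 'facenumber_in_poster']
print(list(prefix.columns))
# ['num_voted_users', 'num_critic_for_reviews']
```

If you only want the prefix, the regex form is the safer choice.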
Combining Multiple Filters
You can combine multiple criteria in a single call by joining regex alternatives with | in the filter method. Note that filter keeps the matching columns; it does not remove them.
m.filter(regex='^actor|director|movie_title|gross').head()
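When the list of alternatives grows, one option is to build the pattern from a Python list and then chain a second selection step. This is only a sketch: the toy DataFrame and its values are invented, and chaining select_dtypes after filter is my own illustration of combining the techniques above, not a step from the original notebook.

```python
import pandas as pd

# Made-up single-row stand-in for the movie data.
toy = pd.DataFrame({
    "actor_1_name": ["P"],
    "director_name": ["Q"],
    "movie_title": ["A"],
    "gross": [1.0],
    "budget": [2.0],
    "duration": [100],
})

# Build the same pattern as in the post from a list of alternatives.
parts = ["^actor", "director", "movie_title", "gross"]
pattern = "|".join(parts)

selected = toy.filter(regex=pattern)
print(list(selected.columns))
# ['actor_1_name', 'director_name', 'movie_title', 'gross']

# Chain a dtype filter to keep only the numeric columns among the matches.
numeric_only = selected.select_dtypes(include="number")
print(list(numeric_only.columns))  # ['gross']
```

Keeping the alternatives in a list makes the pattern easier to read and to extend than one long string.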
Conclusion
Effective column selection in Pandas is crucial for data manipulation and analysis. The methods outlined in this post offer a range of options for different scenarios. Understanding how to combine these tools can make your data wrangling processes more efficient and robust.
Link to Google Colab
Enjoy the Notebook - link