Selecting Columns in Pandas

Column Selection Techniques in Pandas

This post explores several ways to select columns in Pandas DataFrames, inspired by Matt Harrison's book, Effective Pandas. Column selection is one of the most common tasks in data wrangling, and Pandas supports it by name, by pattern, and by data type. Let's look at each technique in turn.

Initial Setup


import pandas as pd
loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv'
m = pd.read_csv(loc)
m.head()
    

Basic Column Selection

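The most direct approach is to pass a list of column names inside the indexing brackets. The result is a new DataFrame containing only those columns, in the order listed.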


ma = m[['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name', 'movie_title', 'gross']]
ma.head()
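
As a minimal sketch of how bracket selection behaves, the snippet below uses a small made-up frame (the values are hypothetical stand-ins, not rows from movie.csv). It shows that a list of names returns a DataFrame, a single name returns a Series, and .loc with a full row slice gives the same result as the list form.

```python
import pandas as pd

# Hypothetical stand-in for the movie data; values are invented for illustration.
df = pd.DataFrame({
    "movie_title": ["A", "B"],
    "director_name": ["X", "Y"],
    "gross": [100.0, 250.0],
    "duration": [90, 120],
})

# A list inside the brackets returns a DataFrame, columns in list order.
subset = df[["movie_title", "gross"]]

# A single label (no list) returns a Series instead.
gross = df["gross"]

# .loc with a full row slice selects the same columns by label.
same = df.loc[:, ["movie_title", "gross"]]

print(type(subset).__name__, type(gross).__name__)
print(subset.equals(same))
```

Requesting a name that does not exist raises a KeyError, which is a useful early warning about typos in column names.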
    

Using Regex for Column Selection

Regular expressions (regex) are a powerful tool for pattern matching. The filter method accepts a pattern through its regex parameter and keeps every column whose name contains a match for it.


m.filter(regex='^actor|director')
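
One detail worth checking in the pattern above: ^ binds only to the first alternative, so actor must start the name while director may appear anywhere in it. The sketch below illustrates the difference on a frame whose column names are partly hypothetical (co_director_fb and lead_actor are invented for the example); grouping the alternatives anchors both.

```python
import pandas as pd

# Column names are partly hypothetical, chosen to expose the anchoring difference.
cols = ["actor_1_name", "director_name", "co_director_fb", "lead_actor", "gross"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# '^' applies to 'actor' only; 'director' matches anywhere in the name.
loose = df.filter(regex="^actor|director")

# Grouping the alternatives anchors both of them to the start of the name.
strict = df.filter(regex="^(actor|director)")

print(list(loose.columns))
print(list(strict.columns))
```

On the movie data the two patterns may well select the same columns; the distinction only matters when a name contains director somewhere other than the start.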
    

Rename Columns

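To make your DataFrame more manageable, you can rename its columns. The rename method accepts a function for its columns argument and applies it to each column name. Here, a function named shorten strips or abbreviates common suffixes.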


def shorten(col):
    return (
        col.replace('facebook_likes', 'fb')
        .replace('_for_reviews', '')
        .replace('_name', '')
    )
ma = m.rename(columns=shorten)
ma.head()
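
To see what shorten does to individual names, here is a small sketch on an empty frame with a handful of column names in the style of the dataset. It also checks that the shortened names stay unique, since a renaming function can accidentally map two columns to the same name; that check is an extra precaution and not part of the original notebook.

```python
import pandas as pd

def shorten(col):
    # Same replacements as in the post, applied to one column name at a time.
    return (
        col.replace("facebook_likes", "fb")
        .replace("_for_reviews", "")
        .replace("_name", "")
    )

# Empty frame: only the column names matter for this illustration.
df = pd.DataFrame(
    columns=["actor_1_name", "actor_1_facebook_likes", "num_critic_for_reviews", "gross"]
)

# rename returns a new DataFrame; df itself keeps its original names.
renamed = df.rename(columns=shorten)

print(list(renamed.columns))
print(renamed.columns.is_unique)
```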
    

Selection by Data Types

You can also select columns based on their data types using the select_dtypes method.


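# All numeric columns (integers and floats)
m.select_dtypes(include='number').head()
# Object (typically string) columns plus integer columns
m.select_dtypes(include=['object', 'int']).head()
# Everything except float columns
m.select_dtypes(exclude='float').head()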
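
The following sketch shows select_dtypes on a small frame with one column of each common kind. The column is_color and all values are hypothetical. Note that boolean columns are not treated as numeric by include='number'.

```python
import pandas as pd

# Hypothetical frame with string, integer, float, and boolean columns.
df = pd.DataFrame({
    "movie_title": ["A", "B"],
    "duration": [90, 120],
    "gross": [100.0, 250.0],
    "is_color": [True, False],
})

# Integer and float columns; the boolean column is left out.
numeric = df.select_dtypes(include="number")

# Everything except float columns.
no_float = df.select_dtypes(exclude="float")

print(df.dtypes)
print(list(numeric.columns))
print(list(no_float.columns))
```

Printing df.dtypes first is a quick way to confirm which type names to pass, since the dtype assigned to string columns can differ between Pandas versions.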
    

Substring Matching

The filter method can also match a plain substring: its like parameter keeps every column whose name contains the given text.


m.filter(like='num').head()
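
As a rough illustration, like behaves like a substring test on each column name, so it matches text anywhere in the name, not only at the start. The column enum_flag below is hypothetical and included only to show that behavior.

```python
import pandas as pd

# 'enum_flag' is a made-up name that contains 'num' in the middle.
cols = ["num_critic_for_reviews", "num_voted_users", "enum_flag", "director_name", "gross"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# Keep columns whose name contains the substring 'num'.
by_like = df.filter(like="num")

# The same selection written as an explicit substring test.
manual = df[[c for c in df.columns if "num" in c]]

print(list(by_like.columns))
```

If only names that start with num are wanted, a regex such as '^num' is the more precise choice.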
    

Combining Multiple Filters

You can combine multiple criteria in a single regex by separating the alternatives with |. The filter method keeps every column that matches any of them.


m.filter(regex='^actor|director|movie_title|gross').head()
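
When the list of criteria grows, one option is to build the pattern programmatically. The sketch below is one possible approach, not the method from the original notebook: it assumes you want prefix matches for some names and exact matches for others. The helper variables prefixes and exact, and the column gross_adjusted, are hypothetical.

```python
import re
import pandas as pd

# 'gross_adjusted' is a made-up column used to show exact matching.
cols = ["actor_1_name", "director_name", "movie_title", "gross",
        "gross_adjusted", "num_voted_users"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

prefixes = ["actor", "director"]   # match at the start of the name
exact = ["movie_title", "gross"]   # match the whole name only

# re.escape guards against regex metacharacters in the names.
prefix_part = "|".join(re.escape(p) for p in prefixes)
exact_part = "|".join(re.escape(e) for e in exact)
pattern = f"^(?:{prefix_part})|^(?:{exact_part})$"

selected = df.filter(regex=pattern)
print(pattern)
print(list(selected.columns))
```

Anchoring the exact names with ^ and $ keeps gross from also pulling in gross_adjusted, which the unanchored alternative would match.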
    

Conclusion

Effective column selection in Pandas is crucial for data manipulation and analysis. The methods outlined in this post offer a range of options for different scenarios. Understanding how to combine these tools can make your data wrangling processes more efficient and robust.

Link to Google Colab

Enjoy the Notebook - link
