DataFrame groupby agg style bar

 The goal of the article is to investigate the bar function through the style method of Pandas DataFrame.  So, when we work with DataFrames, we can create a visual within a DataFrame.  What does that mean? We can embed bar charts, sparklines, and mini bar charts in the DataFrame.  This can reduce the amount of cognitive load when reviewing a DataFrame.  

Google Colab link with all the code

To get started, we are going to import the data:


import pandas as pd
import numpy as np
import pandas_profiling as pp
loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv'
m = pd.read_csv(loc)
pd.set_option('display.max_columns',None)
pd.options.display.min_rows = 10

Next, I want to create a new DataFrame that groups by the rating (e.g., "R" "PG-13"), then calculates the total sum, min, max, and total observations for gross. 


content = (
m
    .groupby('content_rating')
    .agg({'gross':['sum','min','max','count']})
)

The resulting DataFrame - content, has a multi-level index for the columns (axis 1).  Multi-level indexes can be challenging to work with, therefore they need flattening. The following code flattens the column indexes by assigning the column names with the calculation contaminated with "_" acting as the seperator. 


content.columns = ['_'.join(x) for x in 
content.columns.to_flat_index()]

The data is ready for manipulation. 

  • At 1, we point to the content. 
  • At 2, we sort the values by the total sum.  
  • At 3, we remove all the "TV" ratings because they don't have reported gross values. 
  • At 4, we divide "gross_sum" by 1 MM.
  • At 5, we convert a DataFrame into a style object. All data transformations need to be placed before the DataFrame is transformed.  
  • At 6, we format the columns for ease of reading.
  • At 7, we create a bar chart across "gross_sum" and color the bar "lightblue."


(
    content                                         #1
    .sort_values(by='gross_sum', ascending=False)   #2
    .query('index not in ("TV-MA","TV-14","TV-PG","TV-Y","TV-Y7","TV-G")')  #3
    .assign(gross_sum = content.gross_sum / 1_000_000)  #4
    .style #5
    .format({'gross_sum':'${:,.0f} MM', 'gross_min':'${:,.0f}', 'gross_max':'${:,.0f}'}) #6
    .bar(subset='gross_sum', color=('lightblue')) #7
)


There we go. We have a stylized DataFrame. The nice thing is, we can change the color and alignment, interact with CSS, and dynamically link to a crosswalk table for dynamic colors.  Very cool! I want to investigate this more!

Comments

Popular posts from this blog

Recode with np.select

Python Topic Roadmap

PD.CUT