Pivot Table Intro
Understanding Pivot Tables in Pandas
The Python Pandas library is a powerful tool for data analysis. One of its most useful features is the pivot_table function. The pivot_table function allows you to reshape your data in a way that makes it easier to understand, analyze, and visualize. In this blog post, we will delve into what pivot_table is, its most common options, and provide some examples.
What is a Pivot Table?
A pivot table is a data summarization tool that is used in spreadsheet programs and in other data visualization tools. It aggregates a table of data by one or more keys, arranging the data in a rectangle with some of the group keys along the rows and some along the columns. Pivot tables in pandas are served by the pivot_table function and can involve aggregation of multiple columns.
Common Options in Pandas Pivot Table
- values: Column to aggregate, optional.
- index: Column, Grouper, array, or list of the previous. If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it's being used as the same manner as column values.
- columns: Column, Grouper, array, or list of the previous. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.
- aggfunc: Function to use for aggregation, defaulting to numpy.mean.
- fill_value: Value to replace missing values with (default is None).
- margins: Add all row/column combinations to subtotals, returning a table with a partial total at the end of the data. The default value is False.
Pivot Table Examples
Example 1: Basic Pivot Table
import pandas as pd
import numpy as np
# Example dataframe
df = pd.DataFrame({
"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
"C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]
})
pivot_table = df.pivot_table(values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum)
print(pivot_table)
Example 2: Pivot Table with fill_value and margins
pivot_table = df.pivot_table(values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum, fill_value=0, margins=True)
print(pivot_table)
Example 3: Pivot Table with multiple types of aggregation
pivot_table = df.pivot_table(values='D', index=['A', 'B'], columns=['C'], aggfunc={'D': ['mean', 'min', 'max', 'median', 'std']})
print(pivot_table)
Aggregation Functions in Pandas Pivot Table
The Pandas library offers various functions that can be used for the aggregation of data in a pivot table. The 'aggfunc' parameter accepts several different types of inputs:
Standard Aggregation Functions
Below are some of the commonly used aggregation functions:
- 'sum': Compute sum of group values
- 'mean': Compute mean of group values
- 'min': Compute min of group values
- 'max': Compute max of group values
- 'count': Compute count of group values
- 'median': Compute median of group values
- 'prod': Compute product of group values
- 'std': Compute standard deviation of group values
- 'var': Compute variance of group values
Custom Aggregation Functions
The 'aggfunc' can also accept a function defined by the user for custom aggregations. The function should take an array of data and return a single aggregated result.
# Example custom function to calculate range (max - min)
def range_func(x):
return max(x) - min(x)
pivot_table = df.pivot_table(values='D', index=['A', 'B'], columns=['C'], aggfunc=range_func)
Multiple Aggregation Functions
You can also use multiple aggregation functions at once by passing a list to 'aggfunc':
pivot_table = df.pivot_table(values='D', index=['A', 'B'], columns=['C'], aggfunc=['mean', 'min', 'max', 'median', 'std'])
This flexibility in the 'aggfunc' parameter allows you to tailor your pivot table to the specific needs of your data analysis.
Hopefully, these examples give you a solid understanding of how pivot tables work in pandas and how you can start using them in your own data analysis!
Comments
Post a Comment