
Imagine you’re managing a department budget and you want to see how much money is earned per hour each week, grouped and compared across other important features like gender or education.
Most people think you need multiple Pandas functions like groupby; this is a viable solution, but you’ll need several other operations to get it formatted in the way you want it to.
But, there’s a better way: leveraging pivot_table
A pivot_table lets you take flat row-based data and reshape it into a summarized, structured table.
For example, looking at this data frame:
>>> data = {
'age': [39, 50, 38, 53, 28,31],
'education': ['Bachelors', 'Bachelors', 'HS-grad', 'Doctorate', 'Bachelors','Masters'],
'gender': ['Male', 'Male', 'Male', 'Male', 'Female','Female'],
'hours-per-week': [40, 13, 40, 40, 40, 30]
}
# Create the DataFrame
>>> df = pd.DataFrame(data)
age education gender hours-per-week
0 39 Bachelors Male 40
1 50 Bachelors Male 13
2 38 HS-grad Male 40
3 53 Doctorate Male 40
4 28 Bachelors Female 40
5 31 Masters Female 30Let's suppose that we want to calculate the average hours worked per week based off of gender and age.
Without pivot_table, we have to perform the following steps:
>>> df["age"] = pd.cut(df["age"],[0,17,29,64,79])
>>> df.groupby(["gender","age"])["hours-per-week"].mean().unstack("age")
age (0, 17] (17, 29] (29, 64] (64, 79]
gender
Female NaN 40.0 30.00 NaN
Male NaN NaN 33.25 NaN
>>> df.groupby(["gender","age"])["hours-per-week"].mean().unstack("gender")
gender Female Male
age
(0, 17] NaN NaN
(17, 29] 40.0 NaN
(29, 64] 30.0 33.25
(64, 79] NaN NaNBreaking this down operation-by-operation:
cut: This Pandas function divides the ages into different bins or age brackets to help us summarize the data. These bins are 0-17, 17-29, 29-64, and 64-79 years old.
groupby: This groups the data by age and gender bins based upon the average (mean) hours per week worked.
unstack: It helps us with visualization by converting a dimension into a column. If we leave off unstack, we’ll see the following after calling groupby:
>>> df.groupby(["gender","age"])["hours-per-week"].mean()
Female (0, 17] NaN
(17, 29] 40.00
(29, 64] 30.00
(64, 79] NaN
Male (0, 17] NaN
(17, 29] NaN
(29, 64] 33.25
(64, 79] NaN
Name: hours-per-week, dtype: float64But by using unstack, it’s much easier to read and interpret based off of gender:
>>> df.groupby(["gender","age"])["hours-per-week"].mean().unstack("gender")
gender Female Male
age
(0, 17] NaN NaN
(17, 29] 40.0 NaN
(29, 64] 30.0 33.25
(64, 79] NaN NaNLooking at this syntax, it may not be so obvious and clear as to what we're doing and what the end result would look like. We can consolidate all of this code using pivot_table instead:
# Creating pivot table with 'gender' as index
>>> df.pivot_table(
index='gender',
columns='age',
values='hours-per-week',
aggfunc='mean'
)
age (17, 29] (29, 64]
gender
Female 40.0 30.00
Male NaN 33.25
# Creating a pivot table with 'age' as index
>>> gender = df.pivot_table(
index='age',
columns='gender',
values='hours-per-week',
aggfunc='mean'
)
gender Female Male
age
(17, 29] 40.0 NaN
(29, 64] 30.0 33.25By consolidating the grouping, restructuring, and aggregation into a single, clean function call, your code instantly becomes more readable without having to use methods like groupby and unstack.
Breaking down the parameters used here:
index: The columns from the original DataFrame you want to show as the primary labels down the rows of the new pivot table.
columns: The columns’ unique values you want to show as the column headers in your new pivot table
values: Numeric data you want to perform the aggregations on, hours-per-week in this case.
aggfunc: The aggregation function you want to apply on the numeric data, which is the mean function for this example.
For a full breakdown of all of the parameters for pivot_table, visit the pandas documentation.
Protip: Use fill_value=0 as a pivot_table parameter to replace NaN with 0 for cleaner reports and better calculations.
Happy coding!
📧 Join the Python Snacks Newsletter! 🐍
Want even more Python-related content that’s useful? Here’s 3 reasons why you should subscribe the Python Snacks newsletter:
Get Ahead in Python with bite-sized Python tips and tricks delivered straight to your inbox, like the one above.
Exclusive Subscriber Perks: Receive a curated selection of up to 6 high-impact Python resources, tips, and exclusive insights with each email.
Get Smarter with Python in under 5 minutes. Your next Python breakthrough could just an email away.
You can unsubscribe at any time.
Interested in starting a newsletter or a blog?
Do you have a wealth of knowledge and insights to share with the world? Starting your own newsletter or blog is an excellent way to establish yourself as an authority in your field, connect with a like-minded community, and open up new opportunities.
If TikTok, Twitter, Facebook, or other social media platforms were to get banned, you’d lose all your followers. This is why you should start a newsletter: you own your audience.
This article may contain affiliate links. Affiliate links come at no cost to you and support the costs of this blog. Should you purchase a product/service from an affiliate link, it will come at no additional cost to you.


