The Top 8 Python Pandas Operations You Should Know
Using the go-to framework for data manipulation


Pandas is an extremely popular Python library designed for working with structured data (think rows and columns, like you’d see in a spreadsheet). It provides intuitive tools to load, inspect, clean, transform, and analyze data using just a few lines of code.
You’d use Pandas anytime you’re dealing with:
Tabular data (CSV, Excel, SQL, JSON, etc.)
Time series data
Data cleaning and transformation
Statistical summaries or groupings
Preprocessing for machine learning
It’s the go-to tool for data manipulation in Python. Here, I’m showcasing the top 8 Pandas operations you should know when working with the library (#8 is, in my opinion, one of the top things every Pandas user should know).
1. Reading and writing data
This is generally going to be your entry and exit point for nearly every data project. You need to get your data in and export results cleanly. There’s a read method for each common structured format:
import pandas as pd
pd.read_csv('data.csv') # CSV
pd.read_excel('data.xlsx') # Excel
pd.read_json('data.json') # JSON
pd.read_parquet('data.parquet') # Parquet (fast, columnar)
pd.read_sql(query, conn) # SQL query from DB connection
pd.read_html('url_or_file.html') # Tables from HTML pages
Similarly, there are write methods:
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
df.to_json('output.json')
df.to_parquet('output.parquet')
df.to_sql('table_name', conn, if_exists='replace')
» Note: Sometimes, you’ll need to install additional packages, like pyarrow for Parquet or openpyxl for Excel.
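To see read and write working together, here’s a minimal sketch that round-trips a DataFrame through CSV using an in-memory buffer instead of a file on disk (the names and values are made up for illustration):

```python
import io
import pandas as pd

# A tiny DataFrame standing in for real data (hypothetical values)
df = pd.DataFrame({"name": ["Ada", "Grace"], "score": [95, 98]})

# Write to an in-memory buffer, then read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_roundtrip = pd.read_csv(buffer)
```

With `index=False`, the round-tripped frame matches the original; without it, you’d get an extra unnamed index column on the way back in.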
2. Data Inspection
To get a quick peek at your data, you’ll want to leverage these methods to understand what you’re working with and catch data issues early:
df.info() # Overview: column names, non-null counts, types
df.head() # First 5 rows (useful for a quick peek)
df.tail() # Last 5 rows (helpful for recent entries)
df.sample(5) # Random 5 rows (great for sanity checks)
df.shape # Tuple of (rows, columns)
df.columns # List of column names
df.dtypes # Data types of each column
df.describe() # Summary statistics for numeric columns
df.nunique() # Number of unique values per column
df.isnull().sum() # Count of missing values per column
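Here’s a quick sketch of the inspection methods in action on a tiny made-up DataFrame, including how `isnull().sum()` surfaces missing data:

```python
import pandas as pd

# Hypothetical data with one missing city
df = pd.DataFrame({
    "city": ["Chicago", "Seattle", None],
    "age": [25, 31, 40],
})

shape = df.shape             # (rows, columns) tuple
missing = df.isnull().sum()  # per-column count of missing values
```

Running these two on every new dataset takes seconds and catches surprises (nulls, unexpected column counts) before they bite you downstream.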
3. Filtering rows (Boolean Indexing)
To start working with your data, you’ll need to be able to filter rows by certain criteria. You can do this through boolean indexing:
# Filter rows where the city is exactly "Chicago"
df[df['city'] == 'Chicago']
# Filter rows where the job title contains "engineer"
df[df['job_title'].str.contains('engineer', case=False)]
# Age over 25 AND lives in New York
df[(df['age'] > 25) & (df['city'] == 'New York')]
# Age under 18 OR over 65
df[(df['age'] < 18) | (df['age'] > 65)]
# Filter for rows where city is in a given list
df[df['city'].isin(['Chicago', 'Seattle'])]
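Here’s a runnable sketch of two of those filters on a small made-up DataFrame, combining an AND condition with an `isin` membership test:

```python
import pandas as pd

# Hypothetical people data
df = pd.DataFrame({
    "city": ["Chicago", "New York", "Seattle"],
    "age": [22, 30, 67],
})

# AND condition: over 25 AND lives in New York
ny_adults = df[(df["age"] > 25) & (df["city"] == "New York")]

# Membership test: city is in a given list
selected_cities = df[df["city"].isin(["Chicago", "Seattle"])]
```

Note the parentheses around each condition: `&` and `|` bind tighter than comparisons in Python, so leaving them out raises an error.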
4. Sorting
Another common data manipulation tactic is to sort your values by one or more column values:
df.sort_values(by=['department', 'salary'])
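You can also control sort direction per column with the `ascending` parameter. A small sketch with made-up salary data:

```python
import pandas as pd

# Hypothetical salary data
df = pd.DataFrame({
    "department": ["eng", "eng", "sales"],
    "salary": [120, 90, 100],
})

# Department ascending, then salary descending within each department
ranked = df.sort_values(by=["department", "salary"],
                        ascending=[True, False])
```

This pattern (group column ascending, metric descending) is handy for producing "top earner per department"-style views.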
5. GroupBy and Aggregation
In cases where you need to aggregate and summarize large datasets and draw insights from patterns, you’re going to want to group the data together:
# Average salary per department
df.groupby('department')['salary'].mean()
# Total revenue per city
df.groupby('city')['revenue'].sum()
# Multiple aggregations on a single column
df.groupby('category')['price'].agg(['mean', 'max', 'min'])
# Multiple aggregations on multiple columns
df.groupby('store')[
['sales', 'profit']
].agg({'sales': 'sum', 'profit': 'mean'})
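The first of those patterns looks like this end to end, on a made-up salary table:

```python
import pandas as pd

# Hypothetical salary data
df = pd.DataFrame({
    "department": ["eng", "eng", "sales"],
    "salary": [100, 120, 90],
})

# Average salary per department, indexed by department name
avg_salary = df.groupby("department")["salary"].mean()
```

The result is a Series indexed by the grouping key, so you can look up a group directly by name.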
6. Merging DataFrames
There are times when you’ll be working with multiple DataFrames. To consolidate them, you’ll want to merge them on shared columns:
# Basic inner join on a shared column
pd.merge(df1, df2, on='customer_id')
# Left join
pd.merge(df1, df2, on='customer_id', how='left')
# Merging on multiple columns
pd.merge(df1, df2, on=['state', 'zip_code'], how='inner')
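The difference between inner and left joins is easiest to see on a concrete example. A sketch with made-up customer and order tables:

```python
import pandas as pd

# Hypothetical tables sharing a customer_id column
customers = pd.DataFrame({"customer_id": [1, 2],
                          "name": ["Ada", "Grace"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "total": [10, 20, 5]})

# Inner join: keeps only customer_ids present in BOTH frames
inner = pd.merge(customers, orders, on="customer_id")

# Left join: keeps every customer; unmatched order fields become NaN
left = pd.merge(customers, orders, on="customer_id", how="left")
```

Here the inner join drops both Grace (no orders) and order 3 (no matching customer), while the left join keeps Grace with a NaN total.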
7. Reshaping (Pivot/Melt)
You’ll sometimes need to reshape your data to make comparisons or prep it for ML models:
# Reshape from long to wide
df.pivot(index='date', columns='region', values='sales')
# Aggregated pivot with duplicates allowed
df.pivot_table(index='region', columns='product', values='sales', aggfunc='sum')
# Reshape from wide to long (ideal for feeding into ML)
pd.melt(df, id_vars='name', value_vars=['math', 'science'], var_name='subject', value_name='score')
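To make the wide-to-long direction concrete, here’s `melt` applied to a small made-up gradebook:

```python
import pandas as pd

# Hypothetical wide-format scores: one column per subject
scores = pd.DataFrame({
    "name": ["Ada", "Grace"],
    "math": [90, 95],
    "science": [85, 99],
})

# Wide -> long: one row per (name, subject) pair
long_form = pd.melt(scores, id_vars="name",
                    value_vars=["math", "science"],
                    var_name="subject", value_name="score")
```

Two rows by two subject columns melts into four rows, which is the shape most plotting and ML libraries expect.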
8. Apply/Map/Vectorized Operations
One of the most important topics to learn in Pandas is vectorized operations. These are blazing fast (much quicker than Python for loops) because they run internal C-optimized code:
df['price_with_tax'] = df['price'] * 1.07 # Vectorized: whole column at once
# Apply custom logic element-wise (slower; prefer vectorized ops when possible)
df['tax'] = df['income'].apply(lambda x: x * 0.3)
# Transform individual values
df['grade'] = df['score'].map({90: 'A', 80: 'B', 70: 'C'})
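Here’s a self-contained sketch of the vectorized and `map` patterns on made-up data; note that `map` with a dict leaves unmatched values as NaN:

```python
import pandas as pd

# Hypothetical prices and scores
df = pd.DataFrame({"price": [100.0, 200.0], "score": [90, 70]})

# Vectorized arithmetic: the multiply applies to the whole column at once
df["price_with_tax"] = df["price"] * 1.07

# map with a dict: each value is looked up; anything not in the dict -> NaN
df["grade"] = df["score"].map({90: "A", 80: "B", 70: "C"})
```

When the same logic can be written either way, reach for the vectorized form first and keep `apply` for logic that genuinely can’t be expressed as column arithmetic.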
Whew, that was a lot. But since you made it this far, I have a question for you:
Would you be interested in a Pandas email-based course? This would be a 2-3 week (preliminary) course delivering hands-on pandas tips every other day, including all of the things you've seen here plus more so that you can become a pro with Pandas.
📧 Join the Python Snacks Newsletter! 🐍
Want even more Python-related content that’s useful? Here are 3 reasons why you should subscribe to the Python Snacks newsletter:
Get Ahead in Python with bite-sized Python tips and tricks delivered straight to your inbox, like the one above.
Exclusive Subscriber Perks: Receive a curated selection of up to 6 high-impact Python resources, tips, and exclusive insights with each email.
Get Smarter with Python in under 5 minutes. Your next Python breakthrough could be just an email away.
You can unsubscribe at any time.
Interested in starting a newsletter or a blog?
Do you have a wealth of knowledge and insights to share with the world? Starting your own newsletter or blog is an excellent way to establish yourself as an authority in your field, connect with a like-minded community, and open up new opportunities.
If TikTok, Twitter, Facebook, or other social media platforms were to get banned, you’d lose all your followers. This is why you should start a newsletter: you own your audience.
This article may contain affiliate links. Affiliate links come at no cost to you and support the costs of this blog. Should you purchase a product/service from an affiliate link, it will come at no additional cost to you.