Python's Threading Basics
What is threading? How can we use threading for our applications?
Most code runs sequentially, meaning one line of code is executed after another. This approach works well for many applications (and for most tasks), but sometimes you need your code to perform multiple tasks at once.
I’m going to show you how to use the threading module from Python’s standard library to run code concurrently.
What is Threading?
Threading is a way to run multiple operations concurrently, so one task doesn’t have to wait for another to finish before it can start.
This can be beneficial in several different scenarios that require tasks to be executed independently of each other, such as handling user input, performing background calculations, or downloading files.
Think of threading like a road intersection. Suppose cars on both sides of the road are going straight. It wouldn’t make a lot of sense for the cars taking a right-hand turn to stop and wait until the cars going straight have passed.
Figure: Example of cars taking right turns while others are going straight.
We theoretically could make the right-hand turners wait, but there would be a lot of upset people. So to solve this, we have the cars take a right-hand turn at the same time, or concurrently, while the other cars go straight.
Threading can be thought of the same way: we may not want one task to wait (the cars taking a right-hand turn), but instead to execute while a different task is running (the cars going straight through the intersection).
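To make the analogy concrete, here is a minimal sketch using the standard-library threading module. The function name `drive`, the lane labels, and the sleep durations are made up for illustration; `time.sleep` stands in for work that mostly waits, such as a download.

```python
import threading
import time

def drive(lane, seconds, finished):
    # time.sleep stands in for work that mostly waits (hypothetical workload)
    time.sleep(seconds)
    finished.append(lane)

finished = []
start = time.perf_counter()

# One thread per lane: both are started before either one finishes
straight = threading.Thread(target=drive, args=("straight", 0.2, finished))
right_turn = threading.Thread(target=drive, args=("right turn", 0.2, finished))
straight.start()
right_turn.start()

# join() blocks until the thread has finished
straight.join()
right_turn.join()

elapsed = time.perf_counter() - start
print(f"{sorted(finished)} finished in {elapsed:.2f}s")
```

Called one after the other, the two `drive` calls would take about 0.4 seconds. With threads the total should be close to 0.2 seconds, because the two waits overlap.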
Threading: A Practical Example
Let’s say that we’re building a simple data processing pipeline, where we’re:
Reading from a source (read_from_source())
Filtering the data so only 3 days’ worth is stored
Storing the results in a database (df.to_sql(table, conn)) using pandas
Right now, this consists of reading from a source, then writing to the database, then moving to the next source and doing it again, and so forth.
Figure: Threading: sequential reading/writing from multiple sources (with sample code)
Our code may look something like this:
import sqlite3
from datetime import datetime, timedelta, timezone

import pandas as pd

# Create the db connection
db_connection = sqlite3.connect('database.db')

# List out the sources
sources = ['source1.csv', 'source2.csv', 'source3.csv']

def read_and_load(source, table_name, db_connection):
    # Load in the data from the source (code not here)
    data = read_from_source(source)
    # Load the data into a pandas dataframe
    df = pd.DataFrame(data)
    # Filter the time to keep 3 days worth:
    # the window starts 3 days ago and ends now
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=3)
    # Filter between the start_time and end_time
    # (assumes df['time'] holds timezone-aware UTC timestamps)
    df = df[
        (df['time'] >= start_time)
        &
        (df['time'] <= end_time)
    ]
    # Append the rows to the given table with pandas
    df.to_sql(table_name, db_connection, if_exists='append')

# Throw this in a loop and keep going until
# an error is thrown (or ctrl+c is pressed)
while True:
    try:
        for source in sources:
            # read from the given source, write to the db.
            read_and_load(
                source,
                'source_data',  # placeholder table name
                db_connection
            )
    except (Exception, KeyboardInterrupt):
        break

# Close the connection to the database
db_connection.close()
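The three-day filter step can be sketched on its own. This version uses plain Python lists rather than pandas so it runs without extra packages; the helper `keep_last_days` and the sample records are hypothetical. The window starts three days before the current time and ends at the current time.

```python
from datetime import datetime, timedelta, timezone

def keep_last_days(records, days, now=None):
    """Return the records whose 'time' falls within the last `days` days."""
    end_time = now or datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=days)
    return [r for r in records if start_time <= r["time"] <= end_time]

# Hypothetical sample records, relative to a fixed "now"
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
records = [
    {"id": 1, "time": now - timedelta(days=1)},
    {"id": 2, "time": now - timedelta(days=5)},
    {"id": 3, "time": now - timedelta(hours=2)},
]

recent = keep_last_days(records, days=3, now=now)
print([r["id"] for r in recent])  # → [1, 3]
```

Note the order of the bounds: the start of the window is the earlier timestamp. If the two are swapped, no row can satisfy both comparisons and the filter returns nothing.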
Here’s the problem: In an application that has multiple sources and possibly multiple destinations, we may find ourselves in a situation where we’re “lagging” behind the most recent data over time.
That is, source1 is read from, then source2, then source3. While each source waits its turn, new data is accumulating in those files. What if we want to run this process for all 3 files at once? We’d create a thread that runs the read_and_load function for each source we want to read from and then start it.
Figure: Threading: concurrent processes reading and writing to the database (with source code)
If we replace the while True loop in the above code, it may look something like this:
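import sqlite3
import threading as th
import time

# Instead of the `while True` loop, replace it with this code.

def worker(source):
    # By default, a sqlite3 connection can only be used by the thread
    # that created it, so each thread opens its own connection.
    connection = sqlite3.connect('database.db')
    try:
        # 'source_data' is a placeholder table name
        read_and_load(source, 'source_data', connection)
    finally:
        connection.close()

threads = []
for source in sources:
    # Create the thread object
    thread = th.Thread(target=worker, args=(source,))
    # add it to our list
    threads.append(thread)
    thread.start()

try:
    while any(thread.is_alive() for thread in threads):
        # code that runs while the threads are running.
        time.sleep(0.5)
except KeyboardInterrupt:
    # a keyboard interrupt (ctrl+c) ends the loop above
    pass

# Wait for the threads to finish. Note that join() does not stop
# a thread; it blocks until the thread is done.
for thread in threads:
    thread.join()

# Close the connection to the database
db_connection.close()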
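Calling join() only waits for a thread to finish; it does not stop it. One common way to ask long-running threads to stop is a shared threading.Event. The sketch below assumes each worker polls its source in a loop; `poll`, `counts`, and the timings are hypothetical stand-ins for the real read_and_load work.

```python
import threading
import time

stop_event = threading.Event()
counts = {}

def poll(source):
    # Stand-in for calling read_and_load repeatedly (hypothetical workload)
    counts[source] = 0
    while not stop_event.is_set():
        counts[source] += 1
        # wait() returns early once the event is set
        stop_event.wait(0.05)

sources = ["source1.csv", "source2.csv", "source3.csv"]
threads = [threading.Thread(target=poll, args=(s,)) for s in sources]
for thread in threads:
    thread.start()

try:
    # The main thread does its own work here
    time.sleep(0.2)
finally:
    # Ask the workers to stop, then wait for them to finish
    stop_event.set()
    for thread in threads:
        thread.join()

print(counts)
```

Each worker checks the event between iterations, so it finishes its current pass and exits cleanly instead of being cut off in the middle of a write.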