
Cerberus: The One Package You Need for Data Validation

Plus: NumPy 2 release on the horizon, a tutorial on data cleaning, and more

👋 Hey there!

A quick note: I am absolutely loving putting this newsletter together. I’m really hoping that you enjoy seeing Python Snacks in your inbox each week.

If you do, I want to encourage you to rate this newsletter by clicking on the poll at the bottom of the newsletter. This feedback helps me tremendously! 🍿 

Here’s what’s in store for the 3rd newsletter of the year:

  • A package to make data validation straightforward,

  • What to do if you’re upgrading to NumPy 2,

  • And much more…

Forwarded this email? Sign up here for free.

🥐 The Python Pantry

Curated links to help you become a better programmer

🎒 Tutorials
- Annotating args and kwargs in Python (link)
- Data Cleaning with Python’s Pandas (link)

📦️ Packages
- cssutils: Parse and build CSS in the DOM (link)
- Bowler: Safe code refactoring for modern Python (link)

💰️ Job Postings
- Python Developer (AI/ML)/ Python Developer (AI Chatbot) (link)
- Software Engineer II – Scientific Python Programmer (link)
- Software Engineer Intern (link)

📕 Articles
- NumPy 2 is Coming: Preventing Breakage, Updating your Code (link)
- Django vs Flask - The Difference Between Them (link)

💡 This Week’s Snack

The weekly tip/trick and coding challenge

One of the packages I’ve started learning for work is Cerberus. It lets us validate incoming and outgoing data by declaring the criteria in a schema, much like writing a configuration file.

For example, before storing data in a MongoDB database, we may want to check that each incoming document is formatted correctly.

Take the following document:

document = {
    'first_name': 'John',
    'last_name': 'Doe',
    'age': 60
}

Suppose we impose the following criteria so that the data written to the database is consistent every time:

  • The first name must be a string and capitalized.

  • The last name must be a string and capitalized.

  • The age must be an integer and must be greater than or equal to 0.

Using Cerberus, we can define these rules as follows:

schema = {
    'first_name': {
        'type': 'string',
        'coerce': lambda s: s.capitalize()
    },
    'last_name': {
        'type': 'string',
        'coerce': lambda s: s.capitalize()
    },
    'age': {
        'type': 'integer',
        'min': 0
    }
}

Each schema key (first_name, last_name, and age) matches a field in our document and maps to a dictionary of rules for that field.

In this scenario:

  • type → The data type of the field

  • coerce → A function applied to the field’s value before validation (here, it capitalizes the name rather than rejecting a lowercase one)

  • min → The minimum value (inclusive) allowed for the field

From here, we pass the schema to a Cerberus Validator object and call validate() on the document, which returns a boolean:

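import cerberus

# Build a validator from the schema, then check the document against it.
validator = cerberus.Validator(schema)
validated = validator.validate(document)  # True if every rule passes

if validated:
    print("Data is successfully validated!")
else:
    # errors maps each failing field to a list of messages
    print(validator.errors)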

For this week’s challenge, see what other criteria you can add to the rules.

Try adding regex, minlength, maxlength, and default. What about coerce for the age field? Hint: it’s not a lambda function 😉

Be sure to leverage the Cerberus documentation (specifically, the schema)!

📋️ How I Can Help

Looking to accelerate your Python skills? Here are three ways you can do so:

  1. Hire me as a Python coach. Get in touch today!

  2. Book a 30-minute 1-on-1 with me to talk Python or AI.

  3. Refer your friend to this newsletter and get a Python code optimization guide after 1 referral. See below for your unique link.

📧 Join the Python Snacks Newsletter! 🐍

Want even more useful Python-related content? Here are 3 reasons why you should subscribe to the Python Snacks newsletter:

  1. Get Ahead in Python with bite-sized Python tips and tricks delivered straight to your inbox, like the one above.

  2. Exclusive Subscriber Perks: Receive a curated selection of up to 6 high-impact Python resources, tips, and exclusive insights with each email.

  3. Get Smarter with Python in under 5 minutes. Your next Python breakthrough could be just an email away.

You can unsubscribe at any time.

Interested in starting a newsletter or a blog?

Do you have a wealth of knowledge and insights to share with the world? Starting your own newsletter or blog is an excellent way to establish yourself as an authority in your field, connect with a like-minded community, and open up new opportunities.

If TikTok, Twitter, Facebook, or other social media platforms were to get banned, you’d lose all your followers. This is why you should start a newsletter: you own your audience.

This article may contain affiliate links. If you purchase a product or service through one, it supports the costs of this blog at no additional cost to you.
