Cerberus: The One Package You Need for Data Validation

Plus: NumPy 2 release on the horizon, a tutorial on data cleaning, and more

Brandon Molyneaux
January 17, 2024

One of the packages I’ve started learning for work is Cerberus. This package lets us validate data easily by declaring the rules for incoming and outgoing data in a configuration-style schema.

For instance, before storing data in a Mongo database, we may want to check that the incoming document is formatted appropriately.

Take the following document:

document = {
    'first_name' : 'John',
    'last_name' : 'Doe',
    'age' : 60
}

Suppose we impose the following criteria so that the data written to the database is consistent every time:

The first name must be a string and capitalized.
The last name must be a string and capitalized.
The age must be an integer and must be greater than or equal to 0.

Using Cerberus, we may define these “rules” as such:

schema = {
    'first_name': {
        'type': 'string',
        'coerce': lambda s: s.capitalize()
    },
    'last_name': {
        'type': 'string',
        'coerce': lambda s: s.capitalize()
    },
    'age': {
        'type': 'integer',
        'min': 0
    }
}

Each key in the schema (first_name, last_name, and age) is mapped to a dictionary containing the validation rules for the matching key in our document.

In this scenario:

type → The data type of the field
coerce → Apply a function to the field before validating
min → The minimum number (inclusive) this field can be.

From here, we can pass the schema into a Cerberus validator object and ask it to validate the document, which returns a boolean:

import cerberus

validator = cerberus.Validator(schema)
validated = validator.validate(document)

if validated:
    print("Data is successfully validated!")
else:
    print(validator.errors)

For this week’s challenge, see what other criteria you can add to the rules.

See if you can add regex, minlength, maxlength, and default. What about coerce for the age field? Hint: it’s not a lambda function 😉

Be sure to leverage the Cerberus documentation (specifically, the schema)!

This article may contain affiliate links. Affiliate links come at no cost to you and support the costs of this blog. Should you purchase a product/service from an affiliate link, it will come at no additional cost to you.
