Object Serialization in Python
Problem
Imagine you are working as a backend developer. After a series of logic operations and database queries, you end up with an object of type Person, which you want to return to the frontend. However, you have to return it as a JSON object rather than a Person object, since JSON is the standard format for transferring data between the two.
Without external libraries, you will have to do it this way:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = Person(name='Bill', age=19)
return_value = {
    'name': person.name,
    'age': person.age
}
You might also come across this problem when working with databases. For example, some relational and NoSQL databases only accept values of certain basic types, such as text, integer, and float, and not arbitrary objects. Therefore, in order to store objects in those databases, you first have to convert them into text or strings before inserting.
To do this, you would have to loop through the object and get all of its attributes one by one. This is fine for objects with few attributes, but as this number increases, your work gets tiring and repetitive.
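For a plain class like Person, the attribute loop described above can be sketched with the standard library alone. This is only a minimal illustration, assuming every attribute is already JSON-compatible; the helper name to_dict is hypothetical and not part of any library.

```python
import json


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def to_dict(obj):
    """Copy every instance attribute into a new dictionary (hypothetical helper)."""
    return dict(vars(obj))


person = Person(name='Bill', age=19)
as_dict = to_dict(person)      # {'name': 'Bill', 'age': 19}
as_text = json.dumps(as_dict)  # '{"name": "Bill", "age": 19}'
print(as_text)
```

This approach stops working once an attribute holds a nested object, a date, or any other value that json cannot encode, which is where a schema library becomes useful.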
Conversely, there are situations when you have to convert a dictionary into a predefined object, like Person. For instance, when the backend receives a request from the frontend and wants to convert the request payload back into the original object, the process would look something like this:
request_payload = {
    'name': 'Bill',
    'age': 19,
    'job': 'student'
}
bill = Person(
    name=request_payload['name'],
    age=request_payload['age']
)
You would have to loop through the request payload and pick the attributes you need for your object.
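The picking step can be automated to a degree without external libraries. The sketch below, which assumes the constructor parameters carry the same names as the payload keys, keeps only the keys the constructor accepts; from_dict is a hypothetical helper name.

```python
import inspect


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def from_dict(cls, payload):
    """Build cls from payload, ignoring keys its constructor does not accept.

    Hypothetical helper: it filters keys but performs no type validation.
    """
    accepted = inspect.signature(cls).parameters
    kwargs = {key: value for key, value in payload.items() if key in accepted}
    return cls(**kwargs)


request_payload = {'name': 'Bill', 'age': 19, 'job': 'student'}
bill = from_dict(Person, request_payload)
print(bill.name, bill.age)  # Bill 19
```

Note that nothing here checks that age is actually an integer, so invalid input passes through silently.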
These processes are called Serialization and Deserialization, which you often encounter when dealing with the storage and transfer of data.
1. Serialization and Deserialization
1.1. Serialization
Serialization is the process of converting app-level objects to primitive types, such as dictionaries and strings. The serialized objects can then be rendered to standard formats such as JSON for use in an HTTP API. This technique is also common in database work, where custom objects need to be converted into dictionaries and then into text before being stored.
1.2. Deserialization
Deserialization is the reverse of serialization: data, typically dictionaries or JSON objects, is converted back into app-level objects. This is common in web development, when the backend receives input from the frontend or reads data from the database and converts it back into custom objects.
There are several libraries in Python that support serialization and deserialization, including pickle, json, cattrs, and most notably, marshmallow, which we will go through in the second part of this article.
2. Marshmallow
2.1. What is Marshmallow?
marshmallow is an ORM/ODM/framework-agnostic library for converting complex data types, such as objects, to and from native Python data types. It can be used for three main functions:
- Validate input data
- Deserialize input data to app-level objects
- Serialize app-level objects to primitive Python types
In practice, one of the most prominent use cases of marshmallow is to deserialize JSON objects into Python objects, or serialize Python objects into JSON objects, for use in web APIs, especially in the communication between backend and frontend.
To accomplish all of the above, marshmallow introduces the concept of a schema, which can be used to apply rules that validate the data being deserialized or change the way data is serialized. A schema defines the rules that guide deserialization, called load, and serialization, called dump. It allows us to define the fields that will be loaded or dumped, add conditions on the fields, and inject computation to transform the data during load and dump.
2.2. How to use Marshmallow?
Install
We can install marshmallow with pip:
pip install marshmallow
Define a Schema
In order to serialize and deserialize, we need to define a Schema, which sets the rules for those operations.
from marshmallow import Schema, fields

class PersonSchema(Schema):
    name = fields.Str()
    age = fields.Int()
For example, we want to define a Schema for the class Person defined at the beginning of this article, which has two fields, name and age. To do that, we import Schema from marshmallow and subclass it in our custom schema. Then we list the attributes under our schema, name and age, along with the field type for each. In this case, name is a string and age is an integer.
Validate inputs
The first and most basic thing you can do with a Schema is validate input data. To do this, we use the schema's load method.
data = {
    'name': 'bill',
    'age': 'nineteen'
}
person = PersonSchema().load(data)
This will raise a validation error, as we are passing age as a string that cannot be converted to an integer ('nineteen'), while our schema declares the field as an integer.
ValidationError: {'age': ['Not a valid integer.']}
By correcting age to 19, the data will be successfully deserialized to {'name': 'bill', 'age': 19}.
Serialize objects
To serialize app-level objects in marshmallow, we use dump.
person = Person(name='bill', age=19)
serialized_value = PersonSchema().dump(person)
# {
# 'name': 'bill',
# 'age': 19,
# }
After serializing with dump, we get a dictionary with keys and values corresponding to the original object, which can then be easily converted to text and stored in the database.
Pass arguments into Schema fields
When creating schemas, we can pass optional arguments to each field:
- many (boolean): whether the field holds a list of nested schema objects (applies to fields.Nested)
- load_only (boolean): the field is considered only during load
- dump_only (boolean): the field is considered only during dump
- required (boolean): whether the field is required in deserialization
- data_key (string): the alternative key for the field in the input data
- allow_none (boolean): whether None is allowed as the field's value
- validate (validator): a function, or list of functions, used to validate the value
- default: value used in serialization (dump) when the value is missing (named dump_default in marshmallow 3.13 and later)
- missing: value used in deserialization (load) when the value is missing (named load_default in marshmallow 3.13 and later)
- error_messages (dictionary): error messages that override the default messages on errors
e.g:
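from marshmallow import Schema, fields, validate

class EmployeeSchema(Schema):
    name = fields.Str(
        required=True,
        error_messages={
            "required": "Name is missing.",
            # "invalid" is the key fields.Str uses for a wrong data type
            "invalid": "Name must be a string."
        }
    )
    age = fields.Int(required=True, validate=validate.Range(min=18))
    # A list of strings; plain fields such as Str do not take many=True
    skills = fields.List(fields.Str(), allow_none=True)
    # dump_default is the marshmallow 3.13+ name for the older default argument
    home_address = fields.Str(data_key='address', dump_default='Hanoi')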
In the snippet above, we have a schema for class Employee, where:
- name is a required string and has customized error messages for when it is missing or has an invalid data type.
- age is required and must be at least 18 (no child labor allowed).
- skills is an array of strings and can be None, as an employee can have many skills or no skills at all (though in the latter case, it is questionable why they still remain at the company).
- home_address is a string whose input is received under the address key, and which defaults to 'Hanoi' when serialized.
Nest schemas
In marshmallow, we can nest a schema inside another to represent relationships between objects. Separately, a schema can inherit from another schema to reuse its fields.
e.g:
from marshmallow import Schema, fields

class PersonSchema(Schema):
    name = fields.Str()
    age = fields.Int()

class HouseSchema(Schema):
    address = fields.Str()

class FamilySchema(HouseSchema):
    people = fields.Nested(PersonSchema, many=True)
As you can see:
- FamilySchema inherits HouseSchema, which means that besides the field people, it also includes the field address added by HouseSchema.
- The field people in FamilySchema nests the schema PersonSchema, signaling that this field contains an array of Person objects. You know, people = many person(s), obviously.
With nesting schemas, handling complex and nested data structures is no longer a headache.
Perform transformation before and after dump or load
marshmallow allows us to perform transformations before or after serialization and deserialization by using a number of hooks. These hooks register a method to invoke before or after deserializing or serializing an object.
- @pre_load: before deserializing
- @post_load: after deserializing
- @pre_dump: before serializing
- @post_dump: after serializing
e.g:
from marshmallow import Schema, fields, post_load

class PersonSchema(Schema):
    name = fields.Str()
    age = fields.Int()

    @post_load
    def make_person(self, data, **kwargs):
        return Person(**data)
In the example above, we modified our previously defined PersonSchema by adding a post_load method to it. With this hook in place, deserializing returns an instance of class Person directly.
2.3. Why should we use Marshmallow?
Agnostic
marshmallow makes no assumptions about web frameworks or database layers. It will work with just about any ORM, ODM, or no ORM at all.
Concise, familiar, and reusable syntax
marshmallow uses classes. This allows for easy code reuse and configuration. It also provides powerful means for configuring and extending schemas, such as adding post-processing and error handling behavior.
High configurability
It’s easy and convenient to customize and configure marshmallow schemas. Customization can be achieved by passing arguments into fields, using the class Meta paradigm, nesting and extending schemas, and so on.
Conclusion
This article gives a brief introduction to serialization and deserialization in Python, as well as a quick tutorial on marshmallow, one of the best current tools for these jobs. This tutorial is just a glimpse of what marshmallow is capable of, so please refer to the official documentation for more advanced usage. Happy coding!
Let’s Get In Touch:
- Personal website: https://billtrn.com
- LinkedIn: https://www.linkedin.com/in/billtrn/
- Email: trantriducs@gmail.com
- GitHub: https://github.com/billtrn