Object Serialization in Python

Convert custom objects into JSON and vice versa

Bill Tran
7 min read · Dec 10, 2020
Photo by @joyfulcaptures on unsplash.com

Problem

Imagine you are working as a backend developer. After a bunch of logic operations and database queries, you end up with an object of type Person, which you want to return to the frontend. However, you have to return it as a JSON object rather than a Person object, since JSON is the standard format for transferring data between backend and frontend.

Without external libraries, you will have to do it this way:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = Person(name='Bill', age=19)
return_value = {
    'name': person.name,
    'age': person.age
}

You might also come across this problem when working with databases. For example, some relational and NoSQL databases only accept values of certain basic types, such as text, integers, and floats, and not arbitrary objects. Therefore, in order to store objects in those databases, you first have to convert them into text before inserting them.

To do this, you would have to loop through the object and get all of its attributes one by one. This is fine for objects with few attributes, but as this number increases, your work gets tiring and repetitive.
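As a rough illustration of that loop, here is a minimal sketch that uses only the standard library. The helper name to_dict is hypothetical, and the sketch assumes that every attribute value is something json can already handle (strings, numbers, lists, and so on).

```python
import json


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def to_dict(obj):
    """Copy every instance attribute of obj into a plain dictionary."""
    # vars(obj) returns the instance's __dict__, so this only works for
    # ordinary objects that keep their attributes there.
    return {attr: value for attr, value in vars(obj).items()}


person = Person(name='Bill', age=19)
as_dict = to_dict(person)      # {'name': 'Bill', 'age': 19}
as_text = json.dumps(as_dict)  # '{"name": "Bill", "age": 19}'
```

This saves some typing, but it copies every attribute blindly and performs no validation, which is where a dedicated library becomes useful.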

Conversely, there are situations when you have to convert a dictionary into a predefined object, like Person. For instance, when the backend receives a request from the frontend and wants to convert the request payload back into the original object, the process would look something like this:

request_payload = {
    'name': 'Bill',
    'age': 19,
    'job': 'student'
}
bill = Person(
    name=request_payload['name'],
    age=request_payload['age']
)

You would have to loop through the request payload and pick the attributes you need for your object.
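One way to automate the picking step by hand, sketched below under the assumption that the payload keys match the constructor's argument names. The helper from_payload is a hypothetical name, not part of any library.

```python
import inspect


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def from_payload(cls, payload):
    """Build cls from the payload keys that match its constructor arguments."""
    # inspect.signature(cls) describes the arguments of cls.__init__ (without self).
    params = inspect.signature(cls).parameters
    wanted = {key: value for key, value in payload.items() if key in params}
    return cls(**wanted)


request_payload = {'name': 'Bill', 'age': 19, 'job': 'student'}
bill = from_payload(Person, request_payload)  # 'job' is silently ignored
```

Note that this sketch does not check types at all, and a payload that lacks a required key fails with a TypeError from the constructor.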

These processes are called Serialization and Deserialization, which you often encounter when dealing with the storage and transfer of data.

1. Serialization and Deserialization

1.1. Serialization

Serialization is the process of converting app-level objects into primitive types, such as dictionaries, lists, strings, and numbers. The serialized objects can then be rendered to standard formats such as JSON for use in an HTTP API. This technique is also common in database work, where we need to convert custom objects into dictionaries and then into text before storing them in the database.

1.2. Deserialization

Deserialization is the reverse process of serialization, in which data, typically dictionaries or JSON objects, is converted back into app-level objects. This is often used in web development when the backend receives input data from the frontend, or reads data from the database, and converts it back into custom objects.
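To make the two directions concrete, here is a small round trip written with the standard library only. It is a sketch that assumes Person is declared as a dataclass (a variation on the plain class used earlier), so that dataclasses.asdict can do the object-to-dictionary step.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Person:
    name: str
    age: int


original = Person(name='Bill', age=19)

# Serialization: object -> dictionary -> JSON text
text = json.dumps(asdict(original))

# Deserialization: JSON text -> dictionary -> object
restored = Person(**json.loads(text))

# Dataclasses compare field by field, so the round trip can be checked directly.
round_trip_ok = restored == original
```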

There are several libraries in Python that support serialization and deserialization, including pickle, json, cattrs, and most notably, marshmallow, which we will go through in the second part of this article.

2. Marshmallow

2.1. What is Marshmallow?

marshmallow is an ORM/ODM/framework-agnostic library for converting complex data types, such as objects, to and from native Python data types. It can be used for three main functions:

  • Validate input data
  • Deserialize input data to app-level objects
  • Serialize app-level objects to primitive Python types

In practice, one of the most prominent use cases of marshmallow is to deserialize JSON objects into Python objects, or serialize Python objects into JSON objects, for use in web APIs, especially in the communication between backend and frontend.

To support all of the use cases above, marshmallow introduces the concept of a schema, which can be used to apply rules that validate the data being deserialized or change the way data is serialized. A schema defines the rules that guide deserialization, called load, and serialization, called dump. It allows us to define the fields that will be loaded or dumped, add conditions on those fields, and inject computation that transforms the data before or after load and dump.

2.2. How to use Marshmallow?

Install

We can install marshmallow with pip:

pip install marshmallow

Define a Schema

In order to do serialization and deserialization, we need to define a Schema which sets the rules for those operations.

from marshmallow import Schema, fields

class PersonSchema(Schema):
    name = fields.Str()
    age = fields.Int()

For example, we want to define a Schema for class Person defined at the beginning of this article, which includes two fields name and age. To do that, we import Schema from marshmallow then inherit it in our custom schema. Then we list the attributes under our schema, including name and age since we want to use it for objects of class Person, along with the type for each attribute. In this case, name is a string and age is an integer.

Validate inputs

The first and most basic thing you can do with a Schema is validate input data. To do this, we call the schema's load method.

data = {
    'name': 'bill',
    'age': 'nineteen'
}
person = PersonSchema().load(data)

This will raise a ValidationError, because 'nineteen' cannot be converted to an integer, which is the type we declared for age in our schema.

ValidationError: {'age': ['Not a valid integer.']}

Once age is corrected to 19, the data loads successfully and load returns {'name': 'bill', 'age': 19}.

Serialize objects

To serialize app-level objects in marshmallow, we use dump.

person = Person(name='bill', age=19)
serialized_value = PersonSchema().dump(person)
# {
#     'name': 'bill',
#     'age': 19,
# }

After serializing with dump, we get a dictionary whose keys and values correspond to the attributes of the original object. This dictionary can then easily be converted to text and stored in the database.

Pass arguments into Schema fields

When creating schemas, we can pass optional arguments to each field:

  • many (boolean): whether the value is a list of the nested schema; this is accepted by fields.Nested (and by Schema itself), not by plain fields such as fields.Str
  • load_only (boolean): the field is considered only during load
  • dump_only (boolean): the field is considered only during dump
  • required (boolean): whether the field must be present during deserialization
  • data_key (string): the key under which the field appears in the external data, if it differs from the field name
  • allow_none (boolean): whether None is allowed for the field's value
  • validate (validator): a callable, or a list of callables, used to validate the value
  • default: value used in serialization (dump) when the value is missing (renamed dump_default in marshmallow 3.13 and later)
  • missing: value used in deserialization (load) when the value is missing (renamed load_default in marshmallow 3.13 and later)
  • error_messages (dictionary): error messages to override the default messages on errors

For example:

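from marshmallow import Schema, fields, validate

class EmployeeSchema(Schema):
    name = fields.Str(
        required=True,
        error_messages={
            "required": "Name is missing.",
            # fields.Str reports a wrong data type under the "invalid" key
            "invalid": "Name must be a string."
        }
    )
    age = fields.Int(required=True, validate=validate.Range(min=18))
    # A list of strings; plain fields such as fields.Str take no many argument
    skills = fields.List(fields.Str(), allow_none=True)
    # dump_default is the marshmallow 3.13+ name of the older default argument
    home_address = fields.Str(data_key='address', dump_default='Hanoi')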

In the snippet above, we have a schema for class Employee, where:

  • name is a required string and has customized error messages for the cases when it is missing or has an invalid data type.
  • age is required and must be at least 18 (no child labor allowed).
  • skills is an array of strings and can be None, as an employee can have many skills or no skills at all (though in the latter case, it is questionable why they still remain at the company).
  • home_address is a string whose input is received under the address key, and which falls back to 'Hanoi' during serialization when the value is missing.

Nest schemas

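In marshmallow, we can nest a schema inside another with fields.Nested, so that a field of the outer schema is itself validated and (de)serialized by the nested schema. A schema can also extend another schema through ordinary class inheritance, in which case it inherits the fields of its parent.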

For example:

from marshmallow import Schema, fields

class PersonSchema(Schema):
    name = fields.Str()
    age = fields.Int()

class HouseSchema(Schema):
    address = fields.Str()

class FamilySchema(HouseSchema):
    people = fields.Nested(PersonSchema, many=True)

As you can see:

  • FamilySchema inherits from HouseSchema, which means that besides the field people, it also includes the field address declared by HouseSchema.
  • The field people in FamilySchema nests the schema PersonSchema with many=True, signaling that this field contains an array of Person objects. You know, people = many person(s), obviously.

With nesting schemas, handling complex and nested data structures is no longer a headache.

Perform transformation before and after dump or load

marshmallow allows us to perform transformation before or after serialization and deserialization by using a number of hooks. These hooks register a method to invoke before or after deserializing or serializing an object.

  • @pre_load: before deserializing
  • @post_load: after deserializing
  • @pre_dump: before serializing
  • @post_dump: after serializing

For example:

from marshmallow import Schema, fields, post_load

class PersonSchema(Schema):
    name = fields.Str()
    age = fields.Int()

    @post_load
    def make_person(self, data, **kwargs):
        # Runs after validation; turns the loaded dictionary into a Person
        return Person(**data)

In the example above, we changed our previously defined PersonSchema by adding a post_load method to it. As a result, load now returns an instance of class Person directly instead of a dictionary.

2.3. Why should we use Marshmallow?

Agnostic

marshmallow makes no assumption about web frameworks or database layers. It will work with just about any ORM, ODM, or no ORM at all.

Concise, familiar, and reusable syntax

marshmallow schemas are plain Python classes. This makes code reuse and configuration easy, and gives us powerful ways to configure and extend schemas, such as adding post-processing and error-handling behavior.

High configurability

It’s easy and convenient to customize and configure marshmallow schemas. Customized configuration can be achieved by passing arguments into fields, using the class Meta paradigm, nesting and extending schemas, and so on.

Conclusion

This article gives a brief introduction to serialization and deserialization in Python, as well as a quick tutorial on marshmallow, one of the best current tools for these jobs. This tutorial is just a glimpse of what marshmallow is capable of, so please refer to the official documentation for more advanced usage. Happy coding!
