Accelerate Distributed Applications with Memcached

Use cache to reduce resource access time

Caching means storing already-computed data so that future requests for it can be served faster. In this article, let's familiarize ourselves with Memcached, a high-performance, distributed caching system widely used to speed up distributed web applications.

I. Caching in Distributed Systems

Caching is a pivotal aspect to consider when writing applications, especially large-scale ones that deal with millions of real-world users. Web applications involve a lot of time-consuming actions, such as data computation or database queries, and without proper caching, the user experience could be significantly hampered by lengthy loading bars.

Most programming languages offer built-in capabilities for caching, ranging from a single dictionary/hash table to purpose-built caching utilities such as functools.lru_cache in Python. This built-in function, when used as a decorator on another function, reduces the execution time of the decorated one through memoization. LRU in the function's name stands for the Least-Recently Used algorithm: when the cache is full, the least recently used item is discarded to make room for a new one.
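As a quick illustration, a naive recursive Fibonacci function becomes fast once memoized with lru_cache:

```python
from functools import lru_cache

@lru_cache(maxsize=128)   # keep the 128 most recently used results
def fib(n):
    """Naive recursion, but memoized: each fib(n) is computed only once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))            # instant, despite the exponential-looking recursion
print(fib.cache_info())   # shows cache hits accumulated during the computation
```

Without the decorator, fib(30) makes over a million recursive calls; with it, each value is computed exactly once and every repeat lookup is a cache hit.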

However, those data structures are confined to a single language process. In distributed systems, where several replicas of your application run across many machines, such in-memory structures can't share cached data: you just can't throw a caching dictionary from microservice A to microservice B. Therefore, if a system is distributed across a network, its caching layer needs to be distributed in the same way.

This is where distributed memory object caching systems like Redis and Memcached enter the game.

Caching helps save database query time. Source: [3]

II. Memcached

1. Introduction

Memcached is a high-performance, distributed caching system. Although application-neutral, it's most commonly used to speed up dynamic Web applications by alleviating database load [1]. Launched by Brad Fitzpatrick in 2003, Memcached was soon adopted by popular sites such as Wikipedia, Twitter, YouTube, and most notably Facebook, which improved the open-source version of Memcached and used it as a building block to construct a data store for the largest social network in the world [2].

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) produced by database calls, API calls, or page rendering. It runs on one or more server instances completely separate from the application, with each server listening on a user-defined address and working as an independent data store.

2. Memcached mechanism

Left: Read path for a cache miss. Right: Write path. Source: [3]

From the user's point of view, Memcached isn't much different from a dictionary mapping keys to values, which are the essential data to be cached. How the Memcached server(s) store data isn't a concern, as the client library abstracts this away, exposing all cache operations (get, set, …) through a simple interface. Under the hood, however, Memcached works more like a hash table, with each Memcached server representing a bucket. For a given key, the client computes a hash to figure out which bucket is responsible for that particular key, spreading keys evenly across all servers.
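The bucket-selection idea can be sketched in a few lines of Python. This is illustrative only: the server list is made up, and real clients may use a different hash function or consistent hashing rather than the plain CRC32-modulo shown here.

```python
import zlib

# Hypothetical pool of three Memcached servers (buckets).
servers = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211']

def pick_server(key):
    """Hash the key and map it to one server, so keys spread across the pool."""
    bucket = zlib.crc32(key.encode()) % len(servers)
    return servers[bucket]
```

Because the hash is deterministic, every client that knows the same server list sends a given key to the same server, with no coordination needed.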

Memcached client receives user requests and sends them to the correct server. Source: [1]

3. Memcached in Action

Ah, my favorite part. Concepts and theories ain’t fun without examples, are they?

3.1. Install and launch

  • On Linux, installing Memcached is simple:
apt-get install memcached
  • For Mac users, the easiest way is possibly through Homebrew; after you install Homebrew, run:
brew install memcached
  • On Windows, the story gets a little more complicated, as you have to build Memcached from source or find pre-compiled binaries. Further instructions can be found in the official Memcached documentation.

After successfully installing Memcached, you need to launch it (by default, it listens on port 11211):

$ memcached

However, as mentioned above, to store and retrieve data from Memcached, you have to interact with a client, which will receive your requests, and figure out which Memcached server to proxy to. As popular as Memcached is, there should be at least one client available for the programming language of your choice.

3.2. Install and set up Memcached client

For Python, one of the most well-known open-source Memcached clients is python-memcached, which can be installed with pip:

pip install python-memcached

Once the library is installed, we need to set up a client and point it at our server:

from memcache import Client

memcache_client = Client(servers=['127.0.0.1:11211'])

Here we pass the servers parameter to the Client constructor: a list of addresses where our Memcached server(s) are listening.

3.3. Store and retrieve values from cache

For peace of mind, let’s think of Memcached as a gigantic network-based dictionary. There are a few rules to remember when playing with it:

  • Memcached stores keys as strings and values as raw bytes, so other data types must be serialized before storing (python-memcached does this transparently by pickling Python objects)
  • Memcached limits the size of keys and values: at most 250 bytes for a key and 1MB for a value
  • Every key-value pair can be given an expiration time, after which it is deleted (an expiration time of 0 means the item never expires on its own)
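To make the first two rules concrete, here is roughly what a client does with a non-string value. This is a sketch using Python's standard pickle module; python-memcached performs the equivalent steps for you behind the scenes.

```python
import pickle

record = {'id': 42, 'name': 'Alice'}  # an arbitrary Python object

# On the way in: serialize the object to bytes, which is what Memcached stores.
stored = pickle.dumps(record)
assert len(stored) <= 1024 * 1024     # values must fit under the 1 MB limit

# On the way out: deserialize the bytes back into the original object.
restored = pickle.loads(stored)
```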

To assign a value to a key and retrieve the value of a key, we use set and get respectively:

memcache_client.set('key_1', 'value_1', 60)

This line assigns the value value_1 to the key key_1, with a cache expiration time of 60 seconds. Now, let's retrieve the data we just cached:

memcache_client.get('key_1')
# 'value_1'

Inside Memcached, all operations are O(1) [1].
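Together, get and set enable the classic read-through caching pattern: check the cache first, and fall back to the slow source only on a miss. Here is a sketch; the function name and the load_from_db argument are hypothetical, and client stands for any python-memcached-style Client.

```python
def get_user_profile(client, user_id, load_from_db, ttl=60):
    """Read-through cache: serve from Memcached when possible."""
    key = f'user_profile:{user_id}'
    profile = client.get(key)            # fast path: cache hit
    if profile is None:                  # cache miss: hit the slow source
        profile = load_from_db(user_id)
        client.set(key, profile, ttl)    # store it for the next request
    return profile
```

The first call for a given user pays the full database cost; every call within the next 60 seconds is answered straight from the cache.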

3.4. Extended functions

  • Cache invalidation

There are only two hard things in Computer Science: cache invalidation and naming things. — Phil Karlton

Luckily, Memcached offers easy and safe ways to invalidate the cache. We can do it directly with delete, which is used just like get or set, or indirectly by setting an expiration time as mentioned above.

memcache_client.delete('key_1')

  • Check and Set

Let me explain this by borrowing an example on race condition from David Malan’s CS50 class at Harvard.

Suppose you come back to your room and see that there is no milk left in the fridge, so you head out to buy a can. Just after you leave, your roommate comes back, finds out the same thing, and also goes and buys one. When coming back from the store (suppose you two don’t meet on the way), you both are surprised that there are now 2 cans of milk instead of 1, and milk doesn’t last that long, which is problematic.

This situation is called a race condition: two or more processes try to update (set) a value at the same time without being aware of each other. To tackle this issue, Memcached introduced the Check and Set (CAS) operation, which sets a key to a given value only if it hasn't been altered since it was last fetched. Let's implement the milk scenario using python-memcached:

def buy_milk(memcache_client):
    milk_count = memcache_client.gets('milk')  # fetch value and CAS token
    if not milk_count:
        memcache_client.cas('milk', 1)  # succeeds only if 'milk' is untouched
    else:
        return 'Milk already bought'

To use Check and Set, we first retrieve the number of milk cans left using gets (with an s, to signal that we are using it in conjunction with Check and Set). If there is no milk left, we set the count to 1 with the cas operation; otherwise we report that milk is already bought. Note that python-memcached only tracks CAS tokens when the client is created with cache_cas=True.

This operation is useful for web applications to handle concurrency, where several clients might try to access and update the same key at the same time.
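For instance, a hit counter shared by many web workers can be incremented safely with a gets/cas retry loop. This is a sketch: the helper name is hypothetical, and it also uses add, a python-memcached operation that stores a key only if it does not already exist.

```python
def increment_with_cas(client, key, max_retries=5):
    """Atomically increment a shared counter using Check and Set."""
    for _ in range(max_retries):
        value = client.gets(key)           # fetch value plus its CAS token
        if value is None:
            if client.add(key, 1):         # add fails if another client won the race
                return 1
        elif client.cas(key, value + 1):   # succeeds only if untouched since gets
            return value + 1
        # someone else updated the key in between; loop and retry
    raise RuntimeError(f'could not update {key!r} after {max_retries} retries')
```

If another worker slips in an update between our gets and cas, the cas fails and we simply re-read and retry, so no increment is ever lost.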

Conclusion

This article covers an introduction to cache as well as a tutorial on Memcached, a useful tool for caching on distributed systems. This tutorial is just a glimpse of the diverse capabilities of Memcached, so please refer to the official site for more advanced usage. Happy coding!

References

[1] Brad Fitzpatrick. 2004. Distributed Caching with Memcached. Linux Journal, Vol. 2004, 124 (Aug. 2004). http://dl.acm.org/citation.cfm?id=1012889.1012894

[2] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling Memcache at Facebook. In Proceedings of the 10th Symposium on Networked Systems Design and Implementation (NSDI), pages 385–398, 2013.

[3] Julien Danjou. Python + Memcached: Efficient Caching in Distributed Applications. Real Python. https://realpython.com/python-memcache-efficient-caching/.
