Understanding Redis® High-Availability Architectures

Updated on June 20, 2024

Introduction

Redis® is an open-source, in-memory data structure store. Its core data types include String, List, Hash, Set, Sorted Set, Geospatial index, HyperLogLog and Bitmap. You can use Redis® as a messaging system, thanks to its support for Redis® Streams, Pub/Sub and List (which can act as a queue).

High availability is crucial in Redis® to ensure the continuous availability of data and prevent downtime. Although Redis® is known for its high performance, there are instances where a single server might not be enough to meet the data requirements of an organization. Running Redis® across multiple servers provides a resilient architecture in which the system can continue to operate without interruption or data loss despite the failure of one or more servers. This also has the added benefit of enhanced performance.

This article covers Redis® High-Availability (HA) strategies. You will start by learning about the Primary-Replica architecture and how to use it with Vultr Managed Database for Caching. You will then learn about Redis® Cluster and how it can be used to scale Redis® horizontally, followed by Redis® Sentinel and Proxy based HA architectures. Finally, you will learn about the pros and cons of each architecture and how to choose the optimal one for your use case.

Primary-Replica Architecture

A Primary-Replica architecture comprises multiple Redis® nodes, where one node is designated as the Leader (also known as the Primary), while the others function as Followers (or Replicas). This replication mechanism in Redis® is the cornerstone of operating Redis® in a distributed and highly-available manner. By default, replicas are configured to be read-only, which means they reject all write commands. This design ensures that the data on the replicas remains consistent with that of the primary node.
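As a minimal sketch of how this looks on a self-managed deployment (the IP addresses below are placeholders and the exact error text can vary by Redis® version), a node becomes a replica by pointing it at the primary in its configuration file, and the read-only default causes it to reject writes:

    # redis.conf on the replica node (addresses are illustrative)
    replicaof 10.0.0.1 6379
    replica-read-only yes

    # A write attempt against the replica is rejected
    $ redis-cli -h 10.0.0.2 SET foo bar
    (error) READONLY You can't write against a read only replica.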

The primary node continuously replicates data to the replica nodes, ensuring that they are exact copies of the primary node. One of the advantages of the asynchronous replication process is that the primary node does not have to wait for a command to be processed by the replicas, ensuring low latency and high performance. In addition, the primary node tracks which replica has processed which command, as the replicas periodically acknowledge the amount of data they have received.
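Because replication is asynchronous, a write returns as soon as the primary processes it. If a particular write needs a stronger guarantee, the WAIT command blocks the client until the specified number of replicas have acknowledged the write (or the timeout, in milliseconds, expires) and returns the number of replicas that acknowledged it. The sketch below assumes a connection with at least one replica attached:

    > SET balance 100
    OK
    > WAIT 1 500
    (integer) 1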

Replication serves a dual purpose of improving data safety and high availability, and facilitating scalability through the use of multiple replicas for read-only queries. Slow operations can be offloaded to replicas, improving the performance of the system. By leveraging replication, Redis® can operate as a reliable and scalable distributed database.

Using Primary-Replica HA architecture with a Vultr Managed Database for Caching

Vultr Managed Database for Caching supports this architecture, giving you a Redis® deployment with a single primary node and multiple replica nodes. To set up a highly available Primary-Replica deployment, select one or more replica nodes during the cluster creation process. You can also increase the number of replica nodes after the instance is created. To explore this architecture, follow the steps below.

  1. Install the Redis® CLI on your computer. On Ubuntu/Debian, run the following command.

     $ sudo apt install redis-tools
  2. To connect to your Vultr Managed Database for Caching, run redis-cli with your instance connection string, which looks similar to the one below.

     $ redis-cli -u rediss://default:AVNS_w9_i6db@vultr-prod-f0-c7fc-4fcf42-vultr-prod-8c01.vultrdb.com:16752
  3. When connected, use the INFO replication command to view the replication details.

     > INFO replication

    Your output should look like the one below.

     # Replication
     role:master
     connected_slaves:1
     slave0:ip=fda7:a938:5bfe:5fa6:0:443:52e5:16d0,port=16751,state=online,offset=3260,lag=0
     master_failover_state:no-failover
     master_replid:290f39e487bc2f6fd5cfc16ba632feaff63ed949
     master_replid2:0000000000000000000000000000000000000000
     master_repl_offset:3260
     second_repl_offset:-1
     repl_backlog_active:1
     repl_backlog_size:117440512
     repl_backlog_first_byte_offset:1
     repl_backlog_histlen:3260

    The role field shows that this node is the primary, while the connected_slaves field shows that one replica node is connected to it.

  4. Add sample data to the database.

     > SET foo bar
     > SET john doe

    When you execute the INFO replication command again, notice that the master_repl_offset field has increased and the offset reported for slave0 has advanced to match it. This indicates that the new writes have been replicated to the replica node.

     # Replication
     role:master
     connected_slaves:1
     slave0:ip=fda7:a938:5bfe:5fa6:0:443:52e5:16d0,port=16751,state=online,offset=3575,lag=1
     master_failover_state:no-failover
     master_replid:290f39e487bc2f6fd5cfc16ba632feaff63ed949
     master_replid2:0000000000000000000000000000000000000000
     master_repl_offset:3575
     second_repl_offset:-1
     repl_backlog_active:1
     repl_backlog_size:117440512
     repl_backlog_first_byte_offset:1
     repl_backlog_histlen:3575

Redis® Cluster

A limitation of Redis® is that the amount of data it can hold is bounded by the available memory of the host machine. To overcome this limitation, data has to be distributed across multiple Redis® servers. Redis® Cluster is an advanced feature that builds on top of the Leader-Follower replication mechanism to enable horizontal scalability. A Redis® Cluster comprises multiple shards, each of which has a primary node and zero or more replica nodes. If you scale out the number of shards in a cluster, data is automatically partitioned and distributed across the primary Redis® nodes in a process known as sharding.
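As an illustrative sketch of a self-managed deployment (the addresses below are placeholders), redis-cli can bootstrap a cluster from six running cluster-enabled nodes, creating three shards and pairing each primary with one replica:

    $ redis-cli --cluster create \
        192.168.0.1:7000 192.168.0.2:7000 192.168.0.3:7000 \
        192.168.0.4:7000 192.168.0.5:7000 192.168.0.6:7000 \
        --cluster-replicas 1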

In the Redis® sharding implementation, every key belongs to a logical hash slot. A Redis® Cluster is divided into 16384 hash slots, and the key-to-hash-slot mapping is derived with a deterministic formula: CRC16(key) mod 16384. There are often scenarios where your application needs to operate on multiple keys at once (multi-key operations) - for example, deleting many keys in a single command. Because Redis® Cluster distributes data across multiple nodes, these keys might not map to the same hash slot and can therefore live on different nodes.
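You can inspect this mapping yourself: when connected to a cluster node, the CLUSTER KEYSLOT command returns the slot (an integer between 0 and 16383) that a given key maps to. For example:

    > CLUSTER KEYSLOT foo
    (integer) 12182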

If you operate on multiple keys that belong to different hash slots, the operation fails with a CROSSSLOT error. To overcome this, Redis® Cluster provides hash tags: you can use curly braces {} to specify the part of the key that is hashed. This lets you control the key-to-hash-slot mapping and guarantee that related keys land in the same slot.
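For example, because only the text inside the curly braces is hashed, the two keys below (the names are illustrative) map to the same slot, so a multi-key command such as MGET on them succeeds:

    > CLUSTER KEYSLOT {user:1000}.following
    > CLUSTER KEYSLOT {user:1000}.followers
    > MGET {user:1000}.following {user:1000}.followers

Both CLUSTER KEYSLOT calls return the same slot number because only user:1000 is used to compute the hash.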

Redis® Cluster provides automatic data distribution across multiple nodes, enabling users to scale beyond a single server. In addition, it ensures that operations can continue even when a subset of the nodes experiences failures or is unable to communicate with the rest of the cluster. This feature provides increased reliability and fault tolerance for Redis® users.

Redis® Sentinel

Redis® Sentinel is a distributed system designed to run in a configuration where multiple Sentinel processes cooperate. Redis® Sentinel uses a quorum-based approach to ensure that the failover decision is made by a majority of Sentinels, preventing the risk of split-brain scenarios. It also supports configuration updates and automatic node discovery, and can notify administrators of important events.

It constantly checks whether the primary and replica instances are working as expected. If the primary instance is not working as expected, Sentinel can start a failover process in which a replica is promoted to primary and the remaining replicas are reconfigured to use the new primary. Sentinel also acts as a configuration provider for service discovery: clients can connect to Sentinels to ask for the address of the primary Redis® node responsible for a given service, and if a failover occurs, Sentinels report the new address.
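As a sketch of this service discovery flow (the master name mymaster and the addresses are assumptions of this example; 26379 is the default Sentinel port), a client can ask any Sentinel for the current primary's address and receives the updated address after a failover:

    $ redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
    1) "10.0.0.1"
    2) "6379"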

A typical Sentinel setup consists of three nodes, each running both a Redis® process and a Sentinel process. If the primary fails, the remaining Sentinels agree on the failure and, once the quorum is reached, trigger a failover.
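A minimal sentinel.conf for such a setup might look like the following (the master name, addresses, and timeout values are placeholders); a quorum of 2 means at least two Sentinels must agree that the primary is unreachable before a failover is attempted:

    port 26379
    sentinel monitor mymaster 10.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 60000
    sentinel parallel-syncs mymaster 1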

Proxy Architecture

A proxy-based HA architecture sits conceptually between Redis® Cluster and Redis® Sentinel. Like Sentinel, it relies on an external component that fronts a fleet of Redis® servers. Like Redis® Cluster, it also takes care of data partitioning, although it uses its own custom schemes.

A server-side proxy is an intermediate server that speaks the Redis® protocol and fans out each request to the appropriate Redis® server in the fleet.

A popular server-side proxy solution is twemproxy. It's built primarily to reduce the number of connections to the backend caching servers. This, together with protocol pipelining and sharding, allows you to horizontally scale your distributed caching architecture.
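As an illustrative sketch (the pool name, addresses, and tuning values are assumptions), a twemproxy pool that shards keys across two Redis® servers is configured in YAML along these lines, and standalone clients then connect to the proxy's listen address as if it were a single Redis® server:

    redis_pool:
      listen: 127.0.0.1:22121
      hash: fnv1a_64
      distribution: ketama
      redis: true
      auto_eject_hosts: true
      server_retry_timeout: 30000
      server_failure_limit: 3
      servers:
        - 127.0.0.1:6379:1
        - 127.0.0.1:6380:1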

Which Architecture is Right for You?

Every architectural approach has its pros and cons as described in this section.

Leader-Follower Topology

Pros

  • Replicas hold redundant copies of the data, improving data safety and availability.
  • Read-only queries and slow operations can be offloaded to replicas, improving overall performance.
  • Simpler to set up and operate than the other architectures described in this article.

Cons

  • Because replica nodes are read-only, this topology can only scale reads. If your application is write-heavy, consider other options.
  • Lack of data partitioning - the primary node holds the entire dataset, so capacity is limited by the memory of a single server.

Choose the Leader-Follower topology when:

  • You have a read-heavy application.
  • You cannot use a Redis® Cluster (for example, because of multi-key operations constraints).

Redis® Cluster

Pros

  • Data is automatically partitioned across the cluster.
  • You can scale both reads (add more replicas in a shard) and writes (add more shards).

Cons

  • Requires a Redis® Cluster-aware client.
  • You need to design your application with multi-key operation constraints in mind.
  • For fine-grained control, you need to modify the application to use client-side sharding with hash tags.
  • Complex architecture - running a Redis® Cluster at large scale introduces additional operational complexity and significant infrastructure requirements.

Choose Redis® Cluster when:

  • You need the ability to scale both reads and writes.
  • The programming language of your client application has a battle-tested, well-maintained Redis® Cluster client.
  • You can tolerate architectural complexity for additional functionality.

Redis® Sentinel

Pros

  • It provides automatic failover.
  • Acts as a configuration provider for service discovery, so clients don't need to be aware of the Redis® topology.

Cons

  • Complex architecture - you need to operate an additional fleet of Sentinel servers.
  • You need a specialized (Redis® Sentinel aware) client.

Choose Redis® Sentinel when:

  • You need automatic failover, but cannot use a Redis® Cluster.
  • You can tolerate architectural complexity for additional functionality. If not, stick to the Primary-Replica architecture.

Proxy Based Solution

Pros

  • You get load balancing and data partitioning, with the additional benefit of connection management.
  • Does not require a specialized (Cluster- or Sentinel-aware) client, so you can continue to use a standalone Redis® client.

Cons

  • Complex architecture - you need to operate an additional fleet of proxy servers.
  • You need to rely on third-party components that are not part of the standard Redis® tooling (unlike Redis® Cluster or Sentinel).

Choose the Proxy based solution when:

  • You are using Redis® as a cache and don't need automatic failover.
  • You want to enhance your Primary-Replica setup, but don't want to use Redis® Cluster or Sentinel.

Conclusion

In this article, you explored different high-availability architectures for Redis®. All of these solutions are powerful, but they require significant investment in setup and maintenance. This can be mitigated by adopting managed solutions like Vultr Managed Databases for Caching, which offer automated high-availability mechanisms and reduce operational overhead by handling tasks such as configuration tuning, updates, and patching.