
High Availability Deployment Across Data Centers

Scenario Requirements

The customer's data center environment consists of a single Kubernetes (k8s) cluster spanning Data Center A and Data Center B. They want to deploy a Redis cluster with 3 master (leader) and 3 slave (follower) replicas to achieve high availability across the data centers. They expect Redis to continue providing services even when one data center is offline.


Solution

To meet the high availability requirements across the two data centers, the Redis replicas need to be deployed in the following manner:

  • 3 leader replicas running on cluster nodes: k8s-node-01, k8s-node-02, k8s-node-06
  • 3 follower replicas running on cluster nodes: k8s-node-03, k8s-node-04, k8s-node-05
  • Ensure that each cluster node runs only one Redis replica

This solution achieves the deployment goals through workload scheduling: weighted node affinity steers each replica toward a specific cluster node, and workload (pod) anti-affinity ensures that at most one replica of each role runs per node.

Note

Please ensure that each node has sufficient resources; otherwise, scheduling may fail due to resource shortage.

1. Label Configuration

Redis Workload Labels

To schedule the leader and follower replicas separately, the Redis replicas are divided using labels:

Redis Replica     Label
redis-leader      app: redis-leader
redis-follower    app: redis-follower
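
These labels are set in each workload's pod template so that the affinity selectors in the next section can match them. A minimal sketch for the redis-leader workload follows; the StatefulSet resource name is illustrative, not taken from the original:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-leader          # illustrative resource name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis-leader
  template:
    metadata:
      labels:
        app: redis-leader     # matched by the podAntiAffinity selector
    # container spec and affinity rules omitted for brevity
```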

Cluster Node Labels

To allocate the leader and follower replicas to two different data centers, two topological domains need to be defined among the 6 cluster nodes. The labels for each cluster node are as follows:

Cluster Node     Label          Topological Domain
k8s-node-01      az1: node01    az1
k8s-node-02      az1: node02    az1
k8s-node-06      az1: node03    az1
k8s-node-04      az2: node01    az2
k8s-node-05      az2: node02    az2
k8s-node-03      az2: node03    az2
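
Assuming kubectl access to the cluster, the node labels above can be applied as follows:

```shell
# Topological domain az1 (redis-leader nodes)
kubectl label node k8s-node-01 az1=node01
kubectl label node k8s-node-02 az1=node02
kubectl label node k8s-node-06 az1=node03

# Topological domain az2 (redis-follower nodes)
kubectl label node k8s-node-04 az2=node01
kubectl label node k8s-node-05 az2=node02
kubectl label node k8s-node-03 az2=node03
```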

2. Scheduling Configuration

The redis-leader and redis-follower replicas must be scheduled into different topological domains, so a separate affinity policy is configured for each:

redis-leader

# Workload anti-affinity for redis-leader within the topological domain az1 (k8s-node-01, k8s-node-02, k8s-node-06): at most one leader replica is scheduled per cluster node.
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - redis-leader
          topologyKey: az1

# Weighted node affinity for the redis-leader replicas: prefer the az1 nodes (k8s-node-01, k8s-node-02, k8s-node-06) in descending order of weight.
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: az1
                operator: In
                values:
                  - node01
        - weight: 90
          preference:
            matchExpressions:
              - key: az1
                operator: In
                values:
                  - node02
        - weight: 80
          preference:
            matchExpressions:
              - key: az1
                operator: In
                values:
                  - node03

redis-follower

# Workload anti-affinity for redis-follower within the topological domain az2 (k8s-node-03, k8s-node-04, k8s-node-05): at most one follower replica is scheduled per cluster node.
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - redis-follower
          topologyKey: az2

# Weighted node affinity for the redis-follower replicas: prefer the az2 nodes (k8s-node-03, k8s-node-04, k8s-node-05) in descending order of weight.
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: az2
                operator: In
                values:
                  - node01
        - weight: 90
          preference:
            matchExpressions:
              - key: az2
                operator: In
                values:
                  - node02
        - weight: 80
          preference:
            matchExpressions:
              - key: az2
                operator: In
                values:
                  - node03
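
Once both workloads are deployed, placement can be spot-checked with kubectl; each of the six cluster nodes should host exactly one Redis replica:

```shell
# The NODE column should show one replica per cluster node
kubectl get pods -l 'app in (redis-leader, redis-follower)' -o wide
```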

Handling Data Center Offline

Data Center A Offline

When Data Center A is offline, two redis-leader replicas go offline with it. Because a majority of the masters are lost, the remaining nodes cannot complete automatic failover, and the entire Redis cluster is unable to provide normal services, as shown in the following diagram:


Solution

Use the redis-cli tool to connect to any redis-follower replica in Data Center B and manually convert it to a leader replica.

# Connect to a follower node
redis-cli -h <ip> -p <port>
# Authenticate; the password can be found on the instance overview page of the middleware module
auth <password>
# Perform role conversion for the node
cluster failover takeover
# Check the role information of the node, it should have changed
role
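
The cluster-wide view can also be checked from the same session; the promoted node should now be flagged as master (the <ip>, <port>, and <password> placeholders are the same as above):

```shell
# List every known node with its role and slot assignment
redis-cli -h <ip> -p <port> -a <password> cluster nodes
```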

After the role conversion in Data Center B, the cluster regains its ability to serve requests. When Data Center A comes back online, the original redis-leader replicas rejoin the Redis instance as followers.

Data Center B Offline

When Data Center B is offline, only one redis-leader replica goes offline. The remaining masters still hold a majority, so the cluster automatically promotes a follower and the Redis service is not interrupted. No manual intervention is required, as shown in the following diagram:

