Cross Data Center Disaster Recovery (DC-DR) for Kafka
KubeDB can run a single distributed Kafka across two data centers (DCs) so a Kafka
workload survives the loss of an entire data center. Exactly one DC is the active
write cluster at any instant. The other DC runs a self-contained standby cluster that
receives an asynchronous mirror of the active cluster’s topics. When the active DC is
lost, the single bootstrap endpoint is flipped to the standby, the standby is allowed
to take producer writes, and clients continue against identical topic names.
This page is the conceptual overview and a quick start. See also:
- DC-DR User Guide for every aspect of running in DC-DR mode (components, the naming contract, connecting, monitoring, the produce fence, switchover, failback, day-2 ops).
- DC-DR Runbook for what to do in each operational scenario.
New to KubeDB? Please start here.
Why Kafka DC-DR is its own camp
Most KubeDB engines have a single writable primary and a leader-to-leader replication stream. Postgres promotes a survivor, MongoDB elects a new primary, and the endpoint follows the writable node. Kafka has none of that.
Kafka is not a single-writer database. It is a partitioned log: each topic-partition has its own leader among that partition’s in-sync replicas (ISR), inside one cluster. There is no cluster-wide primary, and KRaft’s own Raft quorum already handles controller and ISR leadership intra-cluster. So the single-primary DR pattern (one writable leader, a leader-to-leader stream, promote the survivor) does not map. For Kafka:
- Cross-DC replication is asynchronous log mirroring, not a leader stream. Kafka’s built-in MirrorMaker 2 (MM2) copies topics (data plus configs), consumer-group offsets, and heartbeats from a source cluster to a target cluster. KubeDB expresses MM2 as ordinary ConnectCluster and Connector objects. There is no new replication engine to build.
- The active DC is a write-endpoint routing decision, not an engine state. Producers write to whichever cluster the clients are pointed at. DR is active/passive: one cluster takes writes, MM2 mirrors them to the standby, and on failover the clients are redirected. The dr-controlplane Lease decides which cluster is the write target; a local produce fence stops producers writing to a non-active cluster.
- There is no rewind and no zero-RPO. MM2 is asynchronous, so an unplanned failover loses the un-mirrored tail (bounded by MM2 lag), and a returned old-active cluster may hold records that were never mirrored. Kafka cannot rewind a log. Failback reverses the mirror direction and reconciles or accepts the un-mirrored tail as bounded loss.
So Kafka DC-DR is two independent Kafka clusters (one per Member DC), each with its own intra-DC KRaft quorum, joined by asynchronous MM2 mirroring, with the Lease choosing the write cluster and a produce fence preventing split writes.
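To make the MM2 piece concrete, the sketch below builds the three connector configurations as plain Python dictionaries. The connector class names and property keys are standard Apache Kafka MirrorMaker 2 settings; the function name, the bootstrap addresses, and the choice to mirror every topic and group are assumptions for illustration. How KubeDB renders these into Connector objects is not shown here.

```python
MIRROR_PKG = "org.apache.kafka.connect.mirror"


def mm2_connector_configs(source, target, source_bootstrap, target_bootstrap):
    """Build Kafka Connect configs for one source -> target MM2 mirror."""
    common = {
        "source.cluster.alias": source,
        "target.cluster.alias": target,
        "source.cluster.bootstrap.servers": source_bootstrap,
        "target.cluster.bootstrap.servers": target_bootstrap,
        # Identity policy: mirrored topics keep their original names.
        "replication.policy.class": f"{MIRROR_PKG}.IdentityReplicationPolicy",
    }
    return {
        "source": {
            **common,
            "connector.class": f"{MIRROR_PKG}.MirrorSourceConnector",
            "topics": ".*",                        # mirror every topic
            "sync.topic.configs.enabled": "true",  # copy topic configs too
        },
        "checkpoint": {
            **common,
            "connector.class": f"{MIRROR_PKG}.MirrorCheckpointConnector",
            "groups": ".*",                        # every consumer group
            "sync.group.offsets.enabled": "true",  # write translated offsets on the target
        },
        "heartbeat": {
            **common,
            "connector.class": f"{MIRROR_PKG}.MirrorHeartbeatConnector",
        },
    }


# Hypothetical bootstrap addresses: active dc-a mirrored into standby dc-b.
configs = mm2_connector_configs(
    "dc-a", "dc-b", "kafka.dc-a.example:9092", "kafka.dc-b.example:9092"
)
```

Reversing the mirror amounts to calling the same function with the source and target arguments swapped, which is why the direction can be treated as configuration rather than engine state.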
How it works
DC-DR for Kafka rests on five rules.
- KRaft stays intra-DC. Each Member DC runs its own self-contained Kafka cluster: its own KRaft controller quorum, its own brokers, its own per-partition ISR. The KRaft quorum never crosses the DC boundary, so inter-DC latency or a network partition can never flap controller elections or stall the ISR. There is no cross-DC Kafka voter.
- The active cluster is chosen only by the dr-controlplane primary-DC Lease. A small control plane, backed by a three-site etcd quorum, publishes one Lease per failover scope. The Lease holder is the active write DC. Exactly one cluster is the write target at any instant.
- Cross-DC replication is MM2, active to standby. Following the “consume from remote, produce to local” best practice, the ConnectCluster that does the mirroring sits with the target (standby) cluster: its kafkaRef points at the standby Kafka, which also holds Connect’s internal config, offset, and status topics. It runs a MirrorSourceConnector (topic data and configs), a MirrorCheckpointConnector (consumer-group offsets, so consumers resume after a flip), and a MirrorHeartbeatConnector (liveness and lag). MM2 uses IdentityReplicationPolicy so topic names are identical on both clusters and a failover is transparent to clients. Because a ConnectCluster’s kafkaRef is fixed and its internal topics live on its local cluster, a single ConnectCluster cannot reverse direction: KubeDB pre-provisions one ConnectCluster per DC and enables the mirror connectors only on the current standby’s ConnectCluster. A failover swaps which DC’s ConnectCluster has the connectors enabled, and never enables both directions at once.
- Writability is gated by the Lease and fenced locally, fail closed. A non-active cluster must refuse producer writes. The fence, driven by the primary-dc marker, denies the produce operation on the non-active cluster and fails closed: a cluster that cannot confirm it holds the Lease denies produce. This local fence, plus the etcd majority, is the split-brain guarantee. Without it a partitioned old-active cluster that still sees producers would keep accepting writes that never mirror, diverging the two logs.
- One bootstrap endpoint follows the active cluster. The single user-facing bootstrap Service resolves to the active cluster’s brokers (the per-DC <db>-pods bootstrap, selected by the Lease), so producers and consumers always reach the write cluster. Because MM2 uses IdentityReplicationPolicy, the same topic names exist on the standby, so after the endpoint flips clients keep working and consumers resume from the offsets the MirrorCheckpointConnector already translated.
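The fail-closed rule can be sketched as a small decision function. This is an illustration of the rule as described above, not the operator’s implementation: the LeaseView type, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseView:
    """What the local DC last learned about the Lease (hypothetical shape)."""
    holder: str        # DC name recorded as the Lease holder
    expires_at: float  # expiry time in seconds, on the local clock


def may_accept_produce(local_dc: str, lease: Optional[LeaseView], now: float) -> bool:
    """Allow produce only when this DC can positively confirm it holds the Lease."""
    if lease is None:
        return False   # control plane unreachable: fail closed
    if now >= lease.expires_at:
        return False   # stale view: fail closed
    return lease.holder == local_dc


fresh = LeaseView(holder="dc-a", expires_at=100.0)
decisions = {
    "active with fresh lease": may_accept_produce("dc-a", fresh, now=50.0),
    "standby with fresh lease": may_accept_produce("dc-b", fresh, now=50.0),
    "old active, lease expired": may_accept_produce("dc-a", fresh, now=150.0),
    "no lease information": may_accept_produce("dc-a", None, now=50.0),
}
```

Only the first case allows produce. Every path that lacks a positive, current confirmation denies, which is what keeps a partitioned old-active cluster from accepting writes.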
Why never both directions at once?
IdentityReplicationPolicy keeps topic names identical and so loses the topic-rename loop guard that the default policy’s {source}. topic prefix provides. If both mirror directions overlap, the same topic can ping-pong between the two clusters. A failover therefore disables the old direction’s connectors before (or atomically with) enabling the new direction’s.
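The loop can be seen in a toy model. The simulation below is a deliberately simplified stand-in for MM2 (in-memory lists for logs, a per-direction consumed offset, and a prefix check in place of the default policy’s cycle detection); none of it is MM2 code.

```python
from collections import defaultdict


def identity_name(source_alias, topic):
    return topic                      # same name on both clusters


def prefixed_name(source_alias, topic):
    return f"{source_alias}.{topic}"  # default-style rename


def simulate(rename, detect_cycles, rounds=3):
    """Run both mirror directions for a few rounds and return record counts."""
    clusters = {"dc-a": defaultdict(list), "dc-b": defaultdict(list)}
    clusters["dc-a"]["orders"].append("r1")  # one record produced on dc-a
    consumed = defaultdict(int)               # (src, dst, topic) -> next offset
    for _ in range(rounds):
        for src, dst in (("dc-a", "dc-b"), ("dc-b", "dc-a")):
            for topic in list(clusters[src]):
                # A topic already prefixed with the target's alias came from there.
                if detect_cycles and topic.startswith(dst + "."):
                    continue
                log = clusters[src][topic]
                start = consumed[(src, dst, topic)]
                clusters[dst][rename(src, topic)].extend(log[start:])
                consumed[(src, dst, topic)] = len(log)
    return {dc: {t: len(log) for t, log in topics.items()}
            for dc, topics in clusters.items()}


looping = simulate(identity_name, detect_cycles=False)
guarded = simulate(prefixed_name, detect_cycles=True)
```

In this model the identity run keeps growing the orders topic on both sides from a single produced record, while the prefixed run settles at one copy per cluster.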
Data center roles
Each DC plays one role, set on the PlacementPolicy distributionRule.role:
| Role | Holds Kafka | Purpose |
|---|---|---|
| Member | yes | A self-contained Kafka cluster with its own KRaft quorum. One Member is the active write cluster; the other is the MM2 mirror target while standby. |
| Arbiter | no | The arbiter DC. Holds only the dr-controlplane etcd member and never Kafka, because Kafka has no cross-DC voter. Supplies the tie-break etcd vote. |
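The Arbiter’s tie-break vote is plain majority arithmetic over the three etcd members. A minimal sketch, assuming one etcd member per DC:

```python
def has_quorum(reachable_members: int, total_members: int = 3) -> bool:
    """A side can commit etcd writes only with a strict majority of members."""
    return reachable_members > total_members // 2


scenarios = {
    # Active Member DC lost: the standby and the Arbiter still see each other.
    "standby + arbiter (2 of 3)": has_quorum(2),
    # Old active cut off from both other sites.
    "isolated old active (1 of 3)": has_quorum(1),
    # Hypothetical two-site layout with no Arbiter, split down the middle.
    "either half of a two-site split (1 of 2)": has_quorum(1, total_members=2),
}
```

Only the first scenario has a majority, so only the surviving side can take the Lease. Without the third site neither half of a split could, which is why the Arbiter DC exists even though it holds no Kafka.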
The single-CR, single-endpoint model
The user creates one distributed Kafka object (with spec.distributed and a
PlacementPolicy carrying distributionRules and a failoverPolicy) and gets one
bootstrap endpoint. The operator expands the CR across three CRD kinds:
- one Kafka cluster per Member DC, each with its own intra-DC KRaft quorum;
- one ConnectCluster per DC (each kafkaRef pointing at its local Kafka);
- the three Connector objects (MirrorSourceConnector, MirrorCheckpointConnector, MirrorHeartbeatConnector) enabled on the current standby’s ConnectCluster for the active-to-standby mirror.
The single CR’s status.disasterRecovery carries the whole cross-DC view: the active
DC, each cluster’s broker health, the MM2 mirror lag, and the DR phase.
Scope. This spec targets the even two-data-DC layout (two Member DCs plus an Arbiter DC). Active/passive MM2 is inherently two-cluster, so spanning three or more data DCs (fan-out mirroring and a three-way failover) is a separate, larger design and is out of scope here.
Prerequisites
- A distributed Kafka substrate: an Open Cluster Management (OCM) hub and spoke clusters, and flat cross-DC pod networking (KubeSlice) or external listeners. Kafka brokers advertise in-cluster .svc listeners, so MM2’s cross-cluster reach and the cross-DC bootstrap need routable connectivity between the clusters. Wiring the advertised listeners for cross-DC reach is part of the DC-DR setup.
- The dr-controlplane service and its three-site etcd quorum installed across the data centers, with a dr-controlplane agent in each spoke (DC). The third etcd member sits in the Arbiter DC.
- The KubeDB Kafka operator started with the DC-DR flags (coordination kubeconfig and the operator’s local DC name).
- One consistent DC name per data center, used everywhere: the OCM spoke cluster name, the agent --dc-name, the Lease holderIdentity, the marker activeDC, the pod label open-cluster-management.io/cluster-name, and the PlacementPolicy distributionRule.clusterName. Keep them identical.
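A quick way to catch a naming drift is to collect the DC name from each place listed above and compare. The helper below is a hypothetical pre-flight check, and the observed values are made up; gathering them from a live setup is left out.

```python
def mismatched_dc_names(expected, observed):
    """Return the places whose DC name differs from the expected one."""
    return {place: name for place, name in observed.items() if name != expected}


# Illustrative values for one data center; the pod label has drifted in case.
observed = {
    "OCM spoke cluster name": "dc-a",
    "agent --dc-name": "dc-a",
    "Lease holderIdentity": "dc-a",
    "marker activeDC": "dc-a",
    "pod label open-cluster-management.io/cluster-name": "dc-A",
    "PlacementPolicy distributionRule.clusterName": "dc-a",
}
mismatches = mismatched_dc_names("dc-a", observed)
```

The comparison is exact and case-sensitive on purpose, since these strings are matched against each other verbatim.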
Deploy a DC-DR Kafka
1. PlacementPolicy
Assign global pod ordinals to data centers and tag each DC with its role. Here two
Member DCs (dc-a, dc-b) each hold a three-node Kafka cluster, and dc-c is the
Arbiter DC:
apiVersion: apps.k8s.appscode.com/v1
kind: PlacementPolicy
metadata:
  name: kf-dcdr
spec:
  clusterSpreadConstraint:
    slice:
      projectNamespace: kubeslice-demo
      sliceName: demo-slice
    failoverPolicy:
      mode: TwoDC
      trigger:
        scope: Global
    distributionRules:
      - clusterName: dc-a
        role: Member
        replicaIndices: [0, 1, 2]
      - clusterName: dc-b
        role: Member
        replicaIndices: [3, 4, 5]
      - clusterName: dc-c
        role: Arbiter
        replicaIndices: []
- A data-bearing Member rule carries replicaIndices mapping its ordinals to a self-contained Kafka cluster. The Arbiter DC carries an empty replicaIndices and holds no Kafka, only the dr-controlplane etcd member.
- failoverPolicy.mode: TwoDC expects two Member DCs plus the Arbiter DC.
- failoverPolicy.trigger.scope: Global makes this one cluster-wide failover scope.
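The layout rules above can be checked before applying the policy. The checks in this sketch are inferred from the description on this page, not taken from the operator’s validation, and the function name is hypothetical.

```python
def validate_two_dc(rules):
    """Return a list of problems; an empty list means the rules fit the TwoDC layout."""
    problems = []
    members = [r for r in rules if r["role"] == "Member"]
    arbiters = [r for r in rules if r["role"] == "Arbiter"]
    if len(members) != 2:
        problems.append(f"expected 2 Member DCs, found {len(members)}")
    if len(arbiters) != 1:
        problems.append(f"expected 1 Arbiter DC, found {len(arbiters)}")
    for r in members:
        if not r["replicaIndices"]:
            problems.append(f"Member {r['clusterName']} has no replicaIndices")
    for r in arbiters:
        if r["replicaIndices"]:
            problems.append(f"Arbiter {r['clusterName']} must not hold Kafka ordinals")
    return problems


rules = [
    {"clusterName": "dc-a", "role": "Member", "replicaIndices": [0, 1, 2]},
    {"clusterName": "dc-b", "role": "Member", "replicaIndices": [3, 4, 5]},
    {"clusterName": "dc-c", "role": "Arbiter", "replicaIndices": []},
]
problems = validate_two_dc(rules)
```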
2. Kafka
Reference the PlacementPolicy and opt the Kafka into DC-DR expansion:
apiVersion: kubedb.com/v1
kind: Kafka
metadata:
  name: kf-dcdr
  namespace: demo
spec:
  version: 4.0.0
  distributed: true
  replicas: 6
  storageType: Durable
  podTemplate:
    spec:
      podPlacementPolicy:
        name: kf-dcdr
  storage:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 1Gi
  deletionPolicy: WipeOut
spec.replicas: 6 is the total broker count across both Member DCs. The
PlacementPolicy replicaIndices split it into a three-node cluster in dc-a
(ordinals 0, 1, 2) and a three-node cluster in dc-b (ordinals 3, 4, 5). The operator
expands this into one self-contained Kafka cluster in dc-a and one in dc-b, a
ConnectCluster in each DC (each kafkaRef pointing at its local Kafka), and the three
MM2 Connector objects on the standby DC’s ConnectCluster mirroring the active
cluster into the standby.
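As a rough model of that expansion, the sketch below derives the per-DC layout from spec.replicas, the distribution rules, and the current active DC. The function and the shape of its result are assumptions made for illustration; the operator’s generated object names are not modelled.

```python
def expansion_plan(replicas, rules, active_dc):
    """Derive per-DC broker counts and where the mirror connectors run."""
    members = {r["clusterName"]: sorted(r["replicaIndices"])
               for r in rules if r["role"] == "Member"}
    assigned = sorted(i for indices in members.values() for i in indices)
    if assigned != list(range(replicas)):
        raise ValueError("replicaIndices must cover ordinals 0..replicas-1 exactly once")
    if active_dc not in members:
        raise ValueError(f"{active_dc} is not a Member DC")
    standby = next(dc for dc in members if dc != active_dc)
    return {
        "kafkaClusters": {dc: len(indices) for dc, indices in members.items()},
        "connectClusters": sorted(members),  # one per Member DC
        "mirrorConnectorsOn": standby,       # connectors run with the mirror target
    }


rules = [
    {"clusterName": "dc-a", "role": "Member", "replicaIndices": [0, 1, 2]},
    {"clusterName": "dc-b", "role": "Member", "replicaIndices": [3, 4, 5]},
    {"clusterName": "dc-c", "role": "Arbiter", "replicaIndices": []},
]
plan = expansion_plan(6, rules, active_dc="dc-a")
```

With dc-a active, the plan holds two three-broker clusters and places the mirror connectors on dc-b. Passing active_dc="dc-b" moves the connectors to dc-a without changing anything else.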
Observe the DC-DR state
The single Kafka object’s status.disasterRecovery carries the whole cross-DC view:
$ kubectl get kafka -n demo kf-dcdr -o jsonpath='{.status.disasterRecovery}' | jq
{
  "activeDC": "dc-a",
  "phase": "Steady",
  "lastTransitionTime": "2026-06-30T10:00:00Z",
  "dataCenters": [
    { "clusterName": "dc-a", "role": "Member", "writable": true, "brokersReady": 3, "mirrorLagMillis": 0, "healthy": true },
    { "clusterName": "dc-b", "role": "Member", "writable": false, "brokersReady": 3, "mirrorLagMillis": 850, "healthy": true },
    { "clusterName": "dc-c", "role": "Arbiter", "writable": false, "brokersReady": 0, "mirrorLagMillis": 0, "healthy": true }
  ]
}
- activeDC is the DC that holds the Lease and takes producer writes.
- phase is Steady, FailingOver, FailingBack, or Degraded.
- Each dataCenters entry reports the DC role, whether it is the writable cluster, how many brokers are ready, its MM2 mirror lag in milliseconds (the standby’s replication latency behind the active), and its health.
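For scripting or alerting, the status can be reduced to a few checks. The sketch below parses the sample output shown above; the field names come from that sample, while the summarize_dr helper and the checks it performs are illustrative choices.

```python
import json

SAMPLE = """
{
  "activeDC": "dc-a",
  "phase": "Steady",
  "lastTransitionTime": "2026-06-30T10:00:00Z",
  "dataCenters": [
    {"clusterName": "dc-a", "role": "Member", "writable": true, "brokersReady": 3, "mirrorLagMillis": 0, "healthy": true},
    {"clusterName": "dc-b", "role": "Member", "writable": false, "brokersReady": 3, "mirrorLagMillis": 850, "healthy": true},
    {"clusterName": "dc-c", "role": "Arbiter", "writable": false, "brokersReady": 0, "mirrorLagMillis": 0, "healthy": true}
  ]
}
"""


def summarize_dr(status):
    """Condense status.disasterRecovery into a few yes/no checks."""
    dcs = status["dataCenters"]
    writable = [dc["clusterName"] for dc in dcs if dc["writable"]]
    return {
        # Exactly one writable DC, and it is the Lease holder.
        "consistent": writable == [status["activeDC"]],
        "steady": status["phase"] == "Steady",
        "standbyLagMillis": {dc["clusterName"]: dc["mirrorLagMillis"]
                             for dc in dcs
                             if dc["role"] == "Member" and not dc["writable"]},
        "unhealthy": [dc["clusterName"] for dc in dcs if not dc["healthy"]],
    }


summary = summarize_dr(json.loads(SAMPLE))
```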
Unplanned failover
When the active DC is lost, the standby is already a near-current MM2 mirror. The
orchestrator observes the Lease move to the standby, flips the bootstrap endpoint to
the standby’s brokers, opens the standby’s produce fence, and reverses the MM2
direction: it disables the mirror connectors on the survivor’s ConnectCluster (the
old mirror target) and enables them on the old active’s ConnectCluster once that DC
returns and becomes the new standby.
status.disasterRecovery.phase moves to FailingOver and back to Steady.
The RPO is the un-mirrored MM2 tail: records the active cluster accepted but had not yet mirrored when it died are lost. There is no rewind.
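A back-of-the-envelope estimate of that tail multiplies the mirror lag by the produce rate. This assumes a steady produce rate and that mirrorLagMillis approximates how far the standby trails in time; the 2,000 records per second figure is a made-up example.

```python
def estimated_records_lost(mirror_lag_millis, produce_rate_per_sec):
    """Rough size of the un-mirrored tail at the moment the active DC is lost."""
    return mirror_lag_millis * produce_rate_per_sec / 1000


# 850 ms of lag (the sample status above) at a hypothetical 2,000 records/s.
tail = estimated_records_lost(850, 2000)  # 1700.0 records
```

Treat the result as an order-of-magnitude figure: bursts in produce rate or lag shortly before the failure make the real tail larger.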
Planned switchover (drained, zero record loss)
To move the active DC on purpose without losing records, annotate the Kafka with the target DC:
$ kubectl annotate kafka -n demo kf-dcdr dr.kubedb.com/switchover-to=dc-b
The orchestrator quiesces producers by closing the active cluster’s produce fence,
waits for MM2 to drain to near-zero lag (so the target has every record), then flips
the bootstrap endpoint, opens the target’s fence, and reverses the mirror direction.
Because MM2 has fully drained before the flip, no committed record is lost. The Lease
then follows to dc-b.
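The ordering of those steps is what gives the zero-loss property. The simulation below walks through them against an in-memory stand-in; the FakeDR class, its fields, the drain rate, and the abort-on-timeout behaviour are all assumptions for illustration and do not reflect the orchestrator’s actual API.

```python
class FakeDR:
    """In-memory stand-in for the two clusters; everything here is hypothetical."""

    def __init__(self, active, standby, lag_records):
        self.active, self.standby = active, standby
        self.fence_open = {active: True, standby: False}
        self.endpoint = active
        self.mirror = (active, standby)  # (source, target)
        self.lag = lag_records

    def poll_lag(self):
        # With the source fenced no new records arrive, so lag only shrinks.
        self.lag = max(0, self.lag - 40)
        return self.lag


def planned_switchover(dr, target, max_polls=100):
    source = dr.active
    dr.fence_open[source] = False     # 1. quiesce producers on the active
    for _ in range(max_polls):        # 2. wait for the mirror to drain
        if dr.poll_lag() == 0:
            break
    else:
        dr.fence_open[source] = True  # assumed abort path: stay on the old active
        raise TimeoutError("mirror did not drain")
    dr.endpoint = target              # 3. flip the bootstrap endpoint
    dr.fence_open[target] = True      # 4. open the target's fence
    dr.mirror = (target, source)      # 5. reverse the mirror direction
    dr.active, dr.standby = target, source
    return dr


dr = planned_switchover(FakeDR("dc-a", "dc-b", lag_records=100), "dc-b")
```

At no point in the sequence are both fences open, and the target’s fence opens only after the lag has reached zero.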
Failback
Failback is not a rewind. A returned old-active cluster becomes the MM2 target of the new active. Records it accepted but never mirrored before the failover are a forked tail Kafka cannot rewind, and because MM2 only adds and never deletes, a naive re-mirror leaves those orphan records on top of the new active’s data. For correctness, re-seed the affected topics from the new active (wipe and re-mirror) or accept and document the orphan tail as bounded loss. Once the returned DC has caught up, a drained planned switchover can move the active role back to it.
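One way to decide which topics need re-seeding is to compare, per partition, the returned cluster’s log end offset with the last source offset MM2 had mirrored before the failover. The sketch below assumes both sets of offsets are available as dictionaries keyed by (topic, partition); how they are collected is outside its scope, and the numbers are invented.

```python
def orphan_tail(log_end_offsets, last_mirrored_offsets):
    """Per-partition count of records held beyond what was mirrored."""
    tail = {}
    for partition, end in log_end_offsets.items():
        mirrored = last_mirrored_offsets.get(partition, 0)
        if end > mirrored:
            tail[partition] = end - mirrored
    return tail


# Hypothetical offsets on the returned old-active cluster.
log_end = {("orders", 0): 120, ("orders", 1): 80, ("payments", 0): 10}
mirrored = {("orders", 0): 120, ("orders", 1): 75, ("payments", 0): 10}

tail = orphan_tail(log_end, mirrored)
topics_to_reseed = {topic for topic, _ in tail}
```

Here only orders has an un-mirrored tail (five records on partition 1), so it is the only topic that needs a wipe and re-mirror or a documented loss.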
Cleanup
$ kubectl delete kafka -n demo kf-dcdr
$ kubectl delete placementpolicy kf-dcdr
Deleting the Kafka removes the per-DC Kafka clusters, the per-DC ConnectCluster
objects, the MM2 Connector objects, and the generated cluster-scoped per-DC
PlacementPolicies (which carry no owner reference, so the operator deletes them
explicitly). The user-provided base PlacementPolicy is left for you to delete.































