Summary
When RLTest is invoked with --env oss-cluster --use-slaves, the slave nodes are started with --slaveof localhost <master_port> and WITHOUT --cluster-enabled yes. The slaves therefore run as standalone replicas, not as cluster gossip members.
Impact
- CLUSTER SLOTS returns empty replica arrays from each master
- CLUSTER NODES shows only the masters
- Any test that asserts cluster-aware replica routing (e.g., memtier_benchmark's --read-preference=secondary tests) silently skips because no replica connections can be discovered through the cluster API
- RLTest counts the skips as passes, so the test cell looks green
- Real regressions in cluster-aware replica code paths cannot be caught by CI
Root cause
- RLTest/redis_std.py line 225: slaves are unconditionally started with --slaveof localhost <master_port>
- RLTest/redis_std.py line 230: the condition is if self.clusterEnabled and role is not SLAVE:, so slaves never get --cluster-enabled yes
- RLTest/redis_cluster.py startEnv (line 125+): only masters are configured with CLUSTER MEET + ADDSLOTS; slaves are never joined to the cluster
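To make the interaction of the two conditions concrete, here is a simplified model of the flag selection described above. It is not RLTest's actual code: the function name, the role constants, and the port number are hypothetical and exist only for illustration.

```python
# Hypothetical role constants, standing in for whatever RLTest uses internally.
MASTER = "master"
SLAVE = "slave"


def model_server_flags(role, cluster_enabled, master_port=None):
    """Simplified model of the flag selection described in the root cause."""
    flags = []
    if role == SLAVE:
        # Slaves always receive --slaveof, whether or not cluster mode is on.
        flags += ["--slaveof", "localhost", str(master_port)]
    if cluster_enabled and role != SLAVE:
        # Only non-slave nodes are started in cluster mode.
        flags += ["--cluster-enabled", "yes"]
    return flags


print(model_server_flags(MASTER, True))
# ['--cluster-enabled', 'yes']
print(model_server_flags(SLAVE, True, master_port=6379))
# ['--slaveof', 'localhost', '6379']
```

In this model a slave in cluster mode never reaches the --cluster-enabled branch, which matches the cluster_enabled:0 seen in the repro.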
Repro
python -m RLTest --env oss-cluster --shards-count 3 --use-slaves --module ./dummy.so
# In a second terminal while the cluster is up:
redis-cli -p <master_port> CLUSTER SLOTS
# -> Each shard's reply has an EMPTY replica array
redis-cli -p <master_port> CLUSTER NODES
# -> Only the 3 masters, slaves not listed
redis-cli -p <slave_port> ROLE
# -> 'slave' with master = <master_port> (standalone replication)
redis-cli -p <slave_port> CLUSTER INFO
# -> cluster_enabled:0
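The CLUSTER NODES check above can also be scripted. The sketch below counts masters and replicas in the raw text that CLUSTER NODES returns, relying on the documented line layout (node id, address, comma-separated flags, ...). The function name is hypothetical, and the sample text is made up for illustration (shortened node ids, arbitrary ports); it is not captured output.

```python
def summarize_cluster_nodes(nodes_text):
    """Return (masters, replicas) address lists from raw CLUSTER NODES text."""
    masters, replicas = [], []
    for line in nodes_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        addr = fields[1]
        flags = fields[2].split(",")
        if "master" in flags:
            masters.append(addr)
        elif "slave" in flags:
            replicas.append(addr)
    return masters, replicas


# Illustrative sample only: what a 3-master cluster with no joined replicas
# could look like. Node ids are shortened placeholders.
SAMPLE_BROKEN = """\
aaaa 127.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460
bbbb 127.0.0.1:7001@17001 master - 0 1700000000000 2 connected 5461-10922
cccc 127.0.0.1:7002@17002 master - 0 1700000000000 3 connected 10923-16383
"""

masters, replicas = summarize_cluster_nodes(SAMPLE_BROKEN)
print(f"masters={len(masters)} replicas={len(replicas)}")  # masters=3 replicas=0
```

A CI guard could fail fast when replicas is empty under --use-slaves, instead of letting downstream tests skip.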
Proposed fix
- In redis_std.py, when clusterEnabled and role == SLAVE:
  - Omit the --slaveof flag (slaves will be attached post-startup via CLUSTER REPLICATE)
  - Add --cluster-enabled yes and a unique --cluster-config-file
- In redis_cluster.py startEnv, after master slot assignment is complete:
  - For each slave, send CLUSTER MEET from a master so the slave joins gossip
  - Wait for the slave to recognize the cluster
  - On the slave's connection, run CLUSTER REPLICATE <master_node_id>
This makes --use-slaves produce real cluster-aware replicas, so CLUSTER SLOTS returns populated replica arrays and tests can route to them.
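The startEnv steps could be sketched roughly as follows. This is a minimal illustration, not a patch against RLTest: attach_replica and its parameters are hypothetical names, and the connections are assumed to expose execute_command(*args) the way redis-py clients do. The readiness check simply looks for the master's node id in the replica's CLUSTER NODES reply, which is one possible interpretation of "recognize the cluster".

```python
import time


def _as_text(reply):
    """Normalize a reply to str, whether the client returns bytes or str."""
    return reply.decode() if isinstance(reply, bytes) else str(reply)


def attach_replica(master_conn, replica_conn, replica_host, replica_port,
                   master_node_id, timeout=10.0, poll_interval=0.1):
    """Join a cluster-enabled replica to the cluster, then attach it to a master.

    Assumes the replica was started with --cluster-enabled yes and
    without --slaveof, as proposed above.
    """
    # Step 1: the master introduces the replica to the cluster gossip.
    master_conn.execute_command("CLUSTER", "MEET", replica_host, str(replica_port))

    # Step 2: wait until the replica's own node table lists the master.
    deadline = time.monotonic() + timeout
    while master_node_id not in _as_text(
            replica_conn.execute_command("CLUSTER", "NODES")):
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"replica {replica_host}:{replica_port} never learned "
                f"about master {master_node_id}")
        time.sleep(poll_interval)

    # Step 3: on the replica's connection, start replicating from the master.
    replica_conn.execute_command("CLUSTER", "REPLICATE", master_node_id)
```

The wait in step 2 matters because CLUSTER REPLICATE is rejected if the replica does not yet know the target node id. The master's node id can be obtained beforehand with CLUSTER MYID on the master's connection.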
Downstream context
Surfaced during PR redis/memtier_benchmark#456 round-17 review. memtier added a --read-preference feature with replica routing; the new CI matrix cell (OSS-CLUSTER + replicas: read-preference) silently skipped all tests because of this RLTest gap. Production code was verified empirically via offline runs against redis-cli --cluster create clusters. This issue closes the gap so the in-tree CI signal becomes load-bearing.