oss-cluster + --use-slaves creates standalone replicas, not cluster-aware ones #252

@fcostaoliveira

Summary

When RLTest is invoked with --env oss-cluster --use-slaves, the slave nodes are started with --slaveof localhost <master_port> and without --cluster-enabled yes. As a result, the slaves are standalone replicas that never join cluster gossip.

Impact

  • CLUSTER SLOTS returns empty replica arrays from each master
  • CLUSTER NODES shows only the masters
  • Any test that asserts cluster-aware replica routing (e.g., memtier_benchmark's --read-preference=secondary tests) silently skips because no replica connections can be discovered through the cluster API
  • RLTest counts the skips as passes, so the test cell looks green
  • Real regressions in cluster-aware replica code paths cannot be caught by CI
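
One way a test harness could turn the silent skip into a visible failure is to inspect the CLUSTER SLOTS reply directly. The sketch below assumes the raw nested-array reply shape (start slot, end slot, master entry, then zero or more replica entries). The function name shards_without_replicas and the sample reply are hypothetical, not part of RLTest or memtier_benchmark, and some client libraries post-process this reply into a different structure, so the indexing may need adapting.

```python
def shards_without_replicas(cluster_slots_reply):
    """Return (start, end) slot ranges whose CLUSTER SLOTS entry lists no replicas.

    Assumed entry shape (raw reply): [start_slot, end_slot, master_info, replica_info, ...]
    """
    missing = []
    for entry in cluster_slots_reply:
        start, end = entry[0], entry[1]
        replicas = entry[3:]  # everything after the master's node info
        if not replicas:
            missing.append((start, end))
    return missing


# Hypothetical reply for a 3-shard cluster whose masters report no replicas
reply = [
    [0, 5460, ["127.0.0.1", 6379, "id-a"]],
    [5461, 10922, ["127.0.0.1", 6380, "id-b"]],
    [10923, 16383, ["127.0.0.1", 6381, "id-c"]],
]
```

A replica-routing test could assert that this list is empty during setup and fail, rather than skip, when it is not.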

Root cause

  • RLTest/redis_std.py, line 225: slaves are unconditionally started with --slaveof localhost <master_port>.
  • RLTest/redis_std.py, line 230: --cluster-enabled yes is guarded by `if self.clusterEnabled and role is not SLAVE:`, so slaves never receive it.
  • RLTest/redis_cluster.py, startEnv (line 125+): only masters are configured with CLUSTER MEET + ADDSLOTS; slaves are never joined to the cluster.

Repro

python -m RLTest --env oss-cluster --shards-count 3 --use-slaves --module ./dummy.so
# In a second terminal while the cluster is up:
redis-cli -p <master_port> CLUSTER SLOTS
# -> Each shard's reply has an EMPTY replica array
redis-cli -p <master_port> CLUSTER NODES
# -> Only the 3 masters, slaves not listed
redis-cli -p <slave_port> ROLE
# -> 'slave' with master = <master_port>  (standalone replication)
redis-cli -p <slave_port> CLUSTER INFO
# -> cluster_enabled:0
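
The CLUSTER NODES check from the repro can also be scripted. This is a minimal sketch that counts roles in the text output, assuming the documented line layout in which the third whitespace-separated field holds comma-separated flags. The function name count_roles and the sample text are hypothetical.

```python
from collections import Counter


def count_roles(cluster_nodes_text):
    """Count master and slave entries in raw CLUSTER NODES output."""
    roles = Counter()
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        flags = fields[2].split(",")  # e.g. "myself,master" or "slave"
        if "master" in flags:
            roles["master"] += 1
        elif "slave" in flags:
            roles["slave"] += 1
    return roles


# Hypothetical output matching the symptom described above: masters only
sample = """\
aaa 127.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460
bbb 127.0.0.1:6380@16380 master - 0 0 2 connected 5461-10922
ccc 127.0.0.1:6381@16381 master - 0 0 3 connected 10923-16383
"""
```

With --shards-count 3 --use-slaves, a correctly formed cluster would be expected to report a non-zero slave count here.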

Proposed fix

  1. In redis_std.py, when clusterEnabled and role == SLAVE:
    • Omit the --slaveof flag (slaves will be attached post-startup via CLUSTER REPLICATE)
    • Add --cluster-enabled yes and a unique --cluster-config-file
  2. In redis_cluster.py startEnv, after master slot assignment is complete:
    • For each slave, send CLUSTER MEET from a master so the slave joins gossip
    • Wait for the slave to recognize the cluster
    • On the slave's connection, run CLUSTER REPLICATE <master_node_id>

This makes --use-slaves produce real cluster-aware replicas, so CLUSTER SLOTS returns populated replica arrays and tests can route to them.
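
The two steps of the proposed fix can be sketched roughly as follows. This is not RLTest's actual code: node_args, attach_replica, the nodes-<port>.conf naming, and the timeout values are all hypothetical. The connection objects are assumed to expose an execute_command(*args) method that returns the raw reply (text or bytes), and "recognizing the cluster" is approximated by polling until the master's node ID appears in the slave's CLUSTER NODES output.

```python
import time


def node_args(port, cluster_enabled, is_slave, master_port=None):
    """Build redis-server flags for one node (hypothetical helper)."""
    args = ["--port", str(port)]
    if cluster_enabled:
        # Masters and slaves alike start as cluster members; slaves get
        # no --slaveof and are attached later via CLUSTER REPLICATE.
        args += ["--cluster-enabled", "yes",
                 "--cluster-config-file", "nodes-%d.conf" % port]
    elif is_slave:
        # Non-cluster environments keep classic replication.
        args += ["--slaveof", "localhost", str(master_port)]
    return args


def _text(reply):
    return reply.decode() if isinstance(reply, bytes) else reply


def attach_replica(master_conn, slave_conn, slave_host, slave_port,
                   timeout=10.0, poll=0.1):
    """Join a cluster-enabled slave to gossip, then point it at its master."""
    master_id = _text(master_conn.execute_command("CLUSTER", "MYID"))
    master_conn.execute_command("CLUSTER", "MEET", slave_host, slave_port)

    # CLUSTER REPLICATE needs the slave to know the master's node ID,
    # so poll the slave's own view of the cluster first.
    deadline = time.monotonic() + timeout
    while master_id not in _text(slave_conn.execute_command("CLUSTER", "NODES")):
        if time.monotonic() > deadline:
            raise TimeoutError("slave never learned about master %s" % master_id)
        time.sleep(poll)

    slave_conn.execute_command("CLUSTER", "REPLICATE", master_id)
    return master_id
```

Polling the slave's view rather than sleeping for a fixed interval keeps startup fast on a healthy machine while still tolerating slow gossip convergence.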

Downstream context

Surfaced during PR redis/memtier_benchmark#456 round-17 review. memtier added a --read-preference feature with replica routing; the new CI matrix cell (OSS-CLUSTER + replicas: read-preference) silently skipped all tests because of this RLTest gap. memtier's production code was verified empirically via offline runs against clusters created with redis-cli --cluster create. Fixing this issue closes the gap so that the in-tree CI signal becomes load-bearing.
