Introduce atomic slot migration #1949
madolson merged 125 commits on Aug 12, 2025
1. Define the new structures slotRange and clusterSlotSyncLink; 2. Add the CLUSTER SLOTLINK command to manage all slot sync links. Signed-off-by: Binbin <binloveplay1314@qq.com>
1. Add the CLUSTER SLOTSYNC/SLOTSYNCFORCE commands to trigger slot sync. Signed-off-by: Binbin <binloveplay1314@qq.com>
1. Extend the SYNC command so that it can specify slot ranges; 2. Allow filtering keys by the specified slots when generating the RDB; 3. Implement the handshake that runs before the RDB transfer for slot sync. Signed-off-by: Binbin <binloveplay1314@qq.com>
1. Implement the RDB transfer and loading for slot sync. Signed-off-by: Binbin <binloveplay1314@qq.com>
1. Allow filtering commands by the specified slots when feeding slaves; 2. Implement the message exchange for slot sync; 3. Add clusterSlotSyncCron() to handle time events for slot sync. Signed-off-by: Binbin <binloveplay1314@qq.com>
1. Add the CLUSTER FAILOVER command to trigger slot failover; 2. Implement the slot failover process. Signed-off-by: Binbin <binloveplay1314@qq.com>
1. Improve delDbKeysInSlot() to support a time limit; 2. Implement pending deletion of slots. Signed-off-by: Binbin <binloveplay1314@qq.com>
…ot RDBs In slot sync, we can load multiple slot RDBs from different nodes. Slot RDBs contain functions, so loading may hit an `already exists` error, since every node has the function data. This is a limitation of our slot RDB design. This commit adds a new RDBFLAGS_SLOT_SYNC flag as a hint that we are loading a slot RDB. With this flag, loading a function is treated like FUNCTION LOAD REPLACE, so no error is raised and the function loaded last wins (which is not a problem at the moment, since all functions are the same across the cluster). Signed-off-by: Binbin <binloveplay1314@qq.com>
Consider three nodes. A and B are doing a slot sync: A is the source node and B is the destination node. Before running CLUSTER SLOTFAILOVER, we add a new node C as a replica of B. If the connection between A and B breaks at this point, then from B's point of view freeClient calls onSlotSyncClientClose to reset the link, and B tries a new slot SYNC in cron. B then calls delkeysNotOwnedByMySelf to delete all the keys in that slot; the deletions propagate to C, and C becomes an empty DB. When B performs the new slot SYNC, A generates a new slot RDB and B reloads it. But the new RDB is not propagated to C, because the master-replica connection between B and C is healthy, which results in data loss on C. In this commit, after the master node loads the slot RDB, we disconnect all slave nodes and let them do a full synchronization (slot psync is not supported). Signed-off-by: Binbin <binloveplay1314@qq.com>
In the SET command, the expire time is rewritten into a new
robj, which is usually INT-encoded:
```
/* Propagate as SET Key Value PXAT millisecond-timestamp if there is
* EX/PX/EXAT/PXAT flag. */
robj *milliseconds_obj = createStringObjectFromLongLong(milliseconds);
rewriteClientCommandVector(c, 5, shared.set, key, val, shared.pxat, milliseconds_obj);
```
If we are resharding at that point, the server crashes.
That is because we call isCommandInSlotRanges to check the
slot, which calls getKeysFromCommand to get the keys from the
command argv. The milliseconds_obj is INT-encoded, but the code
in setGetKeys treats it as a STRING, so the server crashes.
In this fix, isCommandInSlotRanges decodes any INT-encoded
argument it finds into a STRING.
In addition, we added a new debug enable-debug-assert option so
that our TCL tests can cover the isCommandInSlotRanges
function.
Signed-off-by: Binbin <binloveplay1314@qq.com>
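The decoding step described in this commit can be illustrated with a small standalone sketch. It is not the server's code: `sketch_obj`, `ENC_RAW`, `ENC_INT`, and `sketch_decode_to_string` are hypothetical names standing in for the real object type and its decoding helper, and the sketch only shows the idea of turning an integer-encoded argument into its string form before any code treats it as a string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a server object; this is NOT the real robj. */
typedef enum { ENC_RAW, ENC_INT } sketch_encoding;

typedef struct {
    sketch_encoding encoding;
    union {
        const char *str; /* valid when encoding == ENC_RAW */
        long long num;   /* valid when encoding == ENC_INT */
    } v;
} sketch_obj;

/* Write the string form of obj into buf and return its length.
 * Returns (size_t)-1 when buf is too small. Integer-encoded objects are
 * formatted as decimal text, so callers can always work with a string. */
size_t sketch_decode_to_string(const sketch_obj *obj, char *buf, size_t buflen) {
    assert(obj != NULL && buf != NULL);
    int n;
    if (obj->encoding == ENC_INT) {
        n = snprintf(buf, buflen, "%lld", obj->v.num);
    } else {
        n = snprintf(buf, buflen, "%s", obj->v.str);
    }
    if (n < 0 || (size_t)n >= buflen) return (size_t)-1;
    return (size_t)n;
}
```

A key-extraction routine that expects strings could call a helper like this on every argument first, which is the spirit of the fix in isCommandInSlotRanges.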
Take expiration as an example. If the target node deletes an expired key, or a key becomes logically expired, while the slot RDB is being loaded, the source node may still propagate an expire command for that key after the slot RDB has been loaded. Such commands would have no effect on keys the target already treats as logically expired, resulting in data loss. During the slot migration process, keys are therefore not considered expired in the expiration check of commands propagated by the source node. Signed-off-by: Binbin <binloveplay1314@qq.com>
The reply will mess up the slot replication. Signed-off-by: Binbin <binloveplay1314@qq.com>
This causes the source node to not trim the reply, so the querybuf of the target node grows too large and the target node is disconnected due to the querybuf limit (1G). Signed-off-by: Binbin <binloveplay1314@qq.com>
We need to make sure this block is only executed on the source node; otherwise, a target node calling it will reset the slot failover. Signed-off-by: Binbin <binloveplay1314@qq.com>
…flags Signed-off-by: Binbin <binloveplay1314@qq.com>
This can happen when multiple targets are doing slot sync at the same time. With disk-based slot RDB replication, the replicationSetupReplicaForFullResync call here may send the slot RDB to a different target or to a normal replica, or send a normal RDB to a target. Signed-off-by: Binbin <binloveplay1314@qq.com>
rjd15372 pushed a commit that referenced this pull request on Sep 23, 2025
If all cluster nodes have functions, slot migration fails because the target returns a "function already exists" error when executing FUNCTION LOAD. In addition, the target's replica could panic when it executes the FUNCTION LOAD propagated from the primary (see propagation-error-behavior). Introduced in #1949. Signed-off-by: Binbin <binloveplay1314@qq.com>
rjd15372 pushed a commit that referenced this pull request on Sep 23, 2025
…ous reading of auth response (#2494) With the old SLOT_EXPORT_AUTHENTICATING state added in #1949, the source node sends the AUTH command and then immediately reads the response. If the target node is blocked during this step, the source node is blocked as well. A read handler should be used to handle the response instead. We split SLOT_EXPORT_AUTHENTICATING into SLOT_EXPORT_SEND_AUTH and SLOT_EXPORT_READ_AUTH_RESPONSE to avoid this issue. Signed-off-by: Binbin <binloveplay1314@qq.com>
rjd15372 pushed a commit that referenced this pull request on Sep 23, 2025
When we added atomic slot migration (ASM) in #1949, we reused a lot of the RDB save code. It was the easier way to implement ASM the first time, but it came with side effects. We fork with CHILD_TYPE_RDB and use rdb.c/rdb.h functions to save the snapshot, which messes up the logs (they say we are doing RDB work) and the info fields (rdb_bgsave_in_progress is reported while we are actually doing a slot migration). It also makes the code difficult to maintain: the rdb_save method uses many rdb_* variables even though we are doing slot migration. If we want to support one fork with multiple target nodes, this code needs to be rewritten for a cleaner separation. Note that the changes to rdb.c/rdb.h revert earlier changes made when this code was being reused for slot migration. The slot migration snapshot logic is similar to the previous diskless replication: a pipe transfers the snapshot data from the child process to the parent process.
Interface changes:
- New slot_migration_fork_in_progress info field.
- New cow_size field in the CLUSTER GETSLOTMIGRATIONS command.
- Slot migration fork is also added to the cluster class trace latency.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Co-authored-by: Jacob Murphy <jkmurphy@google.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request on Oct 3, 2025
Introduces a new family of commands for migrating slots via replication. The procedure is driven by the source node which pushes an AOF formatted snapshot of the slots to the target, followed by a replication stream of changes on that slot (a la manual failover).

This solution is an adaptation of the solution provided by @enjoy-binbin, combined with the solution I previously posted at valkey-io#1591, modified to meet the designs we had outlined in valkey-io#23.

## New commands

* `CLUSTER MIGRATESLOTS SLOTSRANGE start end [start end]... NODE node-id`: Begin sending the slot via replication to the target. Multiple targets can be specified by repeating `SLOTSRANGE ... NODE ...`
* `CLUSTER CANCELMIGRATION ALL`: Cancel all slot migrations
* `CLUSTER GETSLOTMIGRATIONS`: See a recent log of migrations

This PR only implements "one shot" semantics with an asynchronous model. Later, "two phase" (e.g. slot level replicate/failover commands) can be added with the same core.

## Slot migration jobs

Introduces the concept of a slot migration job. While active, a job tracks a connection created by the source to the target over which the contents of the slots are sent. This connection is used for control messages as well as replicated slot data. Each job is given a 40 character random name to help uniquely identify it.

All jobs, including those that finished recently, can be observed using the `CLUSTER GETSLOTMIGRATIONS` command.

## Replication

* Since the snapshot uses AOF, the snapshot can be replayed verbatim to any replicas of the target node.
* We use the same proxying mechanism used for chaining replication to copy the content sent by the source node directly to the replica nodes.

## `CLUSTER SYNCSLOTS`

To coordinate the state machine transitions across the two nodes, a new command is added, `CLUSTER SYNCSLOTS`, that performs this control flow.

Each end of the slot migration connection is expected to install a read handler in order to handle `CLUSTER SYNCSLOTS` commands:

* `ESTABLISH`: Begins a slot migration. Provides slot migration information to the target and authorizes the connection to write to unowned slots.
* `SNAPSHOT-EOF`: Appended to the end of the snapshot to signal that the snapshot is done being written to the target.
* `PAUSE`: Informs the source node to pause whenever it gets the opportunity.
* `PAUSED`: Added to the end of the client output buffer when the pause is performed. The pause is only performed after the buffer shrinks below a configurable size.
* `REQUEST-FAILOVER`: Requests the source to either grant or deny a failover for the slot migration. The failover is only granted if the target is still paused. Once a failover is granted, the pause is refreshed for a short duration.
* `FAILOVER-GRANTED`: Sent to the target to inform it that REQUEST-FAILOVER is granted.
* `ACK`: Heartbeat command used to ensure liveness.

## Interactions with other commands

* FLUSHDB on the source node (which flushes the migrating slot) will result in the source dropping the connection, which will flush the slot on the target and reset the state machine back to the beginning. The subsequent retry should very quickly succeed (it is now empty).
* FLUSHDB on the target will fail the slot migration. We can iterate with better handling, but for now it is expected that the operator would retry.
* Generally, FLUSHDB is expected to be executed cluster wide, so preserving partially migrated slots doesn't make much sense.
* SCAN and KEYS are filtered to avoid exposing importing slot data.

## Error handling

* For any transient connection drops, the migration will be failed and require the user to retry.
* If there is an OOM while reading from the import connection, we will fail the import, which will drop the importing slot data.
* If there is a client output buffer limit reached on the source node, it will drop the connection, which will cause the migration to fail.
* If at any point the export loses ownership or either node is failed over, a callback will be triggered on both ends of the migration to fail the import. The import will not reattempt with a new owner.
* The two ends of the migration are routinely pinging each other with SYNCSLOTS ACK messages. If at any point there is no interaction on the connection for longer than `repl-timeout`, the connection will be dropped, resulting in migration failure.
* If a failover happens, we will drop keys in all unowned slots. The migration does not persist through failovers and would need to be retried on the new source/target.

## State machine

```
             Target/Importing Node State Machine
  ─────────────────────────────────────────────────────────────

                ┌────────────────────┐
                │SLOT_IMPORT_WAIT_ACK┼──────┐
                └──────────┬─────────┘      │
                        ACK│                │
            ┌──────────────▼─────────────┐  │
            │SLOT_IMPORT_RECEIVE_SNAPSHOT┼──┤
            └──────────────┬─────────────┘  │
               SNAPSHOT-EOF│                │
           ┌───────────────▼──────────────┐ │
           │SLOT_IMPORT_WAITING_FOR_PAUSED┼─┤
           └───────────────┬──────────────┘ │
                     PAUSED│                │
           ┌───────────────▼──────────────┐ │ Error Conditions:
           │SLOT_IMPORT_FAILOVER_REQUESTED┼─┤  1. OOM
           └───────────────┬──────────────┘ │  2. Slot Ownership Change
           FAILOVER-GRANTED│                │  3. Demotion to replica
            ┌──────────────▼─────────────┐  │  4. FLUSHDB
            │SLOT_IMPORT_FAILOVER_GRANTED┼──┤  5. Connection Lost
            └──────────────┬─────────────┘  │  6. No ACK from source (timeout)
         Takeover Performed│                │
            ┌──────────────▼───────────┐    │
            │SLOT_MIGRATION_JOB_SUCCESS┼────┤
            └──────────────────────────┘    │
                                            │
      ┌─────────────────────────────────────▼─┐
      │SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP│
      └────────────────────┬──────────────────┘
   Unowned Slots Cleaned Up│
             ┌─────────────▼───────────┐
             │SLOT_MIGRATION_JOB_FAILED│
             └─────────────────────────┘

             Source/Exporting Node State Machine
  ─────────────────────────────────────────────────────────────

               ┌──────────────────────┐
               │SLOT_EXPORT_CONNECTING├─────────┐
               └───────────┬──────────┘         │
                  Connected│                    │
             ┌─────────────▼────────────┐       │
             │SLOT_EXPORT_AUTHENTICATING┼───────┤
             └─────────────┬────────────┘       │
              Authenticated│                    │
             ┌─────────────▼────────────┐       │
             │SLOT_EXPORT_SEND_ESTABLISH┼───────┤
             └─────────────┬────────────┘       │
  ESTABLISH command written│                    │
     ┌─────────────────────▼─────────────┐      │
     │SLOT_EXPORT_READ_ESTABLISH_RESPONSE┼──────┤
     └─────────────────────┬─────────────┘      │
   Full response read (+OK)│                    │
          ┌────────────────▼──────────────┐     │ Error Conditions:
          │SLOT_EXPORT_WAITING_TO_SNAPSHOT┼─────┤  1. User sends CANCELMIGRATION
          └────────────────┬──────────────┘     │  2. Slot ownership change
     No other child process│                    │  3. Demotion to replica
              ┌────────────▼───────────┐        │  4. FLUSHDB
              │SLOT_EXPORT_SNAPSHOTTING┼────────┤  5. Connection Lost
              └────────────┬───────────┘        │  6. AUTH failed
              Snapshot done│                    │  7. ERR from ESTABLISH command
               ┌───────────▼─────────┐          │  8. Unpaused before failover completed
               │SLOT_EXPORT_STREAMING┼──────────┤  9. Snapshot failed (e.g. Child OOM)
               └───────────┬─────────┘          │ 10. No ack from target (timeout)
                      PAUSE│                    │ 11. Client output buffer overrun
            ┌──────────────▼─────────────┐      │
            │SLOT_EXPORT_WAITING_TO_PAUSE┼──────┤
            └──────────────┬─────────────┘      │
             Buffer drained│                    │
            ┌──────────────▼────────────┐       │
            │SLOT_EXPORT_FAILOVER_PAUSED┼───────┤
            └──────────────┬────────────┘       │
   Failover request granted│                    │
           ┌───────────────▼────────────┐       │
           │SLOT_EXPORT_FAILOVER_GRANTED┼───────┤
           └───────────────┬────────────┘       │
      New topology received│                    │
            ┌──────────────▼───────────┐        │
            │SLOT_MIGRATION_JOB_SUCCESS│        │
            └──────────────────────────┘        │
                                                │
            ┌─────────────────────────┐         │
            │SLOT_MIGRATION_JOB_FAILED│◄────────┤
            └─────────────────────────┘         │
                                                │
           ┌────────────────────────────┐       │
           │SLOT_MIGRATION_JOB_CANCELLED│◄──────┘
           └────────────────────────────┘
```

Co-authored-by: Binbin <binloveplay1314@qq.com>

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
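The target-side state machine in this commit message can be read as a transition function. The following is a minimal sketch under stated assumptions, not the implementation from the PR: the state names come from the diagram, while `import_event`, the `EV_*` names, and `import_next_state` are hypothetical. It also assumes that unexpected events leave the state unchanged and treats SLOT_MIGRATION_JOB_SUCCESS as terminal, so the diagram's edge from the success state toward cleanup is not modeled.

```c
#include <assert.h>

/* States taken from the target/importing diagram. */
typedef enum {
    SLOT_IMPORT_WAIT_ACK,
    SLOT_IMPORT_RECEIVE_SNAPSHOT,
    SLOT_IMPORT_WAITING_FOR_PAUSED,
    SLOT_IMPORT_FAILOVER_REQUESTED,
    SLOT_IMPORT_FAILOVER_GRANTED,
    SLOT_MIGRATION_JOB_SUCCESS,
    SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP,
    SLOT_MIGRATION_JOB_FAILED
} import_state;

/* Hypothetical event names for the edge labels in the diagram. */
typedef enum {
    EV_ACK,
    EV_SNAPSHOT_EOF,
    EV_PAUSED,
    EV_FAILOVER_GRANTED,
    EV_TAKEOVER_PERFORMED,
    EV_ERROR,        /* any of the listed error conditions */
    EV_CLEANUP_DONE  /* unowned slots cleaned up */
} import_event;

/* Return the next state for (s, ev). Events that do not match an edge
 * out of the current state are ignored (an assumption of this sketch). */
import_state import_next_state(import_state s, import_event ev) {
    assert(s <= SLOT_MIGRATION_JOB_FAILED);

    /* Terminal states never change. */
    if (s == SLOT_MIGRATION_JOB_SUCCESS || s == SLOT_MIGRATION_JOB_FAILED) return s;

    /* After an error, the job waits for cleanup and then fails. */
    if (s == SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP) {
        return ev == EV_CLEANUP_DONE ? SLOT_MIGRATION_JOB_FAILED : s;
    }

    /* Any in-progress state moves to cleanup on an error condition. */
    if (ev == EV_ERROR) return SLOT_IMPORT_FINISHED_WAITING_TO_CLEANUP;

    switch (s) {
    case SLOT_IMPORT_WAIT_ACK:
        return ev == EV_ACK ? SLOT_IMPORT_RECEIVE_SNAPSHOT : s;
    case SLOT_IMPORT_RECEIVE_SNAPSHOT:
        return ev == EV_SNAPSHOT_EOF ? SLOT_IMPORT_WAITING_FOR_PAUSED : s;
    case SLOT_IMPORT_WAITING_FOR_PAUSED:
        return ev == EV_PAUSED ? SLOT_IMPORT_FAILOVER_REQUESTED : s;
    case SLOT_IMPORT_FAILOVER_REQUESTED:
        return ev == EV_FAILOVER_GRANTED ? SLOT_IMPORT_FAILOVER_GRANTED : s;
    case SLOT_IMPORT_FAILOVER_GRANTED:
        return ev == EV_TAKEOVER_PERFORMED ? SLOT_MIGRATION_JOB_SUCCESS : s;
    default:
        return s;
    }
}
```

Keeping the error edge in one place mirrors the single vertical error line in the diagram: every in-progress state shares the same failure path through cleanup.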
hpatro pushed a commit to hpatro/valkey that referenced this pull request on Oct 3, 2025
We now pass rdbSnapshotOptions options into this function, and options.conns is now malloc'ed on the caller side, so we need to zfree it when returning early due to an error. Previously, conns was malloc'ed after the error handling, so this was not needed. Introduced in valkey-io#1949. --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request on Oct 3, 2025
This may result in a meaningless slot migration job, so we should return an error to the user up front to avoid operator error. Also, `by myself` is not correct English grammar and `myself` is internal code terminology, so it is changed to `by this node`. Introduced in valkey-io#1949. --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
murphyjacob4 pushed a commit that referenced this pull request on Jan 10, 2026
…n and data corruption (#3004) When loading AOF in cluster mode, keys inside a MULTI/EXEC block could be inserted into wrong hash slots, causing key duplication and data corruption. The root cause was the slot caching optimization in getKeySlot(). This optimization reuses a cached slot value to avoid recalculating the hash for every key operation. However, when replaying AOF, a transaction may contain commands affecting keys in different slots. The cached slot from a previous command (e.g., SET k1) would incorrectly be used for subsequent commands in the transaction (e.g., SET k0), causing k0 to be stored in k1's slot. The existing code already skipped this optimization for replicated clients (commands from primary) using isReplicatedClient(). This change extends that to also skip for AOF clients by using mustObeyClient() instead, which covers both replicated clients and the AOF client. Fixes #2995, introduced in #1949. Signed-off-by: aditya.teltia <teltia.aditya22@gmail.com>
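To make the failure mode in this commit message concrete, here is a small standalone sketch of cluster key-to-slot mapping plus a cached-slot shortcut. The CRC16 (XMODEM variant) modulo 16384 mapping with `{hash tag}` handling follows the published cluster specification; `sketch_client`, `cached_slot`, and `sketch_get_key_slot` are hypothetical names that only imitate the idea behind getKeySlot() and mustObeyClient(), not their actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, bitwise. */
uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Map a key to one of 16384 slots. If the key contains a non-empty
 * {hash tag}, only the text inside the first tag is hashed. */
unsigned int key_hash_slot(const char *key, size_t keylen) {
    assert(key != NULL);
    size_t s, e;
    for (s = 0; s < keylen; s++) {
        if (key[s] == '{') break;
    }
    if (s == keylen) return crc16_xmodem(key, keylen) & 0x3FFF;
    for (e = s + 1; e < keylen; e++) {
        if (key[e] == '}') break;
    }
    if (e == keylen || e == s + 1) return crc16_xmodem(key, keylen) & 0x3FFF;
    return crc16_xmodem(key + s + 1, e - s - 1) & 0x3FFF;
}

/* Hypothetical client carrying a slot cached from an earlier command. */
typedef struct {
    int cached_slot; /* -1 when nothing is cached */
} sketch_client;

/* Use the cached slot only for ordinary clients. For clients whose
 * commands must be obeyed (replication stream, AOF replay), always
 * recompute, because one transaction may touch keys in several slots. */
unsigned int sketch_get_key_slot(const sketch_client *c, int must_obey, const char *key) {
    if (!must_obey && c->cached_slot >= 0) return (unsigned int)c->cached_slot;
    return key_hash_slot(key, strlen(key));
}
```

With a slot cached from one key, calling `sketch_get_key_slot` for a different key returns the stale slot unless `must_obey` is set, which is the misplacement the fix avoids during AOF loading.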
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request on Apr 4, 2026
In valkey.conf, slot-migration-max-failover-repl-bytes is documented as accepting -1 to disable the limit.
```
Setting this to -1 will disable this limit
```
But slot-migration-max-failover-repl-bytes is defined as MEMORY_CONFIG and memtoull() rejects negative inputs, making it impossible to set the value to -1 via config file or CONFIG SET.
```
>>> 'slot-migration-max-failover-repl-bytes "-1"' argument must be a memory value
```
Introduce a SIGNED_MEMORY_CONFIG flag for memory configs that also accept a plain negative number. When memtoull() fails and this flag is set, fall back to string2ll() for parsing. Use ll2string() for CONFIG GET and rewriteConfigNumericalOption() for CONFIG REWRITE when the value is negative.
Add a serverAssert in initConfigValues() to enforce that PERCENT_CONFIG and SIGNED_MEMORY_CONFIG are never combined on the same config, since both use negative values with different semantics.
This issue has existed since the config was introduced in valkey-io#1949.
Signed-off-by: Binbin <binloveplay1314@qq.com>
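The fallback described in this commit message can be sketched in isolation. This is an illustrative approximation, not the server's code: `sketch_parse_memory` is a simplified, hypothetical stand-in for memtoull() that accepts only a non-negative decimal number with an optional lowercase `kb`, `mb`, or `gb` suffix, and `sketch_parse_signed_memory` is a hypothetical wrapper showing the "try memory parsing first, then accept a plain negative integer" order.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Simplified memory parser: digits plus optional kb/mb/gb (1024-based).
 * Returns 0 on success and -1 on any input it does not understand,
 * including negative numbers. */
int sketch_parse_memory(const char *s, unsigned long long *out) {
    assert(s != NULL && out != NULL);
    if (!isdigit((unsigned char)s[0])) return -1; /* rejects "-1" and "" */
    char *end;
    errno = 0;
    unsigned long long n = strtoull(s, &end, 10);
    if (errno == ERANGE) return -1;
    unsigned long long mul;
    if (*end == '\0') mul = 1ULL;
    else if (strcmp(end, "kb") == 0) mul = 1024ULL;
    else if (strcmp(end, "mb") == 0) mul = 1024ULL * 1024;
    else if (strcmp(end, "gb") == 0) mul = 1024ULL * 1024 * 1024;
    else return -1;
    if (n > ULLONG_MAX / mul) return -1; /* multiplication would overflow */
    *out = n * mul;
    return 0;
}

/* Signed variant: try the memory parser first; if it fails, accept a
 * plain negative integer such as "-1" (no unit suffix allowed). */
int sketch_parse_signed_memory(const char *s, long long *out) {
    assert(s != NULL && out != NULL);
    unsigned long long mem;
    if (sketch_parse_memory(s, &mem) == 0) {
        if (mem > (unsigned long long)LLONG_MAX) return -1;
        *out = (long long)mem;
        return 0;
    }
    if (s[0] != '-' || !isdigit((unsigned char)s[1])) return -1;
    char *end;
    errno = 0;
    long long n = strtoll(s, &end, 10);
    if (errno == ERANGE || *end != '\0') return -1;
    *out = n;
    return 0;
}
```

Trying the unsigned memory parser first keeps existing values such as `1gb` behaving as before, and only inputs it rejects reach the signed fallback.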
This was referenced on May 3, 2026: [Shadow] Fix slot-migration-max-failover-repl-bytes unable to accept -1 (sarthakaggarwal97/valkey#150, closed)
Introduces a new family of commands for migrating slots via replication. The procedure is driven by the source node, which pushes an AOF-formatted snapshot of the slots to the target, followed by a replication stream of changes to those slots (a la manual failover).
This solution is an adaptation of the solution provided by @enjoy-binbin, combined with the solution I previously posted at #1591, modified to meet the designs we had outlined in #23.
New commands
- `CLUSTER MIGRATESLOTS SLOTSRANGE start end [start end]... NODE node-id`: Begin sending the slot via replication to the target. Multiple targets can be specified by repeating `SLOTSRANGE ... NODE ...`
- `CLUSTER CANCELMIGRATION ALL`: Cancel all slot migrations
- `CLUSTER GETSLOTMIGRATIONS`: See a recent log of migrations

This PR only implements "one shot" semantics with an asynchronous model. Later, "two phase" (e.g. slot level replicate/failover commands) can be added with the same core.
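To make the argument grammar of `CLUSTER MIGRATESLOTS` concrete, here is a rough sketch of parsing one `SLOTSRANGE start end [start end]... NODE node-id` group. It is not the parser from this PR. The names `migrate_target`, `parse_target` and `MAX_RANGES` are hypothetical, keywords are matched case-sensitively for brevity, and the validation rules (slots in 0..16383, start not greater than end, 40-character node ID) are assumptions based on how cluster slots and node IDs are usually constrained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SLOTS 16384  /* cluster hash slots: 0..16383 */
#define NODE_ID_LEN 40   /* assumed node ID length */
#define MAX_RANGES 16    /* hypothetical cap, for this sketch only */

typedef struct { int start, end; } slot_range;

typedef struct {
    slot_range ranges[MAX_RANGES];
    int num_ranges;
    char node_id[NODE_ID_LEN + 1];
} migrate_target;

/* Parses a decimal slot number in [0, NUM_SLOTS). Returns 1 on success. */
static int parse_slot(const char *s, int *out) {
    if (*s < '0' || *s > '9') return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v >= NUM_SLOTS) return 0;
    *out = (int)v;
    return 1;
}

/* Parses one target group starting at argv[*pos]. On success, fills *t,
 * advances *pos past the group, and returns 1. Returns 0 on a syntax error. */
int parse_target(int argc, const char **argv, int *pos, migrate_target *t) {
    assert(argv != NULL && pos != NULL && t != NULL);
    int i = *pos;
    if (i >= argc || strcmp(argv[i], "SLOTSRANGE") != 0) return 0;
    i++;
    t->num_ranges = 0;
    /* Consume "start end" pairs until the NODE keyword. */
    while (i < argc && strcmp(argv[i], "NODE") != 0) {
        if (i + 1 >= argc || t->num_ranges == MAX_RANGES) return 0;
        slot_range *r = &t->ranges[t->num_ranges];
        if (!parse_slot(argv[i], &r->start)) return 0;
        if (!parse_slot(argv[i + 1], &r->end)) return 0;
        if (r->start > r->end) return 0;
        t->num_ranges++;
        i += 2;
    }
    /* Need at least one range, then NODE followed by a node ID. */
    if (t->num_ranges == 0 || i + 1 >= argc) return 0;
    if (strlen(argv[i + 1]) != NODE_ID_LEN) return 0;
    memcpy(t->node_id, argv[i + 1], NODE_ID_LEN + 1);
    *pos = i + 2;
    return 1;
}
```

Calling `parse_target` in a loop until `*pos == argc` handles the "multiple targets" form, since each call consumes exactly one `SLOTSRANGE ... NODE ...` group.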
Slot migration jobs
Introduces the concept of a slot migration job. While active, a job tracks a connection created by the source to the target over which the contents of the slots are sent. This connection is used for control messages as well as replicated slot data. Each job is given a 40 character random name to help uniquely identify it.
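As a small illustration of the job naming described above, the sketch below generates a 40-character random name and looks a job up by it. The `migration_job` struct, both function names, and the choice of a lowercase hex alphabet are assumptions made for this example; the description only states that the name is random and 40 characters long. `rand()` is used for brevity and is not a strong source of randomness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define JOB_NAME_LEN 40

/* Hypothetical, heavily reduced view of a slot migration job. */
typedef struct {
    char name[JOB_NAME_LEN + 1];
    int finished; /* recently finished jobs are still listed */
} migration_job;

/* Fills name with JOB_NAME_LEN random lowercase hex characters and a
 * terminating NUL. The hex alphabet is an assumption of this sketch. */
void generate_job_name(char name[JOB_NAME_LEN + 1]) {
    static const char hex[] = "0123456789abcdef";
    assert(name != NULL);
    for (int i = 0; i < JOB_NAME_LEN; i++) name[i] = hex[rand() % 16];
    name[JOB_NAME_LEN] = '\0';
}

/* Returns the first job whose name matches, or NULL if there is none. */
migration_job *find_job(migration_job *jobs, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(jobs[i].name, name) == 0) return &jobs[i];
    }
    return NULL;
}
```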
All jobs, including those that finished recently, can be observed using the `CLUSTER GETSLOTMIGRATIONS` command.

Replication

CLUSTER SYNCSLOTS

To coordinate the state machine transitions across the two nodes, a new command is added, `CLUSTER SYNCSLOTS`, that performs this control flow.

Each end of the slot migration connection is expected to install a read handler in order to handle `CLUSTER SYNCSLOTS` commands:

- `ESTABLISH`: Begins a slot migration. Provides slot migration information to the target and authorizes the connection to write to unowned slots.
- `SNAPSHOT-EOF`: Appended to the end of the snapshot to signal that the snapshot is done being written to the target.
- `PAUSE`: Informs the source node to pause whenever it gets the opportunity.
- `PAUSED`: Added to the end of the client output buffer when the pause is performed. The pause is only performed after the buffer shrinks below a configurable size.
- `REQUEST-FAILOVER`: Requests the source to either grant or deny a failover for the slot migration. The failover is only granted if the target is still paused. Once a failover is granted, the pause is refreshed for a short duration.
- `FAILOVER-GRANTED`: Sent to the target to inform it that `REQUEST-FAILOVER` is granted.
- `ACK`: Heartbeat command used to ensure liveness.

Interactions with other commands
Error handling

repl-timeout, the connection will be dropped, resulting in migration failure

State machine
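The pause and failover handshake implied by the `CLUSTER SYNCSLOTS` subcommands can be approximated as a small source-side transition function. This is a reconstruction from the subcommand descriptions only, not the state machine from the PR: every state, event and reply name below is hypothetical, the way a denial is reported is an assumption, and the liveness timeout is modeled as a single event standing in for the `repl-timeout` handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source-side states for one migration job. */
typedef enum {
    SRC_STREAMING,        /* snapshot sent, streaming changes */
    SRC_PAUSE_PENDING,    /* PAUSE received, output buffer still too large */
    SRC_PAUSED,           /* writes paused, PAUSED sent to the target */
    SRC_FAILOVER_GRANTED, /* FAILOVER-GRANTED sent */
    SRC_FAILED            /* connection dropped, migration failed */
} src_state;

/* Hypothetical events seen by the source. */
typedef enum {
    EV_PAUSE,              /* target sent PAUSE */
    EV_BUFFER_BELOW_LIMIT, /* output buffer shrank below the configured size */
    EV_REQUEST_FAILOVER,   /* target sent REQUEST-FAILOVER */
    EV_PAUSE_EXPIRED,      /* pause ran out before a failover was requested */
    EV_TIMEOUT             /* liveness timeout, e.g. no ACK in time */
} src_event;

/* What the source sends back, if anything. */
typedef enum {
    REPLY_NONE,
    REPLY_PAUSED,
    REPLY_FAILOVER_GRANTED,
    REPLY_FAILOVER_DENIED /* how a denial is signaled is an assumption */
} src_reply;

/* Applies one event and returns the next state; *reply is what to send. */
src_state src_step(src_state s, src_event ev, src_reply *reply) {
    assert(reply != NULL);
    *reply = REPLY_NONE;
    /* Terminal states ignore further events. */
    if (s == SRC_FAILED || s == SRC_FAILOVER_GRANTED) return s;
    if (ev == EV_TIMEOUT) return SRC_FAILED;
    switch (s) {
    case SRC_STREAMING:
        if (ev == EV_PAUSE) return SRC_PAUSE_PENDING;
        break;
    case SRC_PAUSE_PENDING:
        if (ev == EV_BUFFER_BELOW_LIMIT) {
            *reply = REPLY_PAUSED;
            return SRC_PAUSED;
        }
        break;
    case SRC_PAUSED:
        if (ev == EV_REQUEST_FAILOVER) {
            *reply = REPLY_FAILOVER_GRANTED;
            return SRC_FAILOVER_GRANTED;
        }
        if (ev == EV_PAUSE_EXPIRED) return SRC_STREAMING;
        break;
    default:
        break;
    }
    /* A failover request outside the paused state is denied. */
    if (ev == EV_REQUEST_FAILOVER) *reply = REPLY_FAILOVER_DENIED;
    return s;
}
```

Keeping the transition logic in one pure function like this makes the "grant only while paused" rule easy to check in isolation, independent of connection handling.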
Closes #23.
Co-authored-by: Binbin <binloveplay1314@qq.com>