Cluster reroute
The /_cluster/reroute API allows you to manually control the allocation of individual shards within the cluster. This includes moving, allocating, or canceling shard allocations. It’s typically used for advanced scenarios, such as manual recovery or custom load balancing.
Shard movement is subject to cluster allocation deciders. Always test reroute commands using dry_run=true before applying them in production environments. Use the explain=true parameter to obtain detailed insight into allocation decisions, which can assist in understanding why a particular reroute request may or may not be allowed. If shard allocation fails because of prior issues or cluster instability, you can reattempt allocation using the retry_failed=true parameter.
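As a minimal sketch of the advice above, dry_run and explain can be combined in one request so that a command is simulated and explained without changing the cluster. The index name test-cluster-index and the node names node1 and node2 are placeholders, not values from a real cluster:

```json
# Placeholder index and node names; adjust for your cluster
POST /_cluster/reroute?dry_run=true&explain=true
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}
```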
For more information regarding shard distribution and cluster health, see Cluster health and Cluster allocation explain.
Endpoints
POST /_cluster/reroute
Query parameters
Parameter | Data type | Description |
---|---|---|
dry_run | Boolean | If true, validates and simulates the reroute request without applying it. Default is false. |
explain | Boolean | If true, returns an explanation of why the command was accepted or rejected. Default is false. |
retry_failed | Boolean | If true, retries allocation of shards that previously failed. Default is false. |
metric | String | Limits the returned metadata. See Metric options for a list of available options. Default is _all. |
cluster_manager_timeout | Time | The timeout for connection to the cluster manager node. Default is 30s. |
timeout | Time | The overall request timeout. Default is 30s. |
Metric options
The metric parameter filters the cluster state values returned by the Reroute API. This is useful for reducing response size or inspecting specific parts of the cluster state. This parameter supports the following values:
- _all (default): Returns all available cluster state sections.
- blocks: Includes information about read- and write-level blocks in the cluster.
- cluster_manager_node: Shows which node is currently acting as the cluster manager.
- metadata: Returns index settings, mappings, and aliases. If specific indexes are targeted, only their metadata is returned.
- nodes: Includes all nodes in the cluster and their metadata.
- routing_table: Returns the routing information for all shards and replicas.
- version: Displays the cluster state version number.
You can combine values in a comma-separated list, such as metric=metadata,nodes,routing_table.
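As an illustration, a request along the following lines should limit the returned cluster state to the routing table and the state version. This is a sketch that adds dry_run=true so that nothing is changed, and it sends no commands:

```json
# Simulated reroute with no commands; only the selected state sections are requested
POST /_cluster/reroute?dry_run=true&metric=routing_table,version
```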
Request body fields
The commands array in the request body defines actions to apply to shard allocation. It supports the following actions.
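Because commands is an array, a single request can carry more than one action. The following sketch moves one shard and allocates an unassigned replica of another shard in the same call. The index name my-index and the node names node1, node2, and node3 are hypothetical placeholders:

```json
# Hypothetical index and node names; each array element is one action
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "my-index",
        "shard": 1,
        "node": "node3"
      }
    }
  ]
}
```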
Move
The move command moves a started shard (primary or replica) from one node to another, for example, to balance load or drain a node before maintenance. The shard must be in the STARTED state.
The move command requires the following parameters:
- index: The name of the index.
- shard: The shard number.
- from_node: The name of the node to move the shard from.
- to_node: The name of the node to move the shard to.
Cancel
The cancel command cancels allocation of a shard (including recovery). This command forces resynchronization by canceling existing allocations and letting the system reinitialize them. Replica shard allocations can be canceled by default, but canceling a primary shard requires allow_primary=true in order to prevent accidental data disruption.
The cancel command requires the following parameters:
- index: The name of the index.
- shard: The shard number.
- node: The name or node ID of the node to perform the action on.
- allow_primary (Optional): If true, allows cancellation of primary shard allocations. Default is false.
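A cancel request built from the parameters above might look like the following sketch. It targets a replica copy on a placeholder node named node2 and leaves allow_primary at its default of false:

```json
# Placeholder index and node names; cancels a replica allocation on node2
POST /_cluster/reroute
{
  "commands": [
    {
      "cancel": {
        "index": "test-cluster-index",
        "shard": 0,
        "node": "node2"
      }
    }
  ]
}
```

To cancel a primary shard allocation instead, the same command would also need "allow_primary": true.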
Allocate replica
The allocate_replica command assigns an unassigned replica to a specified node. This operation respects allocation deciders. Use this command to manually trigger allocation of replicas when automatic allocation fails.
The allocate_replica command requires the following parameters:
- index: The name of the index.
- shard: The shard number.
- node: The name or node ID of the node to perform the action on.
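The following sketch shows how these parameters fit together in a request. The index and node names are placeholders, and the request assumes that shard 0 currently has an unassigned replica:

```json
# Placeholder index and node names; assumes shard 0 has an unassigned replica
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "test-cluster-index",
        "shard": 0,
        "node": "node2"
      }
    }
  ]
}
```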
Allocate stale primary
The allocate_stale_primary command force-allocates a primary shard to a node that holds a stale copy.
This command should be used with extreme caution. It bypasses safety checks and may lead to data loss, especially if a more recent shard copy exists on another node that is temporarily offline. If that node rejoins the cluster later, its data will be deleted or replaced by the stale copy that was forcefully promoted.
Use this command only when no up-to-date copies are available and you have no way to restore the original data.
The allocate_stale_primary command requires the following parameters:
- index: The name of the index.
- shard: The shard number.
- node: The name or node ID of the node to perform the action on.
- accept_data_loss: Must be set to true.
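Given the risk described above, one cautious approach is to simulate the command first. The following sketch uses dry_run=true so that no allocation is applied. The index and node names are placeholders, and node1 is assumed to hold the stale copy:

```json
# Simulation only (dry_run=true); node1 is assumed to hold a stale copy of shard 0
POST /_cluster/reroute?dry_run=true
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "test-cluster-index",
        "shard": 0,
        "node": "node1",
        "accept_data_loss": true
      }
    }
  ]
}
```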
Allocate empty primary
The allocate_empty_primary command force-allocates a new empty primary shard to a node. This operation initializes a new primary shard without any existing data.
Any previous data for the shard will be permanently lost. If a node with valid data for that shard later rejoins the cluster, its copy will be erased. This command is intended for disaster recovery when no valid shard copies exist and recovery from backup or a snapshot is not possible.
The allocate_empty_primary command requires the following parameters:
- index: The name of the index.
- shard: The shard number.
- node: The name or node ID of the node to perform the action on.
- accept_data_loss: Must be set to true.
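As with the stale primary case, a dry run is a reasonable first step. The following sketch simulates allocating an empty primary for shard 0 on a placeholder node named node1. Removing dry_run=true would apply the command and discard any previous data for that shard:

```json
# Simulation only (dry_run=true); placeholder index and node names
POST /_cluster/reroute?dry_run=true
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "test-cluster-index",
        "shard": 0,
        "node": "node1",
        "accept_data_loss": true
      }
    }
  ]
}
```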
Example
The following are examples of using the Cluster Reroute API.
Moving a shard
Create a sample index:
PUT /test-cluster-index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
Run the following reroute command to move shard 0 of the index test-cluster-index from node node1 to node node2:
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}
Simulating a reroute
To simulate a reroute without executing it, set dry_run=true:
POST /_cluster/reroute?dry_run=true
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}
Retrying failed allocations
If some shards failed to allocate because of previous issues, you can reattempt allocation:
POST /_cluster/reroute?retry_failed=true
Explaining reroute decisions
To understand why a reroute command is accepted or rejected, add explain=true:
POST /_cluster/reroute?explain=true
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node3"
      }
    }
  ]
}
This returns a decisions array explaining the outcome:
"decisions": [
  {
    "decider": "max_retry",
    "decision": "YES",
    "explanation": "shard has no previous failures"
  },
  {
    "decider": "replica_after_primary_active",
    "decision": "YES",
    "explanation": "shard is primary and can be allocated"
  },
  ...
  {
    "decider": "remote_store_migration",
    "decision": "YES",
    "explanation": "[none migration_direction]: primary shard copy can be relocated to a non-remote node for strict compatibility mode"
  }
]
Response body fields
The response includes cluster state metadata and, if explain=true was used, an explanations list containing the decisions array for each command.
Field | Data type | Description |
---|---|---|
acknowledged | Boolean | States whether the reroute request was acknowledged. |
state.cluster_uuid | String | The unique identifier of the cluster. |
state.version | Integer | The version of the cluster state. |
state.state_uuid | String | The UUID for this specific state version. |
state.master_node | String | The same value as cluster_manager_node. This field is maintained for backward compatibility. |
state.cluster_manager_node | String | The ID of the elected cluster manager node. |
state.blocks | Object | Any global or index-level cluster blocks. |
state.nodes | Object | Metadata for each node in the cluster, including its name and address. |
state.routing_table | Object | The shard routing information for each index. |
state.routing_nodes | Object | The shard allocation organized by node. |
commands | List | A list of processed reroute commands. |
explanations | List | If explain=true, includes detailed explanations of the outcomes. |