Rolling upgrade
Rolling upgrades, sometimes referred to as “node replacement upgrades,” can be performed on running clusters with virtually no downtime. Nodes are individually stopped and upgraded in place. Alternatively, nodes can be stopped and replaced, one at a time, by hosts running the new version. During this process, you can continue to index and query data in your cluster.
This document serves as a high-level, platform-agnostic overview of the rolling upgrade procedure. For specific examples of commands, scripts, and configuration files, refer to the Rolling upgrade lab.
Preparing to upgrade
Before making any changes to your OpenSearch cluster, it is highly recommended that you back up your configuration files and create a snapshot of the cluster state and indexes.
Important: OpenSearch nodes cannot be downgraded. If you need to revert the upgrade, then you will need to perform a new installation of OpenSearch and restore the cluster from a snapshot. Take a snapshot and store it in a remote repository before beginning the upgrade procedure. Rolling upgrades are only supported between adjacent major versions, for example, from OpenSearch 1.x to 2.x but not from 1.x to 3.x.
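As a minimal sketch of taking that snapshot, assuming the cluster is reachable over plain HTTP on localhost:9200 and that a shared filesystem path (/mnt/snapshots here) is already registered in path.repo on every node. The repository name upgrade-backup and snapshot name pre-upgrade-snapshot are illustrative; for a truly remote repository you would typically use a plugin-backed repository type such as S3 rather than fs, but the snapshot call itself is the same:

# Register a snapshot repository (hypothetical name and location).
curl -X PUT "http://localhost:9200/_snapshot/upgrade-backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots"
  }
}'

# Snapshot all indexes and the cluster state, waiting for completion.
curl -X PUT "http://localhost:9200/_snapshot/upgrade-backup/pre-upgrade-snapshot?wait_for_completion=true&pretty"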
Performing the upgrade
- Verify the health of your OpenSearch cluster before you begin. You should resolve any index or shard allocation issues prior to upgrading to ensure that your data is preserved. A status of green indicates that all primary and replica shards are allocated. See Cluster health for more information. The following command queries the _cluster/health API endpoint:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{ "cluster_name":"opensearch-dev-cluster", "status":"green", "timed_out":false, "number_of_nodes":4, "number_of_data_nodes":4, "active_primary_shards":1, "active_shards":4, "relocating_shards":0, "initializing_shards":0, "unassigned_shards":0, "delayed_unassigned_shards":0, "number_of_pending_tasks":0, "number_of_in_flight_fetch":0, "task_max_waiting_in_queue_millis":0, "active_shards_percent_as_number":100.0 }
- Disable shard replication to prevent shard replicas from being created while nodes are being taken offline. This stops the movement of Lucene index segments on nodes in your cluster. You can disable shard replication by querying the _cluster/settings API endpoint:
PUT "/_cluster/settings?pretty"
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
The response should look similar to the following example:
{ "acknowledged" : true, "persistent" : { "cluster" : { "routing" : { "allocation" : { "enable" : "primaries" } } } }, "transient" : { } }
- Perform a flush operation on the cluster to commit transaction log entries to the Lucene index:
POST "/_flush?pretty"
The response should look similar to the following example:
{ "_shards" : { "total" : 4, "successful" : 4, "failed" : 0 } }
- Review your cluster and identify the first node to upgrade. The nodes should be upgraded in the following order:
- Data nodes
- Ingest/machine learning (ML)/coordinating nodes
- Cluster manager nodes
Eligible cluster manager nodes should be upgraded last because OpenSearch nodes can join a cluster with cluster manager nodes running an older version, but they cannot join a cluster with all cluster manager nodes running a newer version.
- Query the _cat/nodes endpoint to identify which node was promoted to cluster manager. The following command includes additional query parameters that request only the name, version, node.role, and master headers. Note that OpenSearch 1.x versions use the term “master,” which has been deprecated and replaced by “cluster_manager” in OpenSearch 2.x and later.
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-01  7.10.2   dimr       -
os-node-04  7.10.2   dimr       -
os-node-03  7.10.2   dimr       -
os-node-02  7.10.2   dimr       *
- Stop the node you are upgrading. If the node is running in Docker, do not delete the volume associated with the container when you delete the container. The new OpenSearch container will use the existing volume. Deleting the volume will result in data loss.
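For example, in a Docker deployment the node can be stopped and its container removed while the data volume is kept. The container name os-node-01 is taken from the examples in this document and may differ in your setup:

# Stop and remove the container, but keep its volumes (do not pass the -v flag to docker rm).
docker stop os-node-01
docker rm os-node-01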
- Confirm that the associated node has been dismissed from the cluster by querying the _cat/nodes API endpoint:
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-02  7.10.2   dimr       *
os-node-04  7.10.2   dimr       -
os-node-03  7.10.2   dimr       -
os-node-01 is no longer listed because the container has been stopped and deleted.
- Upgrade the node:
  - If running in Docker, deploy a new container running the desired version of OpenSearch, mapped to the same volume as the container you deleted (see the Docker sketch after this list).
  - If upgrading using Debian or RPM packages, install OpenSearch using rpm, yum, or dpkg and start the service. No further configuration is needed because locations and files are preserved.
  - If upgrading using the tarball distribution, the following actions are required (a shell sketch of these steps follows this list):
    - Back up jvm.options, opensearch.yml, certificates, and the data folder.
    - Extract the new tarball.
    - Copy the previous data directory to the new data directory; otherwise, data will be lost.
    - Copy the previous opensearch.yml file to the new config/opensearch.yml file.
    - Copy the previous jvm.options file to the new config/jvm.options file.
    - Copy the TLS certificates listed in the opensearch.yml file to the ./config/ directory.
    - Start OpenSearch.
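The following shell sketch illustrates the tarball steps above. Every path, file name, and version placeholder is an assumption and will differ in your environment:

# Assumed locations: old install in /opt/opensearch-old, new tarball opensearch-x.y.z-linux-x64.tar.gz.
mkdir -p /tmp/backup
cp /opt/opensearch-old/config/opensearch.yml /tmp/backup/
cp /opt/opensearch-old/config/jvm.options /tmp/backup/
cp -r /opt/opensearch-old/data /tmp/backup/data

# Extract the new version, then restore data, configuration, and certificates.
tar -xzf opensearch-x.y.z-linux-x64.tar.gz -C /opt
cp -r /tmp/backup/data /opt/opensearch-x.y.z/data
cp /tmp/backup/opensearch.yml /opt/opensearch-x.y.z/config/opensearch.yml
cp /tmp/backup/jvm.options /opt/opensearch-x.y.z/config/jvm.options
# Also copy the TLS certificates referenced in opensearch.yml into /opt/opensearch-x.y.z/config/, then start OpenSearch.
/opt/opensearch-x.y.z/bin/opensearch

For the Docker path, a hypothetical redeployment of os-node-01 on the target image, reusing its existing data volume, could look like the following. The network name, volume name, image tag, and any required environment, port, or security settings must match your original deployment; see the Rolling upgrade lab for complete examples:

docker run -d --name os-node-01 \
  --network opensearch-dev-net \
  -v data-01:/usr/share/opensearch/data \
  opensearchproject/opensearch:1.3.7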
- Query the _cat/nodes endpoint after OpenSearch is running on the new node to confirm that it has joined the cluster:
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-02  7.10.2   dimr       *
os-node-04  7.10.2   dimr       -
os-node-01  7.10.2   dimr       -
os-node-03  7.10.2   dimr       -
In the example output, the new OpenSearch node reports a running version of 7.10.2 to the cluster. This is the result of compatibility.override_main_response_version, which is used when connecting to a cluster with legacy clients that check for a version. You can manually confirm the version of the node by calling the /_nodes API endpoint, as in the following command. Replace <nodeName> with the name of your node. See Nodes API to learn more.
GET "/_nodes/<nodeName>?pretty=true" | jq -r '.nodes | .[] | "\(.name) v\(.version)"'
The response should look similar to the following example:
os-node-01 v1.3.7
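For reference, the version reported in the example above is controlled by the following setting, which may be present in opensearch.yml (or applied as a cluster setting) on clusters that still need to accommodate legacy clients:

compatibility.override_main_response_version: true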
- Reenable shard replication:
PUT "/_cluster/settings?pretty" { "persistent": { "cluster.routing.allocation.enable": "all" } }
The response should look similar to the following example:
{ "acknowledged" : true, "persistent" : { "cluster" : { "routing" : { "allocation" : { "enable" : "all" } } } }, "transient" : { } }
- Confirm that the cluster is healthy:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{ "cluster_name" : "opensearch-dev-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "discovered_master" : true, "active_primary_shards" : 1, "active_shards" : 4, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
- Repeat steps 2 through 11 for each node in your cluster. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the _cat/nodes endpoint to confirm that all nodes have joined the cluster. The cluster is now bootstrapped to the new version of OpenSearch. You can verify the cluster version by querying the _cat/nodes API endpoint:
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-04  1.3.7    dimr       -
os-node-02  1.3.7    dimr       *
os-node-01  1.3.7    dimr       -
os-node-03  1.3.7    dimr       -
- The upgrade is now complete, and you can begin enjoying the latest features and fixes!
Rolling restart
A rolling restart follows the same step-by-step procedure as a rolling upgrade, except that the nodes themselves are not upgraded. During a rolling restart, nodes are restarted one at a time, typically to apply configuration changes, refresh certificates, or perform system-level maintenance, without disrupting cluster availability.
To perform a rolling restart, follow the steps outlined in Performing the upgrade, excluding the steps that involve upgrading the OpenSearch binary or container image:
- Check cluster health. Ensure the cluster status is green and all shards are assigned. (See step 1 in the rolling upgrade procedure.)
- Disable shard allocation. Prevent OpenSearch from trying to reallocate shards while nodes are offline. (See step 2 in the rolling upgrade procedure.)
- Flush transaction logs. Commit recent operations to Lucene to reduce recovery time. (See step 3 in the rolling upgrade procedure.)
- Review and identify the next node to restart. Ensure you restart the current cluster manager node last. (See step 4 in the rolling upgrade procedure.)
- Check which node is the current cluster manager. Use the _cat/nodes API to determine which node is the current active cluster manager. (See step 5 in the rolling upgrade procedure.)
- Stop the node. Shut down the node gracefully. Do not delete the associated data volume. (See step 6 in the rolling upgrade procedure.)
- Confirm the node has left the cluster. Use _cat/nodes to verify that it's no longer listed. (See step 7 in the rolling upgrade procedure.)
- Restart the node. Start the same node (same binary, version, and configuration) and let it rejoin the cluster. (See step 8 in the rolling upgrade procedure, without upgrading the binary; a systemd-based sketch of stopping and starting the service follows this list.)
- Verify that the restarted node has rejoined. Check _cat/nodes to confirm that the node is present and healthy. (See step 9 in the rolling upgrade procedure.)
- Reenable shard allocation. Restore full shard movement capability. (See step 10 in the rolling upgrade procedure.)
- Confirm cluster health is green. Validate stability before restarting the next node. (See step 11 in the rolling upgrade procedure.)
- Repeat the process for all other nodes. Restart each node one at a time. If a node is eligible for the cluster manager role, restart it last. (See step 12 in the rolling upgrade procedure, again without the upgrade step.)
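As a sketch of the stop and restart steps on a package-based (RPM or Debian) installation managed by systemd, assuming the default service name:

# Stop the node gracefully, perform any maintenance, then start it again.
sudo systemctl stop opensearch
sudo systemctl start opensearch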
By preserving quorum and restarting nodes sequentially, rolling restarts ensure zero downtime and full data continuity.
Related articles
- Rolling upgrade lab – A hands-on lab with step-by-step instructions for practicing rolling upgrades in a test environment.
- OpenSearch configuration
- Performance analyzer
- Install and configure OpenSearch Dashboards
- About Security in OpenSearch