Rolling upgrade
Rolling upgrades, sometimes referred to as “node replacement upgrades,” can be performed on running clusters with virtually no downtime. Nodes are individually stopped and upgraded in place. Alternatively, nodes can be stopped and replaced, one at a time, by hosts running the new version. During this process, you can continue to index and query data in your cluster.
This document serves as a high-level, platform-agnostic overview of the rolling upgrade procedure. For specific examples of commands, scripts, and configuration files, refer to the Rolling upgrade lab.
Preparing to upgrade
Before making any changes to your OpenSearch cluster, it is highly recommended that you back up your configuration files and create a snapshot of the cluster state and indexes.
Important: OpenSearch nodes cannot be downgraded. If you need to revert the upgrade, then you will need to perform a new installation of OpenSearch and restore the cluster from a snapshot. Take a snapshot and store it in a remote repository before beginning the upgrade procedure. Rolling upgrades are only supported between adjacent major versions, for example, from OpenSearch 1.x to 2.x but not from 1.x to 3.x.
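As a minimal sketch of taking that snapshot, assuming the cluster is reachable over plain HTTP on localhost:9200 and that a shared filesystem path (/mnt/snapshots here) is already registered in path.repo on every node. The repository name upgrade-backup and snapshot name pre-upgrade-snapshot are illustrative; for a truly remote repository you would typically use a plugin-backed repository type such as S3 rather than fs, but the snapshot call itself is the same:

# Register a snapshot repository (hypothetical name and location).
curl -X PUT "http://localhost:9200/_snapshot/upgrade-backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/snapshots"
  }
}'

# Snapshot all indexes and the cluster state, waiting for completion.
curl -X PUT "http://localhost:9200/_snapshot/upgrade-backup/pre-upgrade-snapshot?wait_for_completion=true&pretty"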
Performing the upgrade
- Verify the health of your OpenSearch cluster before you begin. You should resolve any index or shard allocation issues prior to upgrading to ensure that your data is preserved. A status of green indicates that all primary and replica shards are allocated. See Cluster health for more information. The following command queries the _cluster/health API endpoint:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{ "cluster_name":"opensearch-dev-cluster", "status":"green", "timed_out":false, "number_of_nodes":4, "number_of_data_nodes":4, "active_primary_shards":1, "active_shards":4, "relocating_shards":0, "initializing_shards":0, "unassigned_shards":0, "delayed_unassigned_shards":0, "number_of_pending_tasks":0, "number_of_in_flight_fetch":0, "task_max_waiting_in_queue_millis":0, "active_shards_percent_as_number":100.0 }
- Disable shard replication to prevent shard replicas from being created while nodes are being taken offline. This stops the movement of Lucene index segments on nodes in your cluster. You can disable shard replication by querying the _cluster/settings API endpoint:
PUT "/_cluster/settings?pretty"
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
The response should look similar to the following example:
{ "acknowledged" : true, "persistent" : { "cluster" : { "routing" : { "allocation" : { "enable" : "primaries" } } } }, "transient" : { } }
- Perform a flush operation on the cluster to commit transaction log entries to the Lucene index:
POST "/_flush?pretty"
The response should look similar to the following example:
{ "_shards" : { "total" : 4, "successful" : 4, "failed" : 0 } }
- Review your cluster and identify the first node to upgrade. The nodes should be upgraded in the following order:
- Data nodes
- Ingest/machine learning (ML)/coordinating nodes
- Cluster manager nodes
Eligible cluster manager nodes should be upgraded last because OpenSearch nodes can join a cluster with cluster manager nodes running an older version, but they cannot join a cluster with all cluster manager nodes running a newer version.
- Query the _cat/nodes endpoint to identify which node was promoted to cluster manager. The following command includes additional query parameters that request only the name, version, node.role, and master headers. Note that OpenSearch 1.x versions use the term “master,” which has been deprecated and replaced by “cluster_manager” in OpenSearch 2.x and later.
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-01  7.10.2   dimr       -
os-node-04  7.10.2   dimr       -
os-node-03  7.10.2   dimr       -
os-node-02  7.10.2   dimr       *
- Stop the node you are upgrading. If the node is running in Docker, do not delete the volume associated with the container when you delete the container. The new OpenSearch container will use the existing volume. Deleting the volume will result in data loss.
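For example, in a Docker deployment the node can be stopped and its container removed while the data volume is kept. The container name os-node-01 is taken from the examples in this document and may differ in your setup:

# Stop and remove the container, but keep its volumes (do not pass the -v flag to docker rm).
docker stop os-node-01
docker rm os-node-01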
- Confirm that the associated node has been dismissed from the cluster by querying the _cat/nodes API endpoint:
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-02  7.10.2   dimr       *
os-node-04  7.10.2   dimr       -
os-node-03  7.10.2   dimr       -
os-node-01 is no longer listed because the container has been stopped and deleted.
- Upgrade the node:
  - If running in Docker, deploy a new container running the desired version of OpenSearch, mapped to the same volume as the container you deleted (see the Docker sketch after this list).
  - If upgrading using Debian or RPM packages, install OpenSearch using rpm, yum, or dpkg and start the service. No further configuration is needed because locations and files are preserved.
  - If upgrading using the tarball distribution, the following actions are required (a shell sketch of these steps follows this list):
    - Back up jvm.options, opensearch.yml, certificates, and the data folder.
    - Extract the new tarball.
    - Copy the previous data directory to the new data directory; otherwise, data will be lost.
    - Copy the previous opensearch.yml file to the new config/opensearch.yml file.
    - Copy the previous jvm.options file to the new config/jvm.options file.
    - Copy the TLS certificates listed in the opensearch.yml file to the ./config/ directory.
    - Start OpenSearch.
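The following shell sketch illustrates the tarball steps above. Every path, file name, and version placeholder is an assumption and will differ in your environment:

# Assumed locations: old install in /opt/opensearch-old, new tarball opensearch-x.y.z-linux-x64.tar.gz.
mkdir -p /tmp/backup
cp /opt/opensearch-old/config/opensearch.yml /tmp/backup/
cp /opt/opensearch-old/config/jvm.options /tmp/backup/
cp -r /opt/opensearch-old/data /tmp/backup/data

# Extract the new version, then restore data, configuration, and certificates.
tar -xzf opensearch-x.y.z-linux-x64.tar.gz -C /opt
cp -r /tmp/backup/data /opt/opensearch-x.y.z/data
cp /tmp/backup/opensearch.yml /opt/opensearch-x.y.z/config/opensearch.yml
cp /tmp/backup/jvm.options /opt/opensearch-x.y.z/config/jvm.options
# Also copy the TLS certificates referenced in opensearch.yml into /opt/opensearch-x.y.z/config/, then start OpenSearch.
/opt/opensearch-x.y.z/bin/opensearch

For the Docker path, a hypothetical redeployment of os-node-01 on the target image, reusing its existing data volume, could look like the following. The network name, volume name, image tag, and any required environment, port, or security settings must match your original deployment; see the Rolling upgrade lab for complete examples:

docker run -d --name os-node-01 \
  --network opensearch-dev-net \
  -v data-01:/usr/share/opensearch/data \
  opensearchproject/opensearch:1.3.7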
- Query the _cat/nodes endpoint after OpenSearch is running on the new node to confirm that it has joined the cluster:
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-02  7.10.2   dimr       *
os-node-04  7.10.2   dimr       -
os-node-01  7.10.2   dimr       -
os-node-03  7.10.2   dimr       -
In the example output, the new OpenSearch node reports a running version of 7.10.2 to the cluster. This is the result of compatibility.override_main_response_version, which is used when connecting to a cluster with legacy clients that check for a version. You can manually confirm the version of the node by calling the /_nodes API endpoint, as in the following command. Replace <nodeName> with the name of your node. See Nodes API to learn more.
GET "/_nodes/<nodeName>?pretty=true" | jq -r '.nodes | .[] | "\(.name) v\(.version)"'
The response should look similar to the following example:
os-node-01 v1.3.7
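For reference, the version reported in the example above is controlled by the following setting, which may be present in opensearch.yml (or applied as a cluster setting) on clusters that still need to accommodate legacy clients:

compatibility.override_main_response_version: true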
- Reenable shard replication:
PUT "/_cluster/settings?pretty" { "persistent": { "cluster.routing.allocation.enable": "all" } }
The response should look similar to the following example:
{ "acknowledged" : true, "persistent" : { "cluster" : { "routing" : { "allocation" : { "enable" : "all" } } } }, "transient" : { } }
- Confirm that the cluster is healthy:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{ "cluster_name" : "opensearch-dev-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "discovered_master" : true, "active_primary_shards" : 1, "active_shards" : 4, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
- Repeat steps 2 through 11 for each node in your cluster. Remember to upgrade an eligible cluster manager node last. After replacing the last node, query the _cat/nodes endpoint to confirm that all nodes have joined the cluster. The cluster is now bootstrapped to the new version of OpenSearch. You can verify the cluster version by querying the _cat/nodes API endpoint:
GET "/_cat/nodes?v&h=name,version,node.role,master" | column -t
The response should look similar to the following example:
name        version  node.role  master
os-node-04  1.3.7    dimr       -
os-node-02  1.3.7    dimr       *
os-node-01  1.3.7    dimr       -
os-node-03  1.3.7    dimr       -
- The upgrade is now complete, and you can begin enjoying the latest features and fixes!
Rolling restart
A rolling restart follows the same step-by-step procedure as a rolling upgrade, except that the nodes themselves are not upgraded. During a rolling restart, nodes are restarted one at a time, typically to apply configuration changes, refresh certificates, or perform system-level maintenance, without disrupting cluster availability.
To perform a rolling restart, follow the steps outlined in Performing the upgrade, excluding the steps that involve upgrading the OpenSearch binary or container image:
- Check cluster health. Ensure the cluster status is green and all shards are assigned. (See step 1 in the rolling upgrade procedure.)
- Disable shard allocation. Prevent OpenSearch from trying to reallocate shards while nodes are offline. (See step 2 in the rolling upgrade procedure.)
- Flush transaction logs. Commit recent operations to Lucene to reduce recovery time. (See step 3 in the rolling upgrade procedure.)
- Review and identify the next node to restart. Ensure you restart the current cluster manager node last. (See step 4 in the rolling upgrade procedure.)
- Check which node is the current cluster manager. Use the _cat/nodes API to determine which node is the current active cluster manager. (See step 5 in the rolling upgrade procedure.)
- Stop the node. Shut down the node gracefully. Do not delete the associated data volume. (See step 6 in the rolling upgrade procedure.)
- Confirm the node has left the cluster. Use _cat/nodes to verify that it's no longer listed. (See step 7 in the rolling upgrade procedure.)
- Restart the node. Start the same node (same binary, version, and configuration) and let it rejoin the cluster. (See step 8 in the rolling upgrade procedure, without upgrading the binary; a systemd-based sketch of stopping and starting the service follows this list.)
- Verify that the restarted node has rejoined. Check _cat/nodes to confirm that the node is present and healthy. (See step 9 in the rolling upgrade procedure.)
- Reenable shard allocation. Restore full shard movement capability. (See step 10 in the rolling upgrade procedure.)
- Confirm cluster health is green. Validate stability before restarting the next node. (See step 11 in the rolling upgrade procedure.)
- Repeat the process for all other nodes. Restart each node one at a time. If a node is eligible for the cluster manager role, restart it last. (See step 12 in the rolling upgrade procedure, again without the upgrade step.)
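As a sketch of the stop and restart steps on a package-based (RPM or Debian) installation managed by systemd, assuming the default service name:

# Stop the node gracefully, perform any maintenance, then start it again.
sudo systemctl stop opensearch
sudo systemctl start opensearch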
By preserving quorum and restarting nodes sequentially, rolling restarts ensure zero downtime and full data continuity.
Related articles
- Rolling upgrade lab – A hands-on lab with step-by-step instructions for practicing rolling upgrades in a test environment.
- OpenSearch configuration
- Performance analyzer
- Install and configure OpenSearch Dashboards
- About Security in OpenSearch