Update By Query API
Introduced 1.0
The Update by Query API updates all documents in an index that match a specified query. You can update documents without changing their source to pick up mapping changes, or use a script to modify field values based on custom logic.
Use this API in the following scenarios:
- Applying mapping changes to existing documents after adding new fields or changing field types.
- Updating field values across multiple documents based on calculated logic or conditions.
- Incrementing counters or performing bulk calculations on documents that match specific criteria.
- Conditionally deleting documents by setting ctx.op = "delete" in a script.
- Performing no-operation updates by setting ctx.op = "noop" when conditions aren’t met.
When you submit an update by query request, OpenSearch takes a snapshot of the index at the start of the operation and updates matching documents using internal versioning. If a document changes between when the snapshot is taken and when the update operation processes it, a version conflict occurs and the update fails for that document unless you set the conflicts parameter to proceed. When a version conflict doesn’t cause an abort, the document is updated and its version number is incremented. Successfully updated documents are not rolled back even if later operations in the batch fail.
All update and query failures cause the operation to abort and are returned in the failures array of the response. Successful updates persist even after an abort. Although the first failure triggers the abort, all failures from the rejected bulk request appear in the failures array, so multiple failed entries may be reported.
OpenSearch retries rejected search or bulk requests up to 10 times with exponential backoff. If the maximum retry limit is reached, the operation halts and returns all failed requests in the response.
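The retry behavior described above follows a standard exponential backoff pattern. The following sketch is purely illustrative: the initial delay and exact schedule are internal details and are not configurable through this API, and the 0.5-second starting value is an assumption, not a documented setting:

```python
def backoff_schedule(max_retries=10, initial_delay=0.5):
    """Illustrative exponential backoff: each retry waits twice as long as the last.

    The 10-retry cap matches the behavior described above; the 0.5 s initial
    delay is an assumed value for illustration only.
    """
    return [initial_delay * (2 ** attempt) for attempt in range(max_retries)]

# The first few delays double each time: 0.5 s, 1.0 s, 2.0 s, ...
print(backoff_schedule()[:3])
```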
Note: OpenSearch cannot update documents with version 0 using this API. The internal versioning system requires that version numbers are greater than 0 in order to track and process update operations correctly.
Endpoints
```json
POST /{index}/_update_by_query
```
Path parameters
The following table lists the available path parameters.
| Parameter | Required | Data type | Description |
|---|---|---|---|
index | Required | List or String | A comma-separated list of data streams, indexes, and aliases to search. Supports wildcards (*). To search all data streams or indexes, omit this parameter or use * or _all. |
Query parameters
The following table lists the available query parameters. All query parameters are optional.
| Parameter | Data type | Description | Default |
|---|---|---|---|
_source | Boolean or List or String | Whether to include the _source field in the response. Set to true or false, or provide a list of fields to return. | N/A |
_source_excludes | List | List of fields to exclude from the returned _source field. | N/A |
_source_includes | List | List of fields to extract and return from the _source field. | N/A |
allow_no_indices | Boolean | If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indexes. This behavior applies even if the request targets other open indexes. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar. | N/A |
analyze_wildcard | Boolean | If true, wildcard and prefix queries are analyzed. | false |
analyzer | String | Analyzer to use for the query string. | N/A |
conflicts | String | What to do if update by query hits version conflicts. Valid values are: - abort: Abort the operation on version conflicts. - proceed: Proceed with the operation on version conflicts. | abort |
default_operator | String | The default operator for query string query: AND or OR. Valid values are: and, AND, or, and OR. | N/A |
df | String | Field to use as default where no field prefix is given in the query string. | N/A |
expand_wildcards | List or String | Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: - all: Match any index, including hidden ones. - closed: Match closed, non-hidden indexes. - hidden: Match hidden indexes. Must be combined with open, closed, or both. - none: Wildcard expressions are not accepted. - open: Match open, non-hidden indexes. | N/A |
from | Integer | Starting offset. | 0 |
ignore_unavailable | Boolean | If false, the request returns an error if it targets a missing or closed index. | N/A |
lenient | Boolean | If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. | N/A |
max_docs | Integer | Maximum number of documents to process. Defaults to all documents. | N/A |
pipeline | String | ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to _none disables the default ingest pipeline for this request. If a final pipeline is configured it will always run, regardless of the value of this parameter. | N/A |
preference | String | Specifies the node or shard the operation should be performed on. Random by default. | random |
q | String | Query in the Lucene query string syntax. | N/A |
refresh | Boolean | If true, OpenSearch refreshes all affected shards after the operation completes to make the changes visible to search. Unlike the Update API, this API does not support the wait_for value. | N/A |
request_cache | Boolean | If true, the request cache is used for this request. | N/A |
requests_per_second | Float | The throttle for this request in sub-requests per second. Set to -1 to disable throttling. | -1 (no throttling) |
routing | List or String | A custom value used to route operations to a specific shard. | N/A |
scroll | String | Period to retain the search context for scrolling. | N/A |
scroll_size | Integer | Size of the scroll request that powers the operation. | 100 |
search_timeout | String | Explicit timeout for each search request. | N/A |
search_type | String | The type of the search operation. Available options: query_then_fetch, dfs_query_then_fetch. Valid values are: - dfs_query_then_fetch: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate. - query_then_fetch: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate. | N/A |
size | Integer | Deprecated, use max_docs instead. | N/A |
slices | Integer or String | The number of slices this task should be divided into. Valid values are: - auto: Automatically determine the number of slices. | N/A |
sort | List | A comma-separated list of <field>:<direction> pairs to sort by. | N/A |
stats | List | Specific tag of the request for logging and statistical purposes. | N/A |
terminate_after | Integer | Maximum number of documents to collect for each shard. If a query reaches this limit, OpenSearch terminates the query early. OpenSearch collects documents before sorting. Use with caution. OpenSearch applies this parameter to each shard handling the request. When possible, let OpenSearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indexes across multiple data tiers. | N/A |
timeout | String | Period each update request waits for the following operations: dynamic mapping updates, waiting for active shards. | N/A |
version | Boolean | If true, returns the document version as part of a hit. | N/A |
wait_for_active_shards | Integer or String | The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). Valid values are: - all: Wait for all shards to be active. | N/A |
wait_for_completion | Boolean | If true, the request blocks until the operation is complete. | true |
Important: When using _source, _source_includes, or _source_excludes in an update by query request, these settings affect not only the response but also the fields available to the update script. If a field is excluded from _source and not explicitly handled in the script, it may be removed from the document during the update operation. To preserve excluded fields, ensure that the script reads and reassigns them as needed.
Request body fields
The request body is optional but typically includes a query to specify which documents to update and a script to define the update logic.
| Field | Data type | Description |
|---|---|---|
query | Object | The query used to select documents for update. If not specified, the operation updates all documents in the target index. For more information about query types, see Query DSL. |
script | Object | The script to run on each matching document. Contains source (the script code), lang (script language, typically painless), and optional params (parameters passed to the script). The script can access the document via ctx._source and control the operation by setting ctx.op. |
slice | Object | Manually specify slice ID and maximum slices for parallel processing. Contains id (integer, slice number) and max (integer, total number of slices). Optional. |
max_docs | Integer | Maximum number of documents to process. Optional. |
conflicts | String | What to do when the update by query operation encounters version conflicts. Set to proceed to continue or abort to stop. Can be specified in either the request body or as a query parameter. Optional. |
Script operations
Within your update script, you can control what happens to each document by setting ctx.op:
| Operation | Description |
|---|---|
No operation (noop) | Set ctx.op = "noop" to skip updating a document when your script determines no changes are needed. OpenSearch reports skipped documents in the noops counter of the response. |
Delete (delete) | Set ctx.op = "delete" to delete a document based on script logic. OpenSearch reports deleted documents in the deleted counter of the response. |
Setting ctx.op to any other value causes an error. Modifying other fields in ctx besides ctx._source and ctx.op also causes an error.
Refreshing shards
Specifying the refresh parameter refreshes all shards involved in the update by query operation after the request completes. This behavior differs from the Update API’s refresh parameter, which only refreshes the shard that received the update request. The Update by Query API does not support the wait_for value for the refresh parameter.
Running update by query asynchronously
To run an update by query operation asynchronously, set the wait_for_completion query parameter to false. OpenSearch performs preflight checks, launches the request, and returns a task ID that you can use to monitor progress or cancel the operation. When running asynchronously, OpenSearch creates a record of the task as a document at .tasks/task/${taskId}. After the task completes, delete the task document to allow OpenSearch to reclaim the space.
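For example, the following request (against a hypothetical products index) starts the operation in the background; the response contains a task field with the task ID:

```json
POST /products/_update_by_query?wait_for_completion=false
```

After the task finishes, delete its record at the task document path described above, substituting the returned task ID:

```json
DELETE /.tasks/task/<task_id>
```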
Waiting for active shards
The wait_for_active_shards parameter controls how many shard copies must be active before processing the request. The timeout parameter controls how long each write request waits for unavailable shards to become available. These parameters work the same way as in the Bulk API. Because Update by Query uses scrolled searches, you can specify the scroll parameter to control how long the search context remains active. The default scroll time is 5 minutes.
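As a sketch, the following hypothetical request requires at least two active shard copies per shard before proceeding and extends the scroll search context to 10 minutes instead of the default 5:

```json
POST /products/_update_by_query?wait_for_active_shards=2&scroll=10m
{
  "query": {
    "match_all": {}
  }
}
```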
Throttling update requests
To control the rate at which update by query issues batches of update operations, set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.
Throttling uses wait time between batches so that internal scroll requests can be given a timeout that accounts for request padding. The padding time is the difference between the batch size divided by requests_per_second and the time spent writing. By default, the batch size is 1,000, so if requests_per_second is set to 500:
target_time = 1,000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - 0.5 seconds = 1.5 seconds
Because each batch is issued as a single bulk request, large batch sizes cause OpenSearch to create many requests and then wait before starting the next batch. This creates uneven processing patterns with periods of high activity followed by idle waiting.
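The padding calculation above can be expressed directly. This is a minimal sketch of the arithmetic, not OpenSearch's internal implementation:

```python
def batch_wait_time(batch_size, requests_per_second, write_time_seconds):
    """Wait time padded between batches: target batch duration minus time spent writing."""
    target_time = batch_size / requests_per_second
    return max(target_time - write_time_seconds, 0.0)

# The worked example above: a 1,000-document batch at 500 requests per second
# that took 0.5 seconds to write waits 1.5 seconds before the next batch.
print(batch_wait_time(1000, 500, 0.5))  # 1.5
```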
Slicing for parallel processing
You can use slicing to run update operations in parallel across multiple threads. This approach divides the update operation into independent segments, improving performance for large-scale updates.
Setting slices to auto allows OpenSearch to choose a reasonable number for most indexes. When using automatic slicing or tuning it manually, consider these factors:
- Optimal query performance occurs when you match the slice count to your shard count. However, for indexes with many shards (500 or more), use fewer slices to avoid performance degradation from excessive parallelization overhead. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Update performance scales linearly across available resources with the number of slices.
- Whether query or update performance dominates runtime depends on the documents being updated and available cluster resources.
Example: Updating all documents without changing source
The following example request updates all documents in the index without modifying their source. This is useful for picking up new mapping properties or other mapping changes:
```json
POST /products/_update_by_query?conflicts=proceed
```

```python
response = client.update_by_query(
    index = "products",
    params = { "conflicts": "proceed" }
)
```

Example: Updating documents with a query filter
The following example request updates only electronics products by adding a 10% discount:
```json
POST /products/_update_by_query
{
  "query": {
    "term": {
      "category": "electronics"
    }
  },
  "script": {
    "source": "ctx._source.discount = params.discountPercent",
    "lang": "painless",
    "params": {
      "discountPercent": 0.1
    }
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    body = {
        "query": {
            "term": {
                "category": "electronics"
            }
        },
        "script": {
            "source": "ctx._source.discount = params.discountPercent",
            "lang": "painless",
            "params": {
                "discountPercent": 0.1
            }
        }
    }
)
```

Example: Incrementing a field value
The following example request increments the likes counter for all products from a specific user:
```json
POST /products/_update_by_query
{
  "query": {
    "term": {
      "user_id": "user1"
    }
  },
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    body = {
        "query": {
            "term": {
                "user_id": "user1"
            }
        },
        "script": {
            "source": "ctx._source.likes++",
            "lang": "painless"
        }
    }
)
```

Example: Conditionally deleting documents
The following example request deletes out-of-stock products with zero likes:
```json
POST /products/_update_by_query
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "in_stock": false
          }
        },
        {
          "term": {
            "likes": 0
          }
        }
      ]
    }
  },
  "script": {
    "source": "ctx.op = 'delete'",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    body = {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {
                            "in_stock": false
                        }
                    },
                    {
                        "term": {
                            "likes": 0
                        }
                    }
                ]
            }
        },
        "script": {
            "source": "ctx.op = 'delete'",
            "lang": "painless"
        }
    }
)
```

Example: Using noop for conditional updates
The following example request increases discount only for products priced above $100, otherwise performs no operation:
```json
POST /products/_update_by_query
{
  "script": {
    "source": "if (ctx._source.price > 100) { ctx._source.discount = 0.15 } else { ctx.op = 'noop' }",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    body = {
        "script": {
            "source": "if (ctx._source.price > 100) { ctx._source.discount = 0.15 } else { ctx.op = 'noop' }",
            "lang": "painless"
        }
    }
)
```

Example: Updating from multiple indexes
The following example request updates documents across multiple indexes:
```json
POST /products,inventory/_update_by_query
{
  "query": {
    "match_all": {}
  }
}
```

```python
response = client.update_by_query(
    index = "products,inventory",
    body = {
        "query": {
            "match_all": {}
        }
    }
)
```

Example: Using routing for targeted updates
The following example request limits the update operation to shards with a specific routing value:
```json
POST /products/_update_by_query?routing=user1
{
  "query": {
    "term": {
      "user_id": "user1"
    }
  },
  "script": {
    "source": "ctx._source.likes += 5",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    params = { "routing": "user1" },
    body = {
        "query": {
            "term": {
                "user_id": "user1"
            }
        },
        "script": {
            "source": "ctx._source.likes += 5",
            "lang": "painless"
        }
    }
)
```

Example: Using scroll_size to control batch size
The following example request uses a custom scroll batch size of 100 documents:
```json
POST /products/_update_by_query?scroll_size=100
{
  "query": {
    "range": {
      "price": {
        "gte": 50
      }
    }
  },
  "script": {
    "source": "ctx._source.discount = 0.05",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    params = { "scroll_size": "100" },
    body = {
        "query": {
            "range": {
                "price": {
                    "gte": 50
                }
            }
        },
        "script": {
            "source": "ctx._source.discount = 0.05",
            "lang": "painless"
        }
    }
)
```

Example: Manual slicing for parallel processing
The following example requests manually divide the update operation into two slices for parallel processing:
```json
POST /products/_update_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "script": {
    "source": "ctx._source.discount = 0.20",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    body = {
        "slice": {
            "id": 0,
            "max": 2
        },
        "script": {
            "source": "ctx._source.discount = 0.20",
            "lang": "painless"
        }
    }
)
```

In a separate request, process the second slice:
```json
POST /products/_update_by_query
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "script": {
    "source": "ctx._source.discount = 0.20",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    body = {
        "slice": {
            "id": 1,
            "max": 2
        },
        "script": {
            "source": "ctx._source.discount = 0.20",
            "lang": "painless"
        }
    }
)
```

Example: Automatic slicing
The following example request uses automatic slicing to parallelize the update operation across 5 slices:
```json
POST /products/_update_by_query?slices=5&refresh=true
{
  "script": {
    "source": "ctx._source.discount = 0.25",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    params = { "slices": "5", "refresh": "true" },
    body = {
        "script": {
            "source": "ctx._source.discount = 0.25",
            "lang": "painless"
        }
    }
)
```

To allow OpenSearch to automatically determine the optimal number of slices, use slices=auto:
```json
POST /products/_update_by_query?slices=auto
{
  "query": {
    "term": {
      "category": "furniture"
    }
  },
  "script": {
    "source": "ctx._source.discount = 0.30",
    "lang": "painless"
  }
}
```

```python
response = client.update_by_query(
    index = "products",
    params = { "slices": "auto" },
    body = {
        "query": {
            "term": {
                "category": "furniture"
            }
        },
        "script": {
            "source": "ctx._source.discount = 0.30",
            "lang": "painless"
        }
    }
)
```

Example response
The following example response shows a successful update by query operation that updated 8 documents:
```json
{
  "took": 39,
  "timed_out": false,
  "total": 8,
  "updated": 8,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}
```
When using a script with conditional noop operations, the response includes a noops count showing how many documents were skipped:
```json
{
  "took": 55,
  "timed_out": false,
  "total": 8,
  "updated": 4,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 4,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}
```
When using manual slicing, the response includes a slice_id field indicating which slice was processed:
```json
{
  "took": 12,
  "timed_out": false,
  "slice_id": 0,
  "total": 4,
  "updated": 4,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}
```
Response body fields
The following table lists all response body fields.
| Field | Data type | Description |
|---|---|---|
took | Integer | The amount of time from the start to the end of the entire operation, in milliseconds. |
timed_out | Boolean | Whether any of the requests executed during the update by query operation timed out. When set to true, successfully completed updates still persist and are not rolled back. |
total | Integer | The total number of documents that were successfully processed. |
updated | Integer | The number of documents that were successfully updated. |
deleted | Integer | The number of documents that were deleted. This occurs when the script sets ctx.op = "delete". |
batches | Integer | The number of scroll batches processed by the update by query operation. |
version_conflicts | Integer | The number of version conflicts encountered by the update by query operation. Occurs when a document changes between the time the snapshot is taken and when the update operation is processed. |
noops | Integer | The number of documents that were ignored because the script set ctx.op = "noop". Unlike delete by query, this field can contain non-zero values when scripts conditionally skip updates. |
retries | Object | The number of retries attempted by the update by query operation. Contains bulk (number of bulk action retries) and search (number of search action retries). |
throttled_millis | Integer | The amount of time the request was throttled to conform to requests_per_second, in milliseconds. |
requests_per_second | Float | The number of requests per second effectively executed during the update by query operation. |
throttled_until_millis | Integer | The amount of time until the next throttled request will be executed, in milliseconds. Always equals 0 in a completed update by query response. This field has meaning only when using the Tasks API to monitor an ongoing operation, where it indicates the next time a throttled request will execute. |
slice_id | Integer | The slice number for this response. Only present when using manual slicing. Indicates which slice of the operation this response represents. |
slices | Array | An array of slice results when the request is divided using the slices parameter. Each element contains the same response fields as the main response, showing the results for that individual slice. |
failures | Array | An array of failures if any unrecoverable errors occurred during the operation. If this array is not empty, the request aborted because of those failures. Update by query is implemented using batches, and any failure causes the entire process to abort, but all failures in the current batch are collected in this array. You can use the conflicts parameter set to proceed to prevent the operation from aborting on version conflicts. |
Managing update by query tasks
When you run an update by query operation asynchronously by setting wait_for_completion=false, OpenSearch returns a task ID that you can use to monitor, modify, or cancel the operation.
Retrieving the status of an update by query operation
To retrieve the status of an update by query operation, use the Tasks API:
```json
GET _tasks?detailed=true&actions=*/update/byquery
```
The response includes the status of all running update by query operations. To retrieve the status of a specific task, use the task ID:
```json
GET _tasks/<task_id>
```
The response contains detailed information about the operation’s progress:
```json
{
  "nodes": {
    "node_id": {
      "tasks": {
        "task_id": {
          "status": {
            "total": 1000,
            "updated": 450,
            "created": 0,
            "deleted": 0,
            "batches": 5,
            "version_conflicts": 0,
            "noops": 0,
            "retries": 0,
            "throttled_millis": 0
          }
        }
      }
    }
  }
}
```
The total field represents the total number of operations that the update by query operation expects to perform. You can estimate progress by adding the updated, deleted, and noops fields and comparing the sum to the total field. The operation is complete when their sum equals the total field.
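The progress estimate described above can be computed directly from the status object; a minimal sketch:

```python
def estimate_progress(status):
    """Fraction complete for an update by query task, given its Tasks API status object."""
    done = status["updated"] + status["deleted"] + status["noops"]
    return done / status["total"]

# Using the example status above: 450 of 1,000 expected operations are done.
status = {"total": 1000, "updated": 450, "deleted": 0, "noops": 0}
print(estimate_progress(status))  # 0.45
```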
Changing throttling for a running operation
To change the throttling of a running update by query operation, use the Rethrottle API with the task ID:
```json
POST _update_by_query/<task_id>/_rethrottle?requests_per_second=100
```
Set requests_per_second to any positive decimal value or -1 to disable throttling. Rethrottling that speeds up the operation takes effect immediately. Rethrottling that slows down the operation takes effect after completing the current batch to prevent scroll timeouts.
Canceling an update by query operation
To cancel a running update by query operation, use the task cancel API:
```json
POST _tasks/<task_id>/_cancel
```
Cancellation should happen quickly but might take a few seconds. The Tasks API continues to list the update by query task until it checks that it has been canceled and terminates itself. When you cancel an update by query operation with slices, OpenSearch cancels each sub-request.