Delete by Query API

Introduced 1.0

The Delete by Query API removes all documents from an index that match a specified query. Instead of deleting documents one by one, this API allows you to delete multiple documents in a single request based on search criteria.

Use this API in the following scenarios:

Removing outdated data from time-series indexes based on date ranges or other criteria.
Cleaning up test data or invalid documents that match specific patterns.
Implementing data retention policies by deleting documents that exceed a certain age.
Deleting documents containing sensitive information.

When you submit a delete by query request, OpenSearch takes a snapshot of the index at the start of the operation (held open as a scroll context) and deletes matching documents using internal versioning (sequence numbers and primary terms). The operation performs multiple search requests sequentially to find all matching documents, then executes a bulk delete request for each batch. If a document changes between when the snapshot is taken and when the delete operation processes it, a version conflict occurs and the delete fails for that document. By default, a version conflict aborts the operation; set the conflicts parameter to proceed to count the conflict and continue. Successfully deleted documents are not rolled back even if later operations in the batch fail.
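The conflict behavior described above can be illustrated with a toy in-memory model. This is only a sketch: the `delete_by_query_sim` function, the dictionary layout, and the use of a single sequence number per document are assumptions made for illustration and do not reflect OpenSearch internals, where conflicts are detected per bulk batch.

```python
def delete_by_query_sim(index, snapshot, conflicts="abort"):
    """Toy model: delete every snapshot document whose seq_no is unchanged.

    index    -- dict of doc_id -> seq_no (live state, mutated in place)
    snapshot -- dict of doc_id -> seq_no captured when the operation started
    """
    result = {"deleted": 0, "version_conflicts": 0, "aborted": False}
    for doc_id, seen_seq_no in snapshot.items():
        if index.get(doc_id) != seen_seq_no:
            # The document changed (or vanished) after the snapshot was taken.
            result["version_conflicts"] += 1
            if conflicts == "abort":
                result["aborted"] = True
                break  # documents deleted earlier are NOT rolled back
            continue
        del index[doc_id]
        result["deleted"] += 1
    return result


live = {"a": 1, "b": 1, "c": 1}
snap = dict(live)   # state seen at the start of the operation
live["b"] = 2       # concurrent update to "b" after the snapshot

print(delete_by_query_sim(dict(live), snap, conflicts="abort"))
print(delete_by_query_sim(dict(live), snap, conflicts="proceed"))
```

In this model, the `abort` run stops at document `b` after already deleting `a`, while the `proceed` run records the conflict and goes on to delete `c`.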

OpenSearch retries rejected search or bulk requests up to 10 times with exponential backoff. If the maximum retry limit is reached, the operation halts and returns all failed requests in the response.
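As a rough illustration of a retry loop with exponential backoff, the following sketch retries a rejected request up to 10 times. The `retry_with_backoff` helper, the 0.5-second base delay, and the doubling factor are assumptions chosen for the example; the actual delays OpenSearch uses are not stated here.

```python
import time


def retry_with_backoff(send, max_retries=10, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds or max_retries retries are used up.

    send must return True on success and False when the request is rejected.
    Returns a (succeeded, retries_used) tuple.
    """
    retries = 0
    while True:
        if send():
            return True, retries
        if retries == max_retries:
            return False, retries  # retry limit reached: give up
        # Hypothetical schedule: 0.5 s, 1 s, 2 s, 4 s, ...
        sleep(base_delay * (2 ** retries))
        retries += 1


# Usage: a request that is rejected twice and then accepted.
outcomes = iter([False, False, True])
delays = []
print(retry_with_backoff(lambda: next(outcomes), sleep=delays.append))
print(delays)
```

Passing `sleep=delays.append` records the delays instead of actually waiting, which keeps the example fast to run.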

Note: OpenSearch cannot delete documents with version 0 using this API. The internal versioning system requires that version numbers are greater than 0 in order to track and process delete operations correctly.

Endpoints

POST /{index}/_delete_by_query

Path parameters

The following table lists the available path parameters.

Parameter	Required	Data type	Description
`index`	Required	List or String	A comma-separated list of data streams, indexes, and aliases to search. Supports wildcards (`*`). To search all data streams or indexes, use `*` or `_all`.

Query parameters

The following table lists the available query parameters. All query parameters are optional.

Parameter	Data type	Description	Default
`_source`	Boolean or List or String	Set to `true` or `false` to return the `_source` field or not, or a list of fields to return.	N/A
`_source_excludes`	List	List of fields to exclude from the returned `_source` field.	N/A
`_source_includes`	List	List of fields to extract and return from the `_source` field.	N/A
`allow_no_indices`	Boolean	If `false`, the request returns an error if any wildcard expression, index alias, or `_all` value targets only missing or closed indexes. This behavior applies even if the request targets other open indexes. For example, a request targeting `foo*,bar*` returns an error if an index starts with `foo` but no index starts with `bar`.	N/A
`analyze_wildcard`	Boolean	If `true`, wildcard and prefix queries are analyzed.	`false`
`analyzer`	String	Analyzer to use for the query string.	N/A
`conflicts`	String	What to do if delete by query hits version conflicts. Valid values are: - `abort`: Abort the operation on version conflicts. - `proceed`: Proceed with the operation on version conflicts.	N/A
`default_operator`	String	The default operator for query string query: `AND` or `OR`. Valid values are: `and`, `AND`, `or`, and `OR`.	N/A
`df`	String	Field to use as default where no field prefix is given in the query string.	N/A
`expand_wildcards`	List or String	Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are: - `all`: Match any index, including hidden ones. - `closed`: Match closed, non-hidden indexes. - `hidden`: Match hidden indexes. Must be combined with `open`, `closed`, or both. - `none`: Wildcard expressions are not accepted. - `open`: Match open, non-hidden indexes.	N/A
`from`	Integer	Starting offset.	`0`
`ignore_unavailable`	Boolean	If `false`, the request returns an error if it targets a missing or closed index.	N/A
`lenient`	Boolean	If `true`, format-based query failures (such as providing text to a numeric field) in the query string will be ignored.	N/A
`max_docs`	Integer	Maximum number of documents to process. Defaults to all documents.	N/A
`preference`	String	Specifies the node or shard the operation should be performed on. Random by default.	`random`
`q`	String	Query in the Lucene query string syntax.	N/A
`refresh`	Boolean or String	If `true`, OpenSearch refreshes all shards involved in the delete by query after the request completes. Valid values are: - `false`: Do not refresh the affected shards. - `true`: Refresh the affected shards immediately. The `wait_for` value is not supported by this API.	N/A
`request_cache`	Boolean	If `true`, the request cache is used for this request. Defaults to the index-level setting.	N/A
`requests_per_second`	Float	The throttle for this request in sub-requests per second. Set to `-1` to disable throttling.	`-1`
`routing`	List or String	A custom value used to route operations to a specific shard.	N/A
`scroll`	String	Period to retain the search context for scrolling.	N/A
`scroll_size`	Integer	Size of the scroll request that powers the operation.	`1000`
`search_timeout`	String	Explicit timeout for each search request. Defaults to no timeout.	N/A
`search_type`	String	The type of the search operation. Valid values are: - `dfs_query_then_fetch`: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate. - `query_then_fetch`: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.	N/A
`size`	Integer	Deprecated, use `max_docs` instead.	N/A
`slices`	Integer or String	The number of slices this task should be divided into. Valid values are: - `auto`: Automatically determine the number of slices.	N/A
`sort`	List	A comma-separated list of `<field>:<direction>` pairs.	N/A
`stats`	List	Specific `tag` of the request for logging and statistical purposes.	N/A
`terminate_after`	Integer	Maximum number of documents to collect for each shard. If a query reaches this limit, OpenSearch terminates the query early. OpenSearch collects documents before sorting. Use with caution. OpenSearch applies this parameter to each shard handling the request. When possible, let OpenSearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indexes across multiple data tiers.	N/A
`timeout`	String	Period each deletion request waits for active shards.	N/A
`version`	Boolean	If `true`, returns the document version as part of a hit.	N/A
`wait_for_active_shards`	Integer or String or NULL	The number of shard copies that must be active before proceeding with the operation. Set to `all` or any positive integer up to the total number of shards in the index (`number_of_replicas+1`). Valid values are: - `all`: Wait for all shards to be active.	N/A
`wait_for_completion`	Boolean	If `true`, the request blocks until the operation is complete.	`true`

Request body fields

The request body is optional but typically includes a query to specify which documents to delete.

Field	Data type	Description
`query`	Object	The query used to select documents for deletion. If not specified, the operation deletes all documents in the target index. For more information about query types, see Query DSL.
`slice`	Object	Manually specify slice ID and maximum slices for parallel processing. Contains `id` (integer, slice number) and `max` (integer, total number of slices). Optional.
`max_docs`	Integer	Maximum number of documents to process. Optional.

Refreshing shards

Specifying the refresh parameter refreshes all shards involved in the delete by query operation after the request completes. This behavior differs from the Delete Document API’s refresh parameter, which only refreshes the shard that received the delete request. The Delete by Query API does not support the wait_for value for the refresh parameter.

Running delete by query asynchronously

To run a delete by query operation asynchronously, set the wait_for_completion query parameter to false. OpenSearch performs preflight checks, launches the request, and returns a task ID that you can use to monitor progress or cancel the operation. When running asynchronously, OpenSearch creates a record of the task as a document at .tasks/task/${taskId}. After the task completes, delete the task document to allow OpenSearch to reclaim the space.
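A minimal sketch of the asynchronous flow is shown below. It follows the calling convention of the Python examples on this page and assumes that the response to an asynchronous request carries the task ID in a `task` field. The `start_async_delete` helper and the `FakeClient` class are hypothetical; `FakeClient` exists only so that the sketch runs without a cluster, and in practice you would pass a real client object instead.

```python
def start_async_delete(client, index, query):
    """Submit a delete by query without waiting and return the task ID."""
    response = client.delete_by_query(
        index=index,
        params={"wait_for_completion": "false"},
        body={"query": query},
    )
    # Assumption: the task ID is returned under the "task" key.
    return response["task"]


class FakeClient:
    """Stand-in client used only to make this sketch runnable offline."""

    def delete_by_query(self, index, params, body):
        # Records nothing and deletes nothing; returns a made-up task ID.
        return {"task": "node-1:42"}


task_id = start_async_delete(FakeClient(), "movies", {"match_all": {}})
print(task_id)
```

The returned task ID is the value you would pass to the Tasks API to monitor or cancel the operation.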

Waiting for active shards

The wait_for_active_shards parameter controls how many shard copies must be active before processing the request. The timeout parameter controls how long each write request waits for unavailable shards to become available. These parameters work the same way as in the Bulk API. Because Delete by Query uses scrolled searches, you can specify the scroll parameter to control how long the search context remains active. The default scroll time is 5 minutes.

Throttling delete requests

To control the rate at which delete by query issues batches of delete operations, set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.

Throttling uses wait time between batches so that internal scroll requests can be given a timeout that accounts for request padding. The padding time is the difference between the batch size divided by requests_per_second and the time spent writing. By default, the batch size is 1,000, so if requests_per_second is set to 500:

target_time = 1,000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - 0.5 seconds = 1.5 seconds

Because each batch is issued as a single bulk request, large batch sizes cause OpenSearch to create many requests and then wait before starting the next batch. This creates uneven processing patterns with periods of high activity followed by idle waiting.
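The padding calculation above can be written as a small function. This is a sketch only: the `throttle_wait_seconds` name is hypothetical, and clamping the result at zero when a write takes longer than the target time is an assumption.

```python
def throttle_wait_seconds(batch_size, requests_per_second, write_seconds):
    """Return the wait time, in seconds, added after one batch."""
    if requests_per_second <= 0:
        return 0.0  # -1 disables throttling, so no padding is added
    target_seconds = batch_size / requests_per_second
    # Assumption: never wait a negative amount if the write ran long.
    return max(0.0, target_seconds - write_seconds)


# Reproduces the worked example: 1,000 documents at 500 requests per second
# with 0.5 seconds spent writing.
print(throttle_wait_seconds(1000, 500, 0.5))  # 1.5
```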

Slicing for parallel processing

You can use slicing to run delete operations in parallel across multiple threads. This approach divides the delete operation into independent segments, improving performance for large-scale deletions.

Setting slices to auto allows OpenSearch to choose a reasonable number for most indexes. When using automatic slicing or tuning it manually, consider these factors:

Optimal query performance occurs when you match the slice count to your shard count. However, for indexes with many shards (500 or more), use fewer slices to avoid performance degradation from excessive parallelization overhead. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
Delete performance scales linearly across available resources with the number of slices.
Whether query or delete performance dominates runtime depends on the documents being deleted and available cluster resources.

Example: Deleting documents matching a query

The following example request deletes all documents from the movies index where the year field is less than 2000:

POST /movies/_delete_by_query
{
  "query": {
    "range": {
      "year": {
        "lt": 2000
      }
    }
  }
}

response = client.delete_by_query(
  index="movies",
  body={
    "query": {
      "range": {
        "year": {
          "lt": 2000
        }
      }
    }
  }
)

Example: Deleting with conflicts set to proceed

The following example request deletes documents matching the query and continues processing even when version conflicts occur:

POST /movies/_delete_by_query?conflicts=proceed
{
  "query": {
    "match": {
      "status": "archived"
    }
  }
}

response = client.delete_by_query(
  index="movies",
  params={"conflicts": "proceed"},
  body={
    "query": {
      "match": {
        "status": "archived"
      }
    }
  }
)

Example: Deleting from multiple indexes

The following example request deletes documents from multiple indexes that match the query:

POST /movies,tv-shows/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

response = client.delete_by_query(
  index="movies,tv-shows",
  body={
    "query": {
      "match_all": {}
    }
  }
)

Example: Using routing for targeted deletion

The following example request limits the delete operation to shards with a specific routing value:

POST /movies/_delete_by_query?routing=user123
{
  "query": {
    "term": {
      "user_id": "user123"
    }
  }
}

response = client.delete_by_query(
  index="movies",
  params={"routing": "user123"},
  body={
    "query": {
      "term": {
        "user_id": "user123"
      }
    }
  }
)

Example: Using scroll_size to control batch size

The following example request uses a custom scroll batch size of 5,000 documents:

POST /movies/_delete_by_query?scroll_size=5000
{
  "query": {
    "term": {
      "genre": "documentary"
    }
  }
}

response = client.delete_by_query(
  index="movies",
  params={"scroll_size": "5000"},
  body={
    "query": {
      "term": {
        "genre": "documentary"
      }
    }
  }
)

Example: Manual slicing for parallel processing

The following example requests manually divide the delete operation into two slices for parallel processing:

POST /movies/_delete_by_query
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": {
    "range": {
      "rating": {
        "lt": 5
      }
    }
  }
}

response = client.delete_by_query(
  index="movies",
  body={
    "slice": {
      "id": 0,
      "max": 2
    },
    "query": {
      "range": {
        "rating": {
          "lt": 5
        }
      }
    }
  }
)

In a separate request, process the second slice:

POST /movies/_delete_by_query
{
  "slice": {
    "id": 1,
    "max": 2
  },
  "query": {
    "range": {
      "rating": {
        "lt": 5
      }
    }
  }
}

response = client.delete_by_query(
  index="movies",
  body={
    "slice": {
      "id": 1,
      "max": 2
    },
    "query": {
      "range": {
        "rating": {
          "lt": 5
        }
      }
    }
  }
)
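Because the two slice requests differ only in the slice ID, the request bodies can be generated in a loop. The following sketch builds one body per slice; the `sliced_bodies` helper is hypothetical and only constructs the bodies, leaving the submission of each request to your client code.

```python
import json


def sliced_bodies(query, max_slices):
    """Return one request body per slice, with IDs 0..max_slices-1."""
    return [
        {"slice": {"id": slice_id, "max": max_slices}, "query": query}
        for slice_id in range(max_slices)
    ]


bodies = sliced_bodies({"range": {"rating": {"lt": 5}}}, 2)
for body in bodies:
    print(json.dumps(body))
```

Each generated body can then be sent in its own delete by query request, for example from separate threads or processes.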

Example: Automatic slicing

The following example request uses automatic slicing to parallelize the delete operation across 5 slices:

POST /movies/_delete_by_query?slices=5&refresh=true
{
  "query": {
    "range": {
      "views": {
        "lt": 100
      }
    }
  }
}

response = client.delete_by_query(
  index="movies",
  params={"slices": "5", "refresh": "true"},
  body={
    "query": {
      "range": {
        "views": {
          "lt": 100
        }
      }
    }
  }
)

To allow OpenSearch to automatically determine the optimal number of slices, use slices=auto:

POST /movies/_delete_by_query?slices=auto
{
  "query": {
    "match": {
      "category": "test"
    }
  }
}

response = client.delete_by_query(
  index="movies",
  params={"slices": "auto"},
  body={
    "query": {
      "match": {
        "category": "test"
      }
    }
  }
)

Example response

The following example response shows a successful delete by query operation that deleted 8 documents:

{
  "took": 88,
  "timed_out": false,
  "total": 8,
  "deleted": 8,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}

When using manual slicing, the response includes a slice_id field indicating which slice was processed:

{
  "took": 13,
  "timed_out": false,
  "slice_id": 0,
  "total": 9,
  "deleted": 9,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": []
}

When using automatic slicing with a specific number of slices, the response includes a slices array showing the results for each slice:

{
  "took": 52,
  "timed_out": false,
  "total": 9,
  "deleted": 9,
  "batches": 4,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "slices": [
    {
      "slice_id": 0,
      "total": 3,
      "deleted": 3,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    {
      "slice_id": 1,
      "total": 2,
      "deleted": 2,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    {
      "slice_id": 2,
      "total": 0,
      "deleted": 0,
      "batches": 0,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    {
      "slice_id": 3,
      "total": 1,
      "deleted": 1,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    },
    {
      "slice_id": 4,
      "total": 3,
      "deleted": 3,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until_millis": 0
    }
  ],
  "failures": []
}

Response body fields

The following table lists all response body fields.

Field	Data type	Description
`took`	Integer	The amount of time from the start to the end of the entire operation, in milliseconds.
`timed_out`	Boolean	Whether any of the requests executed during the delete by query operation timed out. When set to `true`, successfully completed deletions still stick and are not rolled back.
`total`	Integer	The total number of documents that were successfully processed.
`deleted`	Integer	The number of documents that were successfully deleted.
`batches`	Integer	The number of scroll batches processed by the delete by query operation.
`version_conflicts`	Integer	The number of version conflicts encountered by the delete by query operation. Occurs when a document changes between the time the snapshot is taken and when the delete operation is processed.
`noops`	Integer	The number of no-operation requests. This field always returns 0 for delete by query. It exists to maintain response structure consistency with Update by Query and Reindex APIs.
`retries`	Object	The number of retries attempted by the delete by query operation. Contains `bulk` (number of bulk action retries) and `search` (number of search action retries).
`throttled_millis`	Integer	The amount of time the request was throttled to conform to `requests_per_second`, in milliseconds.
`requests_per_second`	Float	The number of requests per second effectively executed during the delete by query operation.
`throttled_until_millis`	Integer	The amount of time until the next throttled request will be executed, in milliseconds. Always equals 0 in a completed delete by query response. This field has meaning only when using the Tasks API to monitor an ongoing operation, where it indicates the next time a throttled request will execute.
`slice_id`	Integer	The slice number for this response. Only present when using manual slicing. Indicates which slice of the operation this response represents.
`slices`	Array	An array of slice results when using automatic slicing with a specific number. Each element contains the same response fields as the main response, showing the results for that individual slice.
`failures`	Array	An array of failures if any unrecoverable errors occurred during the operation. If this array is not empty, the request aborted because of those failures. Delete by query is implemented using batches, and any failure causes the entire process to abort, but all failures in the current batch are collected in this array. You can use the `conflicts` parameter set to `proceed` to prevent the operation from aborting on version conflicts.
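As one way to act on these fields, the following sketch condenses a response into a few values that a script might check. The `summarize_delete_response` helper and its notion of `ok` (no failures and no timeout) are assumptions for illustration, not part of the API.

```python
def summarize_delete_response(response):
    """Condense a delete by query response into a few values to check."""
    failures = response.get("failures", [])
    retries = response.get("retries", {})
    return {
        # Assumption: treat the run as OK only without failures or timeouts.
        "ok": not failures and not response.get("timed_out", False),
        "deleted": response.get("deleted", 0),
        "version_conflicts": response.get("version_conflicts", 0),
        "retries": retries.get("bulk", 0) + retries.get("search", 0),
    }


example = {
    "took": 88,
    "timed_out": False,
    "total": 8,
    "deleted": 8,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {"bulk": 0, "search": 0},
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0,
    "failures": [],
}
print(summarize_delete_response(example))
```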

Managing delete by query tasks

When you run a delete by query operation asynchronously by setting wait_for_completion=false, OpenSearch returns a task ID that you can use to monitor, modify, or cancel the operation.

Retrieving the status of a delete by query operation

To retrieve the status of a delete by query operation, use the Tasks API:

GET _tasks?detailed=true&actions=*/delete/byquery

The response includes the status of all running delete by query operations. To retrieve the status of a specific task, use the task ID:

GET _tasks/{task_id}

The response contains detailed information about the operation’s progress:

{
  "nodes": {
    "node_id": {
      "tasks": {
        "task_id": {
          "status": {
            "total": 1000,
            "updated": 0,
            "created": 0,
            "deleted": 450,
            "batches": 5,
            "version_conflicts": 0,
            "noops": 0,
            "retries": 0,
            "throttled_millis": 0
          }
        }
      }
    }
  }
}

The total field represents the total number of operations that the delete by query operation expects to perform. You can estimate progress by comparing the deleted field to the total field. The operation is complete when deleted equals total.
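The progress estimate described above can be computed from the task status object. This is a sketch: the `estimate_progress` helper is hypothetical, and returning 0.0 when `total` is zero or missing is an assumption made to avoid a division by zero.

```python
def estimate_progress(status):
    """Return the fraction of expected operations already deleted."""
    total = status.get("total", 0)
    if total <= 0:
        return 0.0  # total not known yet, or nothing matched the query
    return status.get("deleted", 0) / total


status = {"total": 1000, "deleted": 450, "batches": 5, "version_conflicts": 0}
print(f"{estimate_progress(status):.0%} complete")
```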

Changing throttling for a running operation

To change the throttling of a running delete by query operation, use the Rethrottle API with the task ID:

POST _delete_by_query/{task_id}/_rethrottle?requests_per_second=100

Set requests_per_second to any positive decimal value or -1 to disable throttling. Rethrottling that speeds up the operation takes effect immediately. Rethrottling that slows down the operation takes effect after completing the current batch to prevent scroll timeouts.

Canceling a delete by query operation

To cancel a running delete by query operation, use the task cancel API:

POST _tasks/{task_id}/_cancel

Cancellation should happen quickly but might take a few seconds. The Tasks API continues to list the delete by query task until it checks that it has been canceled and terminates itself. When you cancel a delete by query operation with slices, OpenSearch cancels each sub-request.
