Index Document API

Introduced 1.0

The Index Document API adds a JSON document to a specified index and makes it searchable. If a document with the same ID already exists, the API replaces the existing document and increments its version number.

Endpoints

PUT {index}/_doc/{id}
POST {index}/_doc

PUT {index}/_create/{id}
POST {index}/_create/{id}

Use the following endpoint combinations to control how documents are indexed:

PUT {index}/_doc/{id}: Adds a new document with a specified ID or updates an existing document with the same ID.
POST {index}/_doc: Adds a new document and automatically generates a unique ID.
PUT {index}/_create/{id} or POST {index}/_create/{id}: Adds a new document with a specified ID only if a document with that ID does not already exist. If the document exists, the operation fails.
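The difference between the default index behavior and the create behavior can be illustrated with a toy, in-memory model. This is a hypothetical sketch, not OpenSearch code. ToyIndex and DocumentExistsError are illustrative names, and the model ignores shards, sequence numbers, and persistence. It only shows that an index write overwrites and bumps the version, while a create write refuses to touch an existing ID. OpenSearch reports that refusal as a version conflict.

```python
# Toy, in-memory model of the index and create write modes.
# Hypothetical names; this is not OpenSearch code.


class DocumentExistsError(Exception):
    """Stands in for the conflict OpenSearch returns when a create hits an existing ID."""


class ToyIndex:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def write(self, doc_id, source, op_type="index"):
        current = self.docs.get(doc_id)
        if op_type == "create" and current is not None:
            # _create semantics: never overwrite an existing document.
            raise DocumentExistsError(doc_id)
        # New documents start at version 1; overwrites increment the version.
        version = 1 if current is None else current[0] + 1
        self.docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version,
                "result": "created" if current is None else "updated"}


idx = ToyIndex()
print(idx.write("1", {"name": "Example"}))      # result "created", version 1
print(idx.write("1", {"name": "Example v2"}))   # result "updated", version 2
try:
    idx.write("1", {"name": "Example v3"}, op_type="create")
except DocumentExistsError:
    print("create failed: document 1 already exists")
```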

Path parameters

The following table lists the available path parameters.

Parameter	Data type	Description
`index`	String	The name of the index. If the index does not exist, OpenSearch creates it automatically unless automatic index creation is disabled. Required.
`id`	String	The unique document ID. Required when using PUT. Omit this parameter when using POST to let OpenSearch automatically generate a unique ID.

Query parameters

The following table lists the available query parameters. All query parameters are optional.

Parameter	Data type	Description
`if_seq_no`	Integer	Only performs the operation if the document’s current sequence number matches the specified value. Used for optimistic concurrency control. See Optimistic concurrency control.
`if_primary_term`	Integer	Only performs the operation if the document’s current primary term matches the specified value. Used for optimistic concurrency control. See Optimistic concurrency control.
`op_type`	Enum	The operation type. Valid values are `create` (indexes a document only if it does not already exist) and `index` (creates a new document or updates an existing document). If a document ID is specified, the default is `index`. Otherwise, the default is `create`.
`pipeline`	String	The ID of the ingest pipeline to use for preprocessing the document before indexing.
`routing`	String	A custom routing value used to route the operation to a specific shard. See Routing.
`refresh`	Enum	Whether to refresh the affected shards after the operation. Valid values are `true` (refresh immediately), `false` (do not refresh), and `wait_for` (wait for a refresh to occur before responding). Default is `false`. See Refresh.
`timeout`	Time	The amount of time to wait for the primary shard to become available if it is unavailable. Default is `1m`. See Timeout.
`version`	Integer	An explicit version number used for concurrency control. How the value is compared with the stored version depends on `version_type`. See Versioning.
`version_type`	Enum	The version type used with `version`. Valid values are `internal` (default), `external` or `external_gt` (only indexes if the specified version is greater than the stored version), and `external_gte` (only indexes if the specified version is greater than or equal to the stored version). See Versioning.
`wait_for_active_shards`	String	The number of active shard copies required before proceeding with the operation. Valid values are `all` or a positive integer up to the total number of shard copies (`number_of_replicas` + 1). Default is `1` (only the primary shard). See Wait for active shards.
`require_alias`	Boolean	Whether the target index name must be an index alias. If `true` and the target is not an alias, the request fails. Default is `false`.
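Query parameters are appended to the request path as a URL query string. The following sketch builds the method and path for an Index Document request and rejects enum values that are not listed in the table above. index_request_path and ENUM_PARAMS are hypothetical names used only for illustration; in practice, a client library such as opensearch-py builds the URL for you.

```python
from urllib.parse import quote, urlencode

# Allowed values for the enum query parameters, taken from the table above.
ENUM_PARAMS = {
    "op_type": {"index", "create"},
    "refresh": {"true", "false", "wait_for"},
    "version_type": {"internal", "external", "external_gt", "external_gte"},
}


def index_request_path(index, doc_id=None, **params):
    """Return (method, path_with_query) for an Index Document request.

    Hypothetical helper: PUT when an ID is given, POST (auto-generated ID) otherwise.
    """
    normalized = {}
    for name, value in params.items():
        # Render Python booleans the way the REST API expects them ("true"/"false").
        text = str(value).lower() if isinstance(value, bool) else str(value)
        allowed = ENUM_PARAMS.get(name)
        if allowed is not None and text not in allowed:
            raise ValueError(f"invalid value for {name}: {text!r}")
        normalized[name] = text
    if doc_id is None:
        method, path = "POST", f"/{quote(index)}/_doc"
    else:
        method, path = "PUT", f"/{quote(index)}/_doc/{quote(str(doc_id), safe='')}"
    query = urlencode(sorted(normalized.items()))  # sorted for a stable order
    url = f"{path}?{query}" if query else path
    return method, url


print(index_request_path("sample_index", "1", refresh="wait_for", routing="user123"))
# ('PUT', '/sample_index/_doc/1?refresh=wait_for&routing=user123')
```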

Example requests

The following examples index a sample document into an index named sample_index.

Example PUT request

PUT /sample_index/_doc/1
{
  "name": "Example",
  "price": 29.99,
  "description": "To be or not to be, that is the question"
}

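from opensearchpy import OpenSearch

# Assumes a cluster at localhost:9200; adjust hosts and authentication as needed.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

response = client.index(
  index="sample_index",
  id="1",
  body={
    "name": "Example",
    "price": 29.99,
    "description": "To be or not to be, that is the question"
  }
)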

Example POST request

POST /sample_index/_doc
{
  "name": "Another Example",
  "price": 19.99,
  "description": "We are such stuff as dreams are made on"
}

response = client.index(
  index="sample_index",
  body={
    "name": "Another Example",
    "price": 19.99,
    "description": "We are such stuff as dreams are made on"
  }
)

Example response

{
  "_index": "sample_index",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Response body fields

The following table lists all response body fields.

Field	Data type	Description
`_index`	String	The name of the index to which the document was added.
`_id`	String	The document’s unique identifier.
`_version`	Integer	The document’s version number. Incremented each time the document is updated.
`result`	String	The result of the indexing operation. Possible values are `created` (a new document was created) and `updated` (an existing document was updated).
`_shards`	Object	Information about the replication process.
`_shards.total`	Integer	The number of shard copies (primary and replicas) on which the operation should be executed.
`_shards.successful`	Integer	The number of shard copies on which the operation succeeded. When the operation succeeds, this value is at least 1 (the primary shard).
`_shards.failed`	Integer	The number of shard copies on which the operation failed. If the operation succeeds, this value is 0.
`_seq_no`	Integer	The sequence number assigned to the document for this indexing operation. Sequence numbers are used to ensure that an older version of a document does not overwrite a newer version. See Optimistic concurrency control.
`_primary_term`	Integer	The primary term assigned to the document for this indexing operation. See Optimistic concurrency control.
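Client code typically reads a few of these fields back after each write. The following sketch parses a response like the example response and extracts the ID, whether the document was newly created, the sequence number and primary term needed for later conditional writes, and a replication summary. summarize_index_response is a hypothetical helper name.

```python
import json


def summarize_index_response(body):
    """Extract the fields a caller usually acts on after an index request."""
    shards = body["_shards"]
    return {
        "id": body["_id"],
        "created": body["result"] == "created",
        # Send these back as if_seq_no / if_primary_term on the next conditional write.
        "seq_no": body["_seq_no"],
        "primary_term": body["_primary_term"],
        # successful < total with failed == 0 typically means some replica copies
        # are unassigned (for example, on a single-node cluster).
        "all_copies_written": shards["successful"] == shards["total"],
        "replication_failures": shards["failed"],
    }


example = json.loads("""
{
  "_index": "sample_index",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "_seq_no": 0,
  "_primary_term": 1
}
""")
print(summarize_index_response(example))
```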

Automatic index creation

By default, if the specified index does not exist, the Index Document API automatically creates it and applies any configured index templates. The API also creates a dynamic mapping for new fields if no explicit mapping exists.

Automatic index creation is controlled by the action.auto_create_index setting. By default, this setting is true, allowing any index to be created automatically. You can modify this setting to allow or block index creation based on specific patterns or disable automatic index creation entirely. For more information, see Create index.
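Besides true and false, the setting can hold a comma-separated list of index patterns, each optionally prefixed with + (allow) or - (block). The following sketch models how such a list might be evaluated for one index name. It assumes semantics we understand OpenSearch to use but that this page does not state: patterns are checked in order, the first matching pattern decides, * is the only wildcard, and an index that matches no pattern is not created automatically. auto_create_allowed is a hypothetical name; verify the behavior against the Create index documentation for your version.

```python
import re


def _matches(pattern, index):
    # "*" matches any run of characters; every other character is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index) is not None


def auto_create_allowed(setting, index):
    """Simplified model of evaluating action.auto_create_index for one index name."""
    value = setting.strip()
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for expr in (part.strip() for part in value.split(",")):
        if not expr:
            continue
        include = not expr.startswith("-")
        pattern = expr[1:] if expr[0] in "+-" else expr
        if _matches(pattern, index):
            return include  # assumption: the first matching pattern decides
    return False  # assumption: no pattern matched, so creation is blocked


setting = "-logs-secret*,+logs-*,-*"
print(auto_create_allowed(setting, "logs-app"))       # True
print(auto_create_allowed(setting, "logs-secret-1"))  # False
print(auto_create_allowed(setting, "metrics"))        # False
```

Because the first match decides in this model, more specific block patterns must come before broader allow patterns.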

Optimistic concurrency control

You can use the if_seq_no and if_primary_term parameters to perform conditional indexing based on the document’s current sequence number and primary term. This ensures that the operation only succeeds if the document has not been modified since you last retrieved it.

For example, to update a document only if it has sequence number 3 and primary term 1, include these parameters in your request:

PUT sample-index/_doc/1?if_seq_no=3&if_primary_term=1
{
  "name": "Updated Example",
  "price": 39.99
}

If the sequence number or primary term does not match the current values, OpenSearch returns a version conflict error (HTTP 409), allowing you to retrieve the latest version and retry the operation.
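The usual client pattern is a read-modify-write loop: read the document, remember its _seq_no and _primary_term, send the conditional write, and on a conflict re-read and retry. The following sketch simulates that loop against a toy in-memory store so that it runs without a cluster. ToyStore, VersionConflict, and update_with_retry are hypothetical names; a real client would send the if_seq_no and if_primary_term query parameters to OpenSearch instead of calling ToyStore.index.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 version conflict error."""


class ToyStore:
    """In-memory stand-in that enforces if_seq_no / if_primary_term checks."""

    def __init__(self):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> {"_source", "_seq_no", "_primary_term"}

    def get(self, doc_id):
        return dict(self.docs[doc_id])

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            # Conditional write: both values must match the stored document.
            if (current is None or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        doc = {"_source": dict(source), "_seq_no": self.next_seq_no,
               "_primary_term": self.primary_term}
        self.next_seq_no += 1  # every write gets a new sequence number
        self.docs[doc_id] = doc
        return doc


def update_with_retry(store, doc_id, change, max_attempts=3):
    """Read, modify, and conditionally write; re-read and retry on conflict."""
    for _ in range(max_attempts):
        doc = store.get(doc_id)
        new_source = change(dict(doc["_source"]))
        try:
            return store.index(doc_id, new_source,
                               if_seq_no=doc["_seq_no"],
                               if_primary_term=doc["_primary_term"])
        except VersionConflict:
            continue  # another writer got in first; start over from a fresh read
    raise VersionConflict(f"gave up on {doc_id} after {max_attempts} attempts")


store = ToyStore()
store.index("1", {"name": "Example", "price": 29.99})
result = update_with_retry(store, "1", lambda src: {**src, "price": 39.99})
print(result["_seq_no"], result["_source"]["price"])  # 1 39.99
```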

Automatic ID generation

When you use the POST method without specifying a document ID, OpenSearch automatically generates a unique ID for the document. The op_type is automatically set to create, ensuring a new document is always created.

The following example indexes a document without specifying an ID, allowing OpenSearch to generate one automatically:

POST sample-index/_doc
{
  "user": "john_doe",
  "post_date": "2024-01-15T10:30:00",
  "message": "Hello, OpenSearch!"
}

The response includes the automatically generated ID:

{
  "_index": "sample-index",
  "_id": "W0tpsmIBdwcYyG50zbta",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

The generated ID is a Base64-encoded UUID that ensures uniqueness across your cluster.

Routing

By default, OpenSearch determines which shard stores a document by computing a hash of the document’s ID. You can override this behavior by providing a custom routing parameter value.

The following example routes the document to a shard based on the routing value user123:

POST sample-index/_doc?routing=user123
{
  "user": "john_doe",
  "message": "Hello, world!"
}

When you use custom routing during indexing, you must provide the same routing value when retrieving, updating, or deleting the document. Otherwise, OpenSearch cannot locate the document.
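Conceptually, the target shard is the hash of the routing value modulo the number of primary shards. The following sketch illustrates that idea and why the routing value must be repeated on reads. It is a simplified model: pick_shard is a hypothetical name, zlib.crc32 stands in for the Murmur3-based hash that OpenSearch actually uses, and additional routing factors used internally (for example, to support shard splitting) are ignored, so the shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Simplified model: hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in hash, not the function OpenSearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


number_of_primary_shards = 5
# Default routing: the document ID is the routing value.
print(pick_shard("1", number_of_primary_shards))
# Custom routing: every document written with routing=user123 lands on the same shard.
print(pick_shard("user123", number_of_primary_shards))
```

A GET for the routed document without routing=user123 would compute the shard from the document ID instead, which can point to a different shard than the one that holds the document.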

Distributed model

The index operation is directed to the primary shard based on the document’s routing value (either the document ID or a custom routing value). Once the primary shard completes the operation, OpenSearch distributes the update to all applicable replica shards in the replication group.

This distributed approach keeps all shard copies synchronized. The primary shard coordinates replication and waits for the in-sync replica copies to respond before acknowledging success to the client. The wait_for_active_shards setting only controls how many copies must be active before the operation starts. See Wait for active shards.

Wait for active shards

To improve write operation resiliency, you can configure the Index Document API to wait for a certain number of active shard copies before proceeding. By default, the operation waits only for the primary shard to be active (wait_for_active_shards=1).

You can set wait_for_active_shards to all or any positive integer up to the total number of shard copies (number_of_replicas + 1). If the required number of active shards is not available, the operation waits and retries until the shards become available or a timeout occurs.

For example, consider a cluster with three nodes (A, B, and C) and an index with number_of_replicas set to 3, resulting in 4 shard copies (one primary and three replicas). By default, an indexing operation proceeds as long as the primary shard is available. If node A hosts the primary shard copy, the operation proceeds even if nodes B and C are down.

If you set wait_for_active_shards=3 on the request, the indexing operation requires 3 active shard copies before proceeding. This requirement can be met when all 3 nodes are running, with each node containing a copy of the shard. However, if you set wait_for_active_shards=all (or 4), the indexing operation does not proceed because you need all 4 copies active, but only 3 nodes exist. The operation times out unless a new node joins the cluster to host the fourth shard copy.
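The arithmetic in this example can be checked with a short sketch. required_active_copies and can_proceed are hypothetical helpers that evaluate the requirement at a single point in time (the real operation keeps waiting until the timeout). The sketch assumes, as in the example, that a node never holds two copies of the same shard, so the number of active copies cannot exceed the number of nodes.

```python
def required_active_copies(wait_for_active_shards, number_of_replicas):
    """Translate a wait_for_active_shards value into a number of shard copies."""
    total_copies = number_of_replicas + 1  # one primary plus the replicas
    if str(wait_for_active_shards).lower() == "all":
        return total_copies
    required = int(wait_for_active_shards)
    # This sketch accepts only the documented range: 1 to total_copies.
    if not 1 <= required <= total_copies:
        raise ValueError(f"expected 'all' or an integer from 1 to {total_copies}")
    return required


def can_proceed(wait_for_active_shards, number_of_replicas, active_copies):
    """True if enough shard copies are active for the write to start."""
    return active_copies >= required_active_copies(wait_for_active_shards, number_of_replicas)


# The three-node example: number_of_replicas=3 gives 4 copies, but with one copy
# of a given shard per node, at most 3 copies can be active.
nodes, replicas = 3, 3
active = min(nodes, replicas + 1)
print(can_proceed(1, replicas, active))      # True
print(can_proceed(3, replicas, active))      # True
print(can_proceed("all", replicas, active))  # False: needs 4 active copies
```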

The following example requires at least 2 active shard copies (the primary and one replica) before proceeding:

PUT sample-index/_doc/1?wait_for_active_shards=2
{
  "name": "Example",
  "price": 29.99
}

This setting reduces the risk of writing to an insufficient number of shard copies but does not eliminate it entirely. The check occurs before the write operation begins. Once the operation is underway, replication can still fail on some replicas while succeeding on the primary. The _shards section of the response indicates how many shard copies succeeded or failed.

Refresh

The refresh parameter controls when indexed documents become visible to search operations. For most use cases, use the default value (false) for optimal performance.

Valid options are:

false (default): The document becomes visible according to the index refresh interval (by default, 1 second).
true: Forces an immediate refresh after indexing, making the document immediately searchable. Use sparingly, as frequent refreshes can significantly impact performance.
wait_for: Waits for the next scheduled refresh before responding. Unlike true, it does not force an additional refresh, so it places less load on the cluster.

Timeout

If the primary shard is unavailable when you submit an index request (for example, during recovery or relocation), the operation waits for up to 1 minute by default before failing. You can adjust this behavior using the timeout parameter:

PUT sample-index/_doc/1?timeout=5m
{
  "name": "Example",
  "price": 29.99
}

Versioning

Every indexed document has a version number. By default, OpenSearch uses internal versioning, starting at 1 and incrementing with each update or delete operation.

For external versioning (such as maintaining version numbers in a separate database), set the version_type parameter to control how OpenSearch handles version conflicts. The following table lists the available version types.

Version type	Description
`internal`	Only indexes the document if the specified version is identical to the version of the stored document. This is the default version type.
`external` or `external_gt`	Only indexes the document if the specified version is strictly greater than the version of the stored document or if there is no existing document. The specified version is used as the new version and stored with the document. The supplied version must be a non-negative long integer.
`external_gte`	Only indexes the document if the specified version is greater than or equal to the version of the stored document. If there is no existing document, the operation succeeds. The specified version is used as the new version and stored with the document. The supplied version must be a non-negative long integer.

The external_gte version type is intended for special use cases and should be used with care. If used incorrectly, it can result in data loss.

For example, to index a document using external versioning:

PUT sample-index/_doc/1?version=5&version_type=external
{
  "name": "Example",
  "price": 29.99,
  "description": "Updated from external system"
}

If the provided version does not meet the requirements of the specified version type, OpenSearch returns a version conflict error. Versioning is completely real time and is not affected by the near-real-time aspects of search operations.
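The comparison rules for the external version types can be expressed as a small function. external_version_allows is a hypothetical helper that models only the external, external_gt, and external_gte rows of the version type table (internal versioning is not modeled). Passing stored=None represents a document that does not exist yet.

```python
def external_version_allows(version_type, requested, stored=None):
    """Return True if a write carrying `requested` should be accepted.

    stored is the version currently stored with the document, or None if the
    document does not exist. Only external version types are modeled.
    """
    if version_type not in ("external", "external_gt", "external_gte"):
        raise ValueError(f"not an external version type: {version_type}")
    if requested < 0:
        raise ValueError("external versions must be non-negative")
    if stored is None:
        return True  # no existing document: the write succeeds
    if version_type == "external_gte":
        return requested >= stored
    return requested > stored  # external and external_gt: strictly greater


# After PUT sample-index/_doc/1?version=5&version_type=external, the stored version is 5.
print(external_version_allows("external", 5, stored=5))      # False: version conflict
print(external_version_allows("external_gte", 5, stored=5))  # True
print(external_version_allows("external", 6, stored=5))      # True
```

The external_gte case shows why it needs care: a replayed write with the same version is accepted and can overwrite newer content.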

Noop updates

When you update a document using the Index Document API, OpenSearch always creates a new version of the document, even if the document content has not changed. This behavior can be inefficient if you frequently reindex documents with the same content.

If you need to avoid creating unnecessary document versions, use the Update Document API with the detect_noop parameter set to true (its default). The Update API fetches the existing document, compares it to the new content, and only creates a new version if the content has changed.

The Index Document API does not support noop detection because it does not fetch the old source for comparison. Whether noop updates are problematic depends on several factors, including how frequently your data source sends updates that do not change the document and the query load on the shard receiving the updates.
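The comparison behind noop detection can be sketched as follows. merge_partial and is_noop are hypothetical helpers that only illustrate the idea of merging a partial document into the existing source and comparing the result; the Update Document API performs this check on the server, and its exact merge rules may differ. The sketch also makes the limitation above concrete: the check needs the existing source as an input, which an Index Document request never reads.

```python
import copy


def merge_partial(source, partial):
    """Recursively merge a partial document into a copy of the existing source."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)  # replace scalars, lists, new keys
    return merged


def is_noop(source, partial):
    """True if applying the partial update would leave the document unchanged."""
    return merge_partial(source, partial) == source


existing = {"name": "Example", "price": 29.99, "meta": {"color": "red"}}
print(is_noop(existing, {"price": 29.99}))  # True: nothing would change
print(is_noop(existing, {"price": 39.99}))  # False: a new version is needed
```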
