Get Document API
Introduced 1.0
The Get Document API retrieves a JSON document and its metadata from an index by document ID. You can also use HEAD requests to verify that a document or its source exists without retrieving the full content.
Endpoints
To retrieve a document and its metadata from an index, use the GET method:
GET /{index}/_doc/{id}
To retrieve only the document source, use the following endpoint:
GET /{index}/_source/{id}
To verify that a document exists, use the HEAD method:
HEAD /{index}/_doc/{id}
HEAD /{index}/_source/{id}
Path parameters
The following table lists the available path parameters.
| Parameter | Required | Data type | Description |
|---|---|---|---|
| id | Required | String | The unique identifier of the document. |
| index | Required | String | The name of the index containing the document. |
Query parameters
The following table lists the available query parameters. All query parameters are optional.
| Parameter | Methods | Data type | Description | Default |
|---|---|---|---|---|
| _source | GET | Boolean or List or String | Whether to return the _source field. Set to true to include it, false to exclude it, or specify a comma-separated list of field names to return. See Source filtering. | N/A |
| _source_excludes | GET | List or String | A comma-separated list of source fields to exclude from the response. See Source filtering. | N/A |
| _source_includes | GET | List or String | A comma-separated list of source fields to include in the response. See Source filtering. | N/A |
| preference | GET, HEAD | String | A preference for which node or shard should handle the operation. By default, OpenSearch selects a shard replica randomly. See Preference. | random |
| realtime | GET, HEAD | Boolean | Whether the request is real time. If true, the request retrieves the most recent version of the document. If false, the request is near-real-time and retrieves the document based on the last refresh. See Real-time behavior. | true |
| refresh | GET, HEAD | Boolean or String | Whether to refresh the affected shards before the operation to make recent changes visible. Valid values are false (do not refresh the affected shards), true (refresh the affected shards immediately), and wait_for (wait for the changes to become visible before responding). See Refresh. | false |
| routing | GET, HEAD | List or String | The routing value used to target a specific primary shard. See Routing. | N/A |
| stored_fields | GET | List or String | A comma-separated list of stored fields to return. If no fields are specified, no stored fields are included in the response. If this parameter is specified, the _source parameter defaults to false. | N/A |
| version | GET, HEAD | Integer | The explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed. | N/A |
| version_type | GET, HEAD | String | The version type for concurrency control. Valid values are internal (the version number is managed internally by OpenSearch), external (the version number must be greater than the current version), and external_gte (the version number must be greater than or equal to the current version). | internal |
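As a rough illustration of how the endpoints and query parameters above combine into a request, the following Python sketch assembles the path and query string using only the standard library. The helper name build_get_path is hypothetical and is not part of any OpenSearch client. It assumes that Boolean values are sent as true or false and that lists are sent as comma-separated strings, as the table describes.

```python
from urllib.parse import quote, urlencode


def build_get_path(index, doc_id, endpoint="_doc", **params):
    """Return the path and query string for a Get Document request.

    Hypothetical helper for illustration only; it sends nothing over the network.
    """
    if endpoint not in ("_doc", "_source"):
        raise ValueError("endpoint must be '_doc' or '_source'")

    query = {}
    for name, value in params.items():
        if value is None:
            continue  # omit unset parameters so the server default applies
        if isinstance(value, bool):
            query[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            query[name] = ",".join(str(item) for item in value)
        else:
            query[name] = str(value)

    path = f"/{quote(index, safe='')}/{endpoint}/{quote(doc_id, safe='')}"
    # Keep commas unescaped so the output matches the requests in this document.
    return f"{path}?{urlencode(query, safe=',')}" if query else path


print(build_get_path("products", "1"))
print(build_get_path("products", "1", _source_includes=["name", "price"]))
print(build_get_path("products", "2", routing="user1", realtime=False))
```

The resulting strings can be passed to any HTTP client together with the GET or HEAD method.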
Example request
The following example retrieves a document by its ID:
GET /products/_doc/1

response = client.get(
    id="1",
    index="products"
)

Example response
The following example shows a response from a GET request:
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "Wireless Mouse",
    "description": "Ergonomic wireless mouse with optical sensor",
    "price": 29.99,
    "category": "Electronics",
    "in_stock": true,
    "manufacturer": "TechCorp",
    "model": "WM-2000",
    "tags": [
      "wireless",
      "ergonomic",
      "optical"
    ]
  }
}
Response body fields
The GET response contains the following fields.
| Field | Data type | Description |
|---|---|---|
| _index | String | The name of the index containing the document. |
| _id | String | The document’s unique identifier. |
| _version | Integer | The document’s version number. Incremented each time the document is updated. |
| _seq_no | Integer | The sequence number assigned to the document for the indexing operation. Used to ensure an older version doesn’t overwrite a newer version. |
| _primary_term | Integer | The primary term assigned to the document for the indexing operation. Used with _seq_no for optimistic concurrency control. |
| found | Boolean | Indicates whether the document exists. true if the document was found, false otherwise. |
| _routing | String | The routing value used to determine which shard stores the document. Only included if a routing value was specified when the document was indexed. |
| _source | Object | The original JSON document that was indexed. Excluded if the _source parameter is set to false or if the stored_fields parameter is used. |
| fields | Object | Contains stored field values when the stored_fields parameter is specified. Only returned if stored_fields is set and found is true. Field values are always returned as arrays. See Retrieving stored fields. |
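The following sketch, which assumes the response has already been parsed into a Python dictionary, shows one way to read these fields. The helper name source_or_none is hypothetical, and the not-found sample is an assumed minimal shape rather than a response captured from a cluster.

```python
def source_or_none(response):
    """Return the document source, or None when the document was not found."""
    if not response.get("found", False):
        return None
    # _source is absent when _source=false or stored_fields is used,
    # so fall back to an empty dictionary in that case.
    return response.get("_source", {})


found_response = {
    "_index": "products",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": True,
    "_source": {"name": "Wireless Mouse", "price": 29.99},
}
# Assumed minimal shape of a response for a missing document.
missing_response = {"_index": "products", "_id": "99", "found": False}

print(source_or_none(found_response))
print(source_or_none(missing_response))
```

Checking found before reading _source avoids treating a missing document as an empty one.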
Source filtering
By default, the Get Document API returns the entire contents of the _source field. You can control which parts of the source are returned or exclude it entirely.
Disabling source retrieval
To exclude the _source field from the response, set the _source parameter to false. The following example retrieves document metadata without the source content:
GET /products/_doc/1?_source=false

response = client.get(
    id="1",
    index="products",
    params={"_source": "false"}
)

The response excludes the _source field:
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true
}
Source includes and excludes
To retrieve only specific fields from a large document, use the _source_includes parameter to include specific fields or the _source_excludes parameter to exclude fields. This reduces network overhead by transferring only the required data.
The following example retrieves only the name and price fields:
GET /products/_doc/1?_source_includes=name,price

response = client.get(
    id="1",
    index="products",
    params={"_source_includes": "name,price"}
)

The _source field of the response contains only the price and name fields:
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "price": 29.99,
    "name": "Wireless Mouse"
  }
}
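To make the effect of the include and exclude parameters concrete, the following sketch applies the same idea to a dictionary on the client side. It is an illustration only: the function name filter_source is hypothetical, it handles top-level field names only, and it does not reproduce how OpenSearch implements filtering on the server. It assumes that excludes are applied after includes.

```python
def filter_source(source, includes=None, excludes=None):
    """Mimic include/exclude filtering for top-level field names only."""
    included = set(includes) if includes else set(source)
    excluded = set(excludes) if excludes else set()
    return {
        name: value
        for name, value in source.items()
        if name in included and name not in excluded
    }


product = {
    "name": "Wireless Mouse",
    "description": "Ergonomic wireless mouse with optical sensor",
    "price": 29.99,
    "category": "Electronics",
}

print(filter_source(product, includes=["name", "price"]))
print(filter_source(product, excludes=["description"]))
```

Filtering on the server, as in the request above, is what reduces network transfer; the local version only shows which fields remain.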
Shorter notation
If you only need to include certain fields without excluding any, use the shorter notation by specifying fields directly in the _source parameter. The following example retrieves only the name and price fields:
GET /products/_doc/1?_source=name,price

response = client.get(
    id="1",
    index="products",
    params={"_source": "name,price"}
)

Retrieving the source field only
Use the _source endpoint to retrieve only the document source without metadata. The following example retrieves only the source content:
GET /products/_source/1

response = client.get_source(
    id="1",
    index="products"
)

The response contains the _source field only:
{
  "name": "Wireless Mouse",
  "description": "Ergonomic wireless mouse with optical sensor",
  "price": 29.99,
  "category": "Electronics",
  "in_stock": true,
  "manufacturer": "TechCorp",
  "model": "WM-2000",
  "tags": [
    "wireless",
    "ergonomic",
    "optical"
  ]
}
You can combine the _source endpoint with source filtering parameters. The following example retrieves only specific fields from the source:
GET /products/_source/1?_source=name,price

response = client.get_source(
    id="1",
    index="products",
    params={"_source": "name,price"}
)

The response contains only the price and name fields:
{
  "price": 29.99,
  "name": "Wireless Mouse"
}
You can use HEAD with the _source endpoint to check whether the document source exists:
HEAD /products/_source/1

response = client.exists_source(
    id="1",
    index="products"
)

If the document source exists, the request returns HTTP status code 200 with no response body, and the Python client returns True.
Routing
When documents are indexed with a custom routing value, you must provide the same routing value when retrieving them. The routing value determines which shard stores the document.
The following example retrieves a document that was indexed with routing value user1:
GET /products/_doc/2?routing=user1

response = client.get(
    id="2",
    index="products",
    params={"routing": "user1"}
)

The response contains the document with the specified routing:
{
  "_index": "products",
  "_id": "2",
  "_version": 1,
  "_seq_no": 1,
  "_primary_term": 1,
  "_routing": "user1",
  "found": true,
  "_source": {
    "name": "Mechanical Keyboard",
    "description": "RGB mechanical gaming keyboard",
    "price": 149.99,
    "category": "Electronics",
    "in_stock": true,
    "manufacturer": "GameGear",
    "model": "MK-500",
    "tags": [
      "mechanical",
      "rgb",
      "gaming"
    ]
  }
}
If you don’t specify the correct routing value, OpenSearch cannot locate the document and returns a found: false response.
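The toy model below sketches why a missing routing value leads to a found: false response. It is a simplification under stated assumptions: the shard count is made up, a byte sum stands in for the real hash function, and index_doc and get_doc are hypothetical helpers, not client calls.

```python
NUM_SHARDS = 4  # hypothetical shard count for this toy index

shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(routing_value):
    # A byte sum stands in for the real hash function (an assumption);
    # the only property that matters here is that it is deterministic.
    return sum(routing_value.encode("utf-8")) % NUM_SHARDS


def index_doc(doc_id, source, routing=None):
    # Without custom routing, the document ID selects the shard.
    shards[shard_for(routing or doc_id)][doc_id] = source


def get_doc(doc_id, routing=None):
    source = shards[shard_for(routing or doc_id)].get(doc_id)
    if source is None:
        return {"_id": doc_id, "found": False}
    return {"_id": doc_id, "found": True, "_source": source}


index_doc("2", {"name": "Mechanical Keyboard"}, routing="user1")

print(get_doc("2", routing="user1"))  # looks in the shard chosen by "user1"
print(get_doc("2"))                   # looks in the shard chosen by the ID "2"
```

In this model, the second lookup checks a different shard than the one holding the document, so it reports that the document was not found even though it exists.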
Retrieving stored fields
Use the stored_fields parameter to retrieve specific fields that were stored in the index at indexing time. Only fields with store: true in the mapping are returned. Fields without this setting are ignored.
The following example retrieves only the category and manufacturer stored fields from a document:
GET /products/_doc/1?stored_fields=category,manufacturer

response = client.get(
    id="1",
    index="products",
    params={"stored_fields": "category,manufacturer"}
)

Note that field values retrieved from stored fields are always returned as arrays. Even though category and manufacturer are single-valued fields, they are returned in arrays:
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "fields": {
    "category": [
      "Electronics"
    ],
    "manufacturer": [
      "TechCorp"
    ]
  }
}
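Because stored field values always arrive as arrays, client code often unwraps single-valued fields. The following is a minimal sketch of that step, assuming the response has been parsed into a dictionary shaped like the one above. The helper name unwrap_stored_fields is hypothetical.

```python
def unwrap_stored_fields(response):
    """Replace one-element arrays in the "fields" object with their value."""
    fields = response.get("fields", {})
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in fields.items()
    }


stored_response = {
    "_index": "products",
    "_id": "1",
    "found": True,
    "fields": {
        "category": ["Electronics"],
        "manufacturer": ["TechCorp"],
    },
}

print(unwrap_stored_fields(stored_response))
```

Fields that hold more than one value are left as lists, so callers that expect multi-valued fields should check the type before use.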
When retrieving stored fields from a document indexed with routing, you must provide the routing value. The following example retrieves stored fields from a document with routing:
GET /products/_doc/2?routing=user1&stored_fields=category,manufacturer

response = client.get(
    id="2",
    index="products",
    params={"routing": "user1", "stored_fields": "category,manufacturer"}
)

Checking document existence
You can use the HEAD method to verify whether a document exists without retrieving its content. OpenSearch returns HTTP status code 200 if the document exists or 404 if it doesn’t.
The following example checks whether a document exists:
HEAD /products/_doc/1

response = client.exists(
    id="1",
    index="products"
)

If the document exists, the request returns HTTP status code 200 with no response body, and the Python client returns True.
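When issuing the HEAD request with a generic HTTP client rather than the OpenSearch client, the status code has to be interpreted by the caller. The sketch below shows one way to do that. The function name document_exists is hypothetical, and treating every other status code as an error is a design assumption, not documented behavior.

```python
def document_exists(status_code):
    """Interpret the status code of a HEAD request for a document."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    # Any other status (for example, an authorization failure) says nothing
    # about whether the document exists, so surface it instead of guessing.
    raise RuntimeError(f"unexpected status code: {status_code}")


print(document_exists(200))  # True
print(document_exists(404))  # False
```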
Preference
The preference parameter controls which shard replica handles the request. By default, OpenSearch randomly distributes get operations across available shard replicas.
You can set the preference parameter to one of the following values:
- _local: Directs the operation to a locally allocated shard replica, reducing network overhead.
- Custom string value: Routes requests with the same custom value to the same shard replicas. This ensures consistent results when shards are in different refresh states. Common custom values include session IDs or usernames.
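The selection rules above can be pictured with a small toy model. This is a sketch under assumptions, not the algorithm OpenSearch uses: the node names are made up, choose_copy is a hypothetical function, and CRC32 is only a convenient deterministic hash.

```python
import random
import zlib


def choose_copy(copies, preference=None, local_node=None):
    """Pick the node whose shard copy serves a get request in this toy model."""
    if preference is None:
        return random.choice(copies)  # default: spread requests randomly
    if preference == "_local":
        # Prefer a copy on the node handling the request, if it holds one.
        return local_node if local_node in copies else random.choice(copies)
    # Custom string: a deterministic hash keeps the same value on the same copy.
    return copies[zlib.crc32(preference.encode("utf-8")) % len(copies)]


copies = ["node-a", "node-b", "node-c"]  # hypothetical nodes holding one shard

print(choose_copy(copies, preference="_local", local_node="node-b"))
first = choose_copy(copies, preference="session-42")
second = choose_copy(copies, preference="session-42")
print(first == second)  # the same custom value maps to the same copy
```

Reusing one custom value per user session keeps that session on a single copy, which is why results stay consistent even when copies are in different refresh states.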
Real-time behavior
By default, the Get Document API operates in real time, retrieving the latest version of a document regardless of the index refresh rate. This means you can retrieve a document immediately after indexing it, even before the index has been refreshed to make it searchable.
When you request stored fields (using the stored_fields parameter) and the document has been updated but not yet refreshed, OpenSearch parses and analyzes the document source to extract the requested stored fields.
To disable real-time behavior and retrieve the document based on the last refreshed state of the index, set the realtime parameter to false.
Refresh
The refresh parameter can be set to true to refresh the relevant shard before retrieving the document. Refreshing makes recent changes searchable but can impose significant system load and slow indexing. Carefully evaluate the trade-off between data freshness and performance before enabling this parameter.
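The interaction between the realtime and refresh parameters can be sketched with a toy shard that keeps two views of its data. This is a conceptual model only: ToyShard and its methods are hypothetical and do not reflect how OpenSearch stores documents internally.

```python
class ToyShard:
    """Toy model: one dict for all indexed changes, one for the last refresh."""

    def __init__(self):
        self._latest = {}     # every indexed change, refreshed or not
        self._refreshed = {}  # snapshot taken at the last refresh

    def index(self, doc_id, source):
        self._latest[doc_id] = source

    def refresh(self):
        self._refreshed = dict(self._latest)

    def get(self, doc_id, realtime=True, refresh=False):
        if refresh:
            self.refresh()  # make recent changes visible first
        view = self._latest if realtime else self._refreshed
        if doc_id not in view:
            return {"_id": doc_id, "found": False}
        return {"_id": doc_id, "found": True, "_source": view[doc_id]}


shard = ToyShard()
shard.index("1", {"name": "Wireless Mouse"})

print(shard.get("1")["found"])                                # True
print(shard.get("1", realtime=False)["found"])                # False
print(shard.get("1", realtime=False, refresh=True)["found"])  # True
```

The model omits the cost of a refresh, which in a real cluster is the reason to avoid setting refresh to true on every request.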
Versioning
You can use the version parameter to retrieve a document only if its current version matches the specified number. This ensures data consistency when working with versioned documents.
Internally, when a document is updated, OpenSearch marks the old version as deleted and creates an entirely new version. You cannot access old versions through the Get Document API. OpenSearch cleans up deleted versions automatically in the background as you continue indexing.
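The version check and the mark-then-clean-up behavior described above can be modeled in a few lines. The class below is a toy illustration under assumptions: ToyVersionedStore, VersionConflict, and the explicit cleanup method are hypothetical names, and the real background cleanup is not triggered by a method call.

```python
class VersionConflict(Exception):
    """Raised when the requested version is not the current one."""


class ToyVersionedStore:
    def __init__(self):
        self._entries = []  # every version written, including deleted ones

    def _current(self, doc_id):
        for entry in self._entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                return entry
        return None

    def index(self, doc_id, source):
        old = self._current(doc_id)
        version = 1
        if old is not None:
            old["deleted"] = True  # mark the old version; do not remove it yet
            version = old["version"] + 1
        self._entries.append(
            {"id": doc_id, "version": version, "source": source, "deleted": False}
        )
        return version

    def get(self, doc_id, version=None):
        entry = self._current(doc_id)
        if entry is None:
            return {"_id": doc_id, "found": False}
        if version is not None and version != entry["version"]:
            raise VersionConflict(f"current version is {entry['version']}")
        return {"_id": doc_id, "_version": entry["version"], "found": True,
                "_source": entry["source"]}

    def cleanup(self):
        """Stand-in for background cleanup: drop versions marked as deleted."""
        self._entries = [e for e in self._entries if not e["deleted"]]


store = ToyVersionedStore()
store.index("1", {"price": 29.99})
store.index("1", {"price": 24.99})

print(store.get("1", version=2)["_source"])
try:
    store.get("1", version=1)
except VersionConflict as error:
    print("conflict:", error)
```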
Distributed model
The Get Document API hashes the routing value, which defaults to the document ID, to identify the shard that stores the document. OpenSearch then routes the request to one copy in that shard group (the primary shard or one of its replicas) and returns the result.
Having more shard replicas improves GET operation scalability because the load is distributed across multiple replicas, increasing throughput for retrieval requests.