
Get Document API

Introduced 1.0

The Get Document API retrieves a JSON document and its metadata from an index by document ID. You can also use HEAD requests to verify that a document or its source exists without retrieving the full content.

Endpoints

To retrieve a document and its metadata from an index, use the GET method:

GET /{index}/_doc/{id}

To retrieve only the document source, use the following endpoint:

GET /{index}/_source/{id}

To verify that a document exists, use the HEAD method:

HEAD /{index}/_doc/{id}
HEAD /{index}/_source/{id}

Path parameters

The following table lists the available path parameters.

Parameter Required Data type Description
id Required String The unique identifier of the document.
index Required String The name of the index containing the document.

Query parameters

The following table lists the available query parameters. All query parameters are optional.

Parameter Methods Data type Description Default
_source GET Boolean or List or String Whether to return the _source field. Set to true to include it, false to exclude it, or specify a comma-separated list of field names to return. See Source filtering. N/A
_source_excludes GET List or String A comma-separated list of source fields to exclude from the response. See Source filtering. N/A
_source_includes GET List or String A comma-separated list of source fields to include in the response. See Source filtering. N/A
preference GET, HEAD String A preference for which node or shard should handle the operation. By default, OpenSearch selects a shard replica randomly. See Preference. random
realtime GET, HEAD Boolean Whether the request is real time. If true, the request retrieves the most recent version of the document. If false, the request is near-real-time and retrieves the document based on the last refresh. See Real-time behavior. true
refresh GET, HEAD Boolean or String Whether to refresh the affected shards before the operation to make recent changes visible. Valid values are false (do not refresh the affected shards), true (refresh the affected shards immediately), and wait_for (wait for the changes to become visible before responding). See Refresh. false
routing GET, HEAD List or String The routing value used to target a specific primary shard. See Routing. N/A
stored_fields GET List or String A comma-separated list of stored fields to return. If no fields are specified, no stored fields are included in the response. If this parameter is specified, the _source parameter defaults to false. N/A
version GET, HEAD Integer The explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed. N/A
version_type GET, HEAD String The version type for concurrency control. Valid values are internal (the version number is managed internally by OpenSearch), external (the version number must be greater than the current version), and external_gte (the version number must be greater than or equal to the current version). internal
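Clients translate these parameters into a query string for you, but it can help to see the request shape. The following is a minimal sketch that builds the path and query string for either endpoint. The helper names build_get_url and _to_param are hypothetical; the sketch assumes booleans are sent as true/false and lists as comma-separated strings, as the parameter descriptions above indicate.

```python
from urllib.parse import quote, urlencode


def _to_param(value):
    """Convert a Python value into the string form used in the query string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return ",".join(str(item) for item in value)
    return str(value)


def build_get_url(index, doc_id, source_only=False, **params):
    """Build the path and query string for a Get Document request (hypothetical helper).

    source_only=True targets /{index}/_source/{id}; otherwise /{index}/_doc/{id}.
    Parameters whose value is None are omitted.
    """
    endpoint = "_source" if source_only else "_doc"
    # Percent-encode the index name and ID so that characters such as "/" stay inside one path segment.
    path = f"/{quote(index, safe='')}/{endpoint}/{quote(str(doc_id), safe='')}"
    query = {name: _to_param(value) for name, value in params.items() if value is not None}
    # Leave commas unescaped for readability.
    return f"{path}?{urlencode(query, safe=',')}" if query else path
```

For example, build_get_url("products", "2", routing="user1", stored_fields=["category", "manufacturer"]) produces /products/_doc/2?routing=user1&stored_fields=category,manufacturer.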

Example request

The following example retrieves a document by its ID:

GET /products/_doc/1
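from opensearchpy import OpenSearch
# Assumes a local cluster on port 9200; adjust hosts, TLS, and authentication for your deployment.
client = OpenSearch(hosts = [{"host": "localhost", "port": 9200}])
response = client.get(
  id = "1",
  index = "products"
)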

Example response

The following example shows a response from a GET request:

{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "Wireless Mouse",
    "description": "Ergonomic wireless mouse with optical sensor",
    "price": 29.99,
    "category": "Electronics",
    "in_stock": true,
    "manufacturer": "TechCorp",
    "model": "WM-2000",
    "tags": [
      "wireless",
      "ergonomic",
      "optical"
    ]
  }
}

Response body fields

The GET response contains the following fields.

Field Data type Description
_index String The name of the index containing the document.
_id String The document’s unique identifier.
_version Integer The document’s version number. Incremented each time the document is updated.
_seq_no Integer The sequence number assigned to the document for the indexing operation. Used to ensure an older version doesn’t overwrite a newer version.
_primary_term Integer The primary term assigned to the document for the indexing operation. Used with _seq_no for optimistic concurrency control.
found Boolean Indicates whether the document exists. true if the document was found, false otherwise.
_routing String The routing value used to determine which shard stores the document. Only included if a routing value was specified when the document was indexed.
_source Object The original JSON document that was indexed. Excluded if the _source parameter is set to false or if the stored_fields parameter is used.
fields Object Contains stored field values when the stored_fields parameter is specified. Only returned if stored_fields is set and found is true. Field values are always returned as arrays. See Retrieving stored fields.
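Because several of these fields are optional, code that consumes the response should not assume they are present. The following sketch maps a response body onto a typed record and treats every field other than _index, _id, and found as possibly missing (for example, in a response where found is false). The GetResult class and parse_get_response function are hypothetical, not part of any OpenSearch client.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GetResult:
    """Typed view of a Get Document response body (hypothetical helper type)."""
    index: str
    doc_id: str
    found: bool
    version: Optional[int] = None
    seq_no: Optional[int] = None
    primary_term: Optional[int] = None
    routing: Optional[str] = None  # present only if the document was indexed with routing
    source: Optional[Dict[str, Any]] = None  # absent when _source=false or stored_fields is used
    fields: Dict[str, List[Any]] = field(default_factory=dict)  # stored fields, always arrays


def parse_get_response(body: Dict[str, Any]) -> GetResult:
    """Map a response body onto GetResult, treating optional keys as possibly missing."""
    return GetResult(
        index=body["_index"],
        doc_id=body["_id"],
        found=body["found"],
        version=body.get("_version"),
        seq_no=body.get("_seq_no"),
        primary_term=body.get("_primary_term"),
        routing=body.get("_routing"),
        source=body.get("_source"),
        fields=body.get("fields", {}),
    )
```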

Source filtering

By default, the Get Document API returns the entire contents of the _source field. You can control which parts of the source are returned or exclude it entirely.

Disabling source retrieval

To exclude the _source field from the response, set the _source parameter to false. The following example retrieves document metadata without the source content:

GET /products/_doc/1?_source=false
response = client.get(
  id = "1",
  index = "products",
  params = { "_source": "false" }
)

The response excludes the _source field:

{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true
}

Source includes and excludes

To retrieve only specific fields from a large document, use the _source_includes parameter to include specific fields or the _source_excludes parameter to exclude fields. This reduces network overhead by transferring only the required data.

The following example retrieves only the name and price fields:

GET /products/_doc/1?_source_includes=name,price
response = client.get(
  id = "1",
  index = "products",
  params = { "_source_includes": "name,price" }
)

The _source field of the response contains only the price and name fields:

{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "price": 29.99,
    "name": "Wireless Mouse"
  }
}
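To reason about how includes and excludes interact, the following local sketch applies both lists to a flat source document. It is a simplified model, not OpenSearch's implementation: it matches top-level field names only, uses Python's shell-style fnmatch patterns (where * is a wildcard), and drops a field that matches both lists. The function name filter_source is hypothetical.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Return a filtered copy of a flat _source dictionary (simplified model).

    Patterns match top-level field names only. In this sketch, a field that
    matches both includes and excludes is dropped.
    """
    def matches(name, patterns):
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    filtered = {}
    for name, value in source.items():
        if includes and not matches(name, includes):
            continue  # not requested
        if excludes and matches(name, excludes):
            continue  # explicitly excluded
        filtered[name] = value
    return filtered
```

Applied to the sample product, filter_source(product, includes=["name", "price"]) keeps only name and price, mirroring the _source_includes=name,price request above.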

Shorter notation

If you only need to include certain fields without excluding any, use the shorter notation by specifying fields directly in the _source parameter. The following example retrieves only the name and price fields:

GET /products/_doc/1?_source=name,price
response = client.get(
  id = "1",
  index = "products",
  params = { "_source": "name,price" }
)
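The _source parameter therefore carries two kinds of values: a Boolean that switches the whole source on or off, or a field list. The following sketch shows one way to interpret it; the function name parse_source_param and its (fetch_source, includes) return shape are hypothetical.

```python
def parse_source_param(value):
    """Interpret a _source query parameter value as (fetch_source, includes).

    "true"/"false" (or Python booleans) switch the whole source on or off;
    anything else is read as a comma-separated list of fields to include.
    """
    if isinstance(value, bool):
        return value, []
    if isinstance(value, (list, tuple)):
        return True, [str(name) for name in value]
    text = str(value).strip()
    if text in ("true", "false"):
        return text == "true", []
    return True, [name.strip() for name in text.split(",") if name.strip()]
```

Under this reading, _source=name,price yields the same include list as _source_includes=name,price.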

Retrieving the source field only

Use the _source endpoint to retrieve only the document source without metadata. The following example retrieves only the source content:

GET /products/_source/1
response = client.get_source(
  id = "1",
  index = "products"
)

The response contains the _source field only:

{
  "name": "Wireless Mouse",
  "description": "Ergonomic wireless mouse with optical sensor",
  "price": 29.99,
  "category": "Electronics",
  "in_stock": true,
  "manufacturer": "TechCorp",
  "model": "WM-2000",
  "tags": [
    "wireless",
    "ergonomic",
    "optical"
  ]
}

You can combine the _source endpoint with source filtering parameters. The following example retrieves only specific fields from the source:

GET /products/_source/1?_source=name,price
response = client.get_source(
  id = "1",
  index = "products",
  params = { "_source": "name,price" }
)

The response contains only the price and name fields:

{
  "price": 29.99,
  "name": "Wireless Mouse"
}

You can use HEAD with the _source endpoint to check whether the document source exists:

HEAD /products/_source/1
response = client.exists_source(
  id = "1",
  index = "products"
)

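The response has no body. A 200 status code indicates that the document source exists, and the Python client's exists_source method returns True.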

Routing

When documents are indexed with a custom routing value, you must provide the same routing value when retrieving them. The routing value determines which shard stores the document.

The following example retrieves a document that was indexed with routing value user1:

GET /products/_doc/2?routing=user1
response = client.get(
  id = "2",
  index = "products",
  params = { "routing": "user1" }
)

The response contains the document with the specified routing:

{
  "_index": "products",
  "_id": "2",
  "_version": 1,
  "_seq_no": 1,
  "_primary_term": 1,
  "_routing": "user1",
  "found": true,
  "_source": {
    "name": "Mechanical Keyboard",
    "description": "RGB mechanical gaming keyboard",
    "price": 149.99,
    "category": "Electronics",
    "in_stock": true,
    "manufacturer": "GameGear",
    "model": "MK-500",
    "tags": [
      "mechanical",
      "rgb",
      "gaming"
    ]
  }
}

If you omit the routing value or specify a different one, OpenSearch may send the request to a shard that does not store the document and return a response in which found is false.
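The following toy model illustrates why the routing value matters: the shard is computed from the routing value (which defaults to the document ID), and only that shard is consulted. It is a sketch under stated assumptions: the ToyIndex class is hypothetical, and CRC32 stands in for OpenSearch's Murmur3-based routing hash, so real shard numbers differ.

```python
import zlib


class ToyIndex:
    """Simplified index that places each document on one primary shard by routing value.

    CRC32 stands in for OpenSearch's Murmur3-based routing hash.
    """

    def __init__(self, num_primary_shards=3):
        self.shards = [{} for _ in range(num_primary_shards)]

    def shard_for(self, doc_id, routing=None):
        key = routing if routing is not None else doc_id  # routing defaults to the document ID
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, source, routing=None):
        self.shards[self.shard_for(doc_id, routing)][doc_id] = source

    def get(self, doc_id, routing=None):
        # Only the shard computed from the routing value is consulted.
        shard = self.shards[self.shard_for(doc_id, routing)]
        if doc_id in shard:
            return {"_id": doc_id, "found": True, "_source": shard[doc_id]}
        return {"_id": doc_id, "found": False}
```

A document indexed with routing="user1" is found with the same routing value; a get without routing succeeds only if the document ID happens to hash to the same shard.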

Retrieving stored fields

Use the stored_fields parameter to retrieve specific fields that were stored in the index at indexing time. Only fields with store: true in the mapping are returned. Fields without this setting are ignored.

The following example retrieves only the category and manufacturer stored fields from a document:

GET /products/_doc/1?stored_fields=category,manufacturer
response = client.get(
  id = "1",
  index = "products",
  params = { "stored_fields": "category,manufacturer" }
)

Note that field values retrieved from stored fields are always returned as arrays. Even though category and manufacturer are single-valued fields, they are returned in arrays:

{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "fields": {
    "category": [
      "Electronics"
    ],
    "manufacturer": [
      "TechCorp"
    ]
  }
}
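If your application expects scalar values, you can unwrap single-element arrays after retrieval. The following sketch does that; the function name unwrap_stored_fields is hypothetical. Note the trade-off: it loses the distinction between a single-valued field and a multi-valued field that happens to hold one value.

```python
def unwrap_stored_fields(fields):
    """Turn one-element arrays from the `fields` object into plain values.

    Arrays with several values (or none) are kept as lists.
    """
    return {name: values[0] if len(values) == 1 else values for name, values in fields.items()}
```

For the response above, this yields {"category": "Electronics", "manufacturer": "TechCorp"}.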

When retrieving stored fields from a document indexed with routing, you must provide the routing value. The following example retrieves stored fields from a document with routing:

GET /products/_doc/2?routing=user1&stored_fields=category,manufacturer
response = client.get(
  id = "2",
  index = "products",
  params = { "routing": "user1", "stored_fields": "category,manufacturer" }
)

Checking document existence

You can use the HEAD method to verify whether a document exists without retrieving its content. OpenSearch returns HTTP status code 200 if the document exists or 404 if it doesn’t.

The following example checks whether a document exists:

HEAD /products/_doc/1
response = client.exists(
  id = "1",
  index = "products"
)

The response has no body. A 200 status code indicates that the document exists, and the Python client's exists method returns True.
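If you are not using a client library, the existence check is a plain HEAD request whose status code carries the answer. The following sketch uses only the Python standard library and assumes plain HTTP without authentication (real clusters often require HTTPS and credentials). So that it can run without a cluster, a small stub server stands in for OpenSearch; document_exists, _StubHandler, and start_stub_server are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import quote


def document_exists(host, port, index, doc_id, source_only=False):
    """Send a HEAD request and map 200 -> True and 404 -> False."""
    endpoint = "_source" if source_only else "_doc"
    path = f"/{quote(index, safe='')}/{endpoint}/{quote(str(doc_id), safe='')}"
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
    finally:
        conn.close()
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status} for HEAD {path}")


class _StubHandler(BaseHTTPRequestHandler):
    """Stand-in for a cluster in this demo: only document 1 in products exists."""
    existing = {"/products/_doc/1", "/products/_source/1"}

    def do_HEAD(self):
        self.send_response(200 if self.path in self.existing else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def start_stub_server():
    server = HTTPServer(("127.0.0.1", 0), _StubHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Against a real cluster, you would pass its host and port instead of starting the stub server.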

Preference

The preference parameter controls which shard replica handles the request. By default, OpenSearch randomly distributes get operations across available shard replicas.

You can set the preference parameter to one of the following values:

  • _local: Directs the operation to a locally allocated shard replica, reducing network overhead.
  • Custom string value: Routes requests with the same custom value to the same shard replicas. This ensures consistent results when shards are in different refresh states. Common custom values include session IDs or usernames.
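The following simplified model illustrates the three behaviors: random selection by default, preferring a copy on the local node for _local, and a stable choice for a custom string. It is not OpenSearch's actual selection algorithm; the choose_copy function, the (node, label) tuples, and the fallback to a random copy when no local copy exists are assumptions of this sketch.

```python
import random
import zlib


def choose_copy(copies, preference=None, local_node=None, rng=random):
    """Pick which copy of a shard group serves a GET (simplified model).

    copies: list of (node_name, copy_label) tuples for one shard group.
    """
    if preference is None:
        return rng.choice(copies)  # default: spread requests randomly across copies
    if preference == "_local":
        local = [copy for copy in copies if copy[0] == local_node]
        # Prefer a copy on the local node; fall back to a random copy if there is none.
        return local[0] if local else rng.choice(copies)
    # Custom string: the same value always maps to the same copy while the copy list is unchanged.
    return copies[zlib.crc32(preference.encode("utf-8")) % len(copies)]
```

Passing the same session ID on every request from one user keeps that user on one copy, which is why custom values avoid seeing shards in different refresh states.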

Real-time behavior

By default, the Get Document API operates in real time, retrieving the latest version of a document regardless of the index refresh rate. This means you can retrieve a document immediately after indexing it, even before the index has been refreshed to make it searchable.

When you request stored fields (using the stored_fields parameter) and the document has been updated but not yet refreshed, OpenSearch parses and analyzes the document source to extract the requested stored fields.

To disable real-time behavior and retrieve the document based on the last refreshed state of the index, set the realtime parameter to false.

Refresh

The refresh parameter can be set to true to refresh the relevant shard before retrieving the document. Refreshing makes recent changes searchable but can impose significant system load and slow indexing. Carefully evaluate the trade-off between data freshness and performance before enabling this parameter.
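The real-time and refresh behaviors can be modeled with a shard that keeps two views: every write it has received, and a snapshot taken at the last refresh. The following toy sketch is a simplification (it ignores segments, the translog, and concurrency), and the ToyShard class is hypothetical.

```python
class ToyShard:
    """Simplified model of one shard: recent writes vs. the last refreshed view."""

    def __init__(self):
        self._latest = {}     # every indexed document, including unrefreshed changes
        self._refreshed = {}  # snapshot visible as of the last refresh

    def index(self, doc_id, source):
        self._latest[doc_id] = source

    def refresh(self):
        self._refreshed = dict(self._latest)

    def get(self, doc_id, realtime=True, refresh=False):
        if refresh:
            self.refresh()  # make recent changes visible first (costly on a real cluster)
        view = self._latest if realtime else self._refreshed
        if doc_id in view:
            return {"_id": doc_id, "found": True, "_source": view[doc_id]}
        return {"_id": doc_id, "found": False}
```

Immediately after index("1", ...), get("1") finds the document, get("1", realtime=False) does not, and get("1", realtime=False, refresh=True) does.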

Versioning

You can use the version parameter to retrieve a document only if its current version matches the specified number. This ensures data consistency when working with versioned documents.

Internally, when a document is updated, OpenSearch marks the old version as deleted and indexes an entirely new version of the document. You cannot access old versions through the Get Document API; OpenSearch automatically cleans up deleted versions in the background as you continue indexing.
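The following toy store sketches the version check: each write increments the version, only the latest version is kept, and a get that specifies a different version fails. The ToyVersionedStore and VersionConflict names are hypothetical; on a real cluster, a mismatch is reported as a version conflict error.

```python
class VersionConflict(Exception):
    """Raised when the requested version differs from the stored one (toy model)."""


class ToyVersionedStore:
    """Keeps only the latest version of each document, as the Get Document API exposes it."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        version = self._docs.get(doc_id, (0, None))[0] + 1  # each write bumps the version
        self._docs[doc_id] = (version, source)  # the previous version is no longer reachable
        return version

    def get(self, doc_id, version=None):
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        current, source = self._docs[doc_id]
        if version is not None and version != current:
            raise VersionConflict(f"current version [{current}] differs from requested [{version}]")
        return {"_id": doc_id, "_version": current, "found": True, "_source": source}
```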

Distributed model

The Get Document API hashes the document's routing value (by default, the document ID) to identify the shard that stores the document. OpenSearch then routes the request to one copy in that shard group (the primary shard or one of its replicas) and returns the result.

Having more shard replicas improves GET operation scalability because the load is distributed across multiple replicas, increasing throughput for retrieval requests.