Multi term vectors

The _mtermvectors API retrieves term vector information for multiple documents in one request. Term vectors provide detailed information about the terms (words) in a document, including term frequency, positions, offsets, and payloads. This can be useful for applications such as relevance scoring, highlighting, or similarity calculations. For more information, see Term vector parameter.

Endpoints

GET  /_mtermvectors
POST /_mtermvectors
GET  /{index}/_mtermvectors
POST /{index}/_mtermvectors

Path parameters

The following table lists the available path parameters. All path parameters are optional.

| Parameter | Data type | Description |
| --- | --- | --- |
| index | String | The name of the index that contains the document. |

Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| --- | --- | --- |
| field_statistics | Boolean | If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
| fields | List or String | A comma-separated list or a wildcard expression specifying the fields for which to return term vectors. Used as the default for all requested documents unless a document in the docs array specifies its own fields. |
| ids | List | A comma-separated list of document IDs. You must either provide the docs field in the request body or specify ids as a query parameter or in the request body. |
| offsets | Boolean | If true, the response includes term offsets. (Default: true) |
| payloads | Boolean | If true, the response includes term payloads. (Default: true) |
| positions | Boolean | If true, the response includes term positions. (Default: true) |
| preference | String | Specifies the node or shard on which the operation should be performed. See preference query parameter for a list of available options. By default, requests are routed randomly to available shard copies (primary or replica), with no guarantee of consistency across repeated queries. |
| realtime | Boolean | If true, the request is real time as opposed to near real time. (Default: true) |
| routing | List or String | A custom value used to route operations to a specific shard. |
| term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
| version | Integer | The specific version of the document to retrieve. |
| version_type | String | The specific version type. Valid values are: external (the version number must be greater than the current version), external_gte (the version number must be greater than or equal to the current version), force (the version number is forced to be the given value), and internal (the version number is managed internally by OpenSearch). |
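
The query parameters above combine into an ordinary query string. The following minimal Python sketch shows one way to assemble the path and query string. The helper name mtermvectors_path is hypothetical and not part of any OpenSearch client; it only formats values the way the examples on this page do (lowercase Booleans, comma-separated lists).

```python
from urllib.parse import urlencode


def mtermvectors_path(index=None, **params):
    """Return the request path plus query string for a _mtermvectors call.

    Hypothetical helper for illustration; parameter names are passed through
    unchanged, so they should match the query parameter table.
    """
    path = f"/{index}/_mtermvectors" if index else "/_mtermvectors"
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # leave unset parameters at their server-side defaults
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase Boolean literals
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)  # comma-separated list
        query[name] = value
    if not query:
        return path
    # Keep commas readable instead of percent-encoding them.
    return f"{path}?{urlencode(query, safe=',')}"


print(mtermvectors_path("my-index", ids=["1", "2"], fields="text"))
# /my-index/_mtermvectors?ids=1,2&fields=text
```

Omitting the index produces the index-free endpoint, in which case each document in the request body needs to name its own index.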

Request body fields

The following table lists the fields that can be specified in the request body.

| Field | Data type | Description |
| --- | --- | --- |
| docs | Array | An array of document specifications. |
| ids | Array of strings | A list of document IDs to retrieve. Use only when all documents share the same index specified in the request path or query. |
| fields | Array of strings | A list of field names for which to return term vectors. |
| offsets | Boolean | If true, the response includes character offsets for each term. (Default: true) |
| payloads | Boolean | If true, the response includes payloads for each term. (Default: true) |
| positions | Boolean | If true, the response includes token positions. (Default: true) |
| field_statistics | Boolean | If true, the response includes statistics such as document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
| term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
| routing | String | A custom routing value used to identify the shard. Required if custom routing was used during indexing. |
| version | Integer | The specific version of the document to retrieve. |
| version_type | String | The type of versioning to use. Valid values: internal, external, external_gte. |
| filter | Object | Filters tokens returned in the response (for example, by frequency or position). For supported fields, see Filtering terms. |
| per_field_analyzer | Object | Specifies a custom analyzer to use per field. Format: { "field_name": "analyzer_name" }. |
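
The choice between ids and docs depends on whether all documents live in the same index. The following Python sketch illustrates that decision. The function build_request and the index name other-index are hypothetical and used only for illustration.

```python
def build_request(doc_refs, fields=None):
    """Return (path, body) for a list of (index, doc_id) pairs.

    Hypothetical helper: uses the compact ids form when every document is in
    the same index, and the docs form otherwise.
    """
    if not doc_refs:
        raise ValueError("at least one document is required")
    indexes = {index for index, _ in doc_refs}
    if len(indexes) == 1:
        # All documents share one index: name it in the path and send ids.
        (index,) = indexes
        body = {"ids": [doc_id for _, doc_id in doc_refs]}
        if fields:
            body["fields"] = list(fields)
        return f"/{index}/_mtermvectors", body
    # Documents span several indexes: each docs entry names its own index.
    docs = []
    for index, doc_id in doc_refs:
        doc = {"_index": index, "_id": doc_id}
        if fields:
            doc["fields"] = list(fields)
        docs.append(doc)
    return "/_mtermvectors", {"docs": docs}


path, body = build_request([("my-index", "1"), ("my-index", "2")], fields=["text"])
print(path, body)
```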

Filtering terms

The filter object in the request body allows you to filter the tokens to include in the term vector response. The filter object supports the following fields.

| Field | Data type | Description |
| --- | --- | --- |
| max_num_terms | Integer | The maximum number of terms to return. |
| min_term_freq | Integer | The minimum term frequency in the document required for a term to be included. |
| max_term_freq | Integer | The maximum term frequency in the document allowed for a term to be included. |
| min_doc_freq | Integer | The minimum document frequency across the index required for a term to be included. |
| max_doc_freq | Integer | The maximum document frequency across the index allowed for a term to be included. |
| min_word_length | Integer | The minimum length a term must have to be included. |
| max_word_length | Integer | The maximum length a term can have to be included. |
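
Because every filter field is an integer and the min/max fields come in pairs, a filter object can be checked on the client before it is sent. The following Python sketch is one possible way to do that. The helper make_filter is hypothetical, and the rule that a minimum must not exceed its matching maximum is a sanity-check assumption, not documented server behavior.

```python
# Field names taken from the filter table above.
FILTER_FIELDS = (
    "max_num_terms",
    "min_term_freq", "max_term_freq",
    "min_doc_freq", "max_doc_freq",
    "min_word_length", "max_word_length",
)


def make_filter(**limits):
    """Return a filter object after basic client-side checks (hypothetical helper)."""
    unknown = set(limits) - set(FILTER_FIELDS)
    if unknown:
        raise ValueError(f"unsupported filter fields: {sorted(unknown)}")
    for name, value in limits.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    for stem in ("term_freq", "doc_freq", "word_length"):
        low, high = limits.get(f"min_{stem}"), limits.get(f"max_{stem}")
        if low is not None and high is not None and low > high:
            raise ValueError(f"min_{stem} must not exceed max_{stem}")
    return dict(limits)


print(make_filter(max_num_terms=3, min_term_freq=1, min_word_length=4))
```

The returned dictionary is meant to be placed in the request body under the filter field.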

Example

Create an index with term vectors enabled:

PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}

Index the first document:

POST /my-index/_doc/1
{
  "text": "OpenSearch is a search engine."
}

Index the second document:

POST /my-index/_doc/2
{
  "text": "OpenSearch provides powerful features."
}

Example request

Get term vectors for multiple documents:

POST /_mtermvectors
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "fields": ["text"]
    },
    {
      "_index": "my-index",
      "_id": "2",
      "fields": ["text"]
    }
  ]
}

Alternatively, you can specify both ids and fields as query parameters:

GET /my-index/_mtermvectors?ids=1,2&fields=text

You can also provide document IDs in the ids array instead of specifying docs:

GET /my-index/_mtermvectors?fields=text
{
  "ids": [
    "1", "2"
  ]
}
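
The same requests can be issued from a script. The following Python sketch uses only the standard library and assumes a cluster reachable at http://localhost:9200 without authentication or TLS; adjust BASE_URL and add credentials for a real deployment. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security


def build_mtermvectors_request(body, index=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for the _mtermvectors endpoint."""
    path = f"/{index}/_mtermvectors" if index else "/_mtermvectors"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (requires a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_mtermvectors_request({"ids": ["1", "2"], "fields": ["text"]}, index="my-index")
print(request.get_method(), request.full_url)
# POST http://localhost:9200/my-index/_mtermvectors
```

Calling send(request) performs the network call and returns the parsed response as a dictionary.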

Example response

The response contains term vector information for the two documents:

{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "_version": 1,
      "found": true,
      "took": 10,
      "term_vectors": {
        "text": {
          "field_statistics": {
            "sum_doc_freq": 9,
            "doc_count": 2,
            "sum_ttf": 9
          },
          "terms": {
            "a": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 2,
                  "start_offset": 14,
                  "end_offset": 15
                }
              ]
            },
            "engine": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 4,
                  "start_offset": 23,
                  "end_offset": 29
                }
              ]
            },
            "is": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 1,
                  "start_offset": 11,
                  "end_offset": 13
                }
              ]
            },
            "opensearch": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 10
                }
              ]
            },
            "search": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 3,
                  "start_offset": 16,
                  "end_offset": 22
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index": "my-index",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 0,
      "term_vectors": {
        "text": {
          "field_statistics": {
            "sum_doc_freq": 9,
            "doc_count": 2,
            "sum_ttf": 9
          },
          "terms": {
            "features": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 3,
                  "start_offset": 29,
                  "end_offset": 37
                }
              ]
            },
            "opensearch": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 10
                }
              ]
            },
            "powerful": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 2,
                  "start_offset": 20,
                  "end_offset": 28
                }
              ]
            },
            "provides": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 1,
                  "start_offset": 11,
                  "end_offset": 19
                }
              ]
            }
          }
        }
      }
    }
  ]
}
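
Terms in the response are keyed by term text rather than listed in document order. The following Python sketch shows one way to recover the token order from the position values and to map each token back to the original text using its offsets. The helper tokens_in_order is hypothetical, and the sample data is a trimmed copy of the second document in the preceding response.

```python
def tokens_in_order(doc, field):
    """Return (position, term, start_offset, end_offset) tuples sorted by position."""
    terms = doc["term_vectors"][field]["terms"]
    tokens = [
        (tok["position"], term, tok["start_offset"], tok["end_offset"])
        for term, info in terms.items()
        for tok in info.get("tokens", [])
    ]
    return sorted(tokens)


source_text = "OpenSearch provides powerful features."
doc = {
    "_id": "2",
    "term_vectors": {"text": {"terms": {
        "features": {"term_freq": 1, "tokens": [{"position": 3, "start_offset": 29, "end_offset": 37}]},
        "opensearch": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 10}]},
        "powerful": {"term_freq": 1, "tokens": [{"position": 2, "start_offset": 20, "end_offset": 28}]},
        "provides": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 11, "end_offset": 19}]},
    }}},
}

for position, term, start, end in tokens_in_order(doc, "text"):
    # Offsets index into the original text; end_offset is exclusive.
    print(position, term, repr(source_text[start:end]))
# 0 opensearch 'OpenSearch'
# 1 provides 'provides'
# 2 powerful 'powerful'
# 3 features 'features'
```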

Response body fields

The following table lists all response body fields.

| Field | Data type | Description |
| --- | --- | --- |
| docs | Array | A list of requested documents containing term vectors. |

Each element of the docs array contains the following fields.

| Field | Data type | Description |
| --- | --- | --- |
| term_vectors | Object | Contains term vector data for each field. |
| term_vectors.<field>.field_statistics | Object | Contains statistics about the field. |
| term_vectors.<field>.field_statistics.doc_count | Integer | The number of documents that contain at least one term in the specified field. |
| term_vectors.<field>.field_statistics.sum_doc_freq | Integer | The sum of document frequencies for all terms in the field. |
| term_vectors.<field>.field_statistics.sum_ttf | Integer | The sum of total term frequencies for all terms in the field. |
| term_vectors.<field>.terms | Object | A map of terms in the field, in which each term includes its frequency (term_freq) and associated token information. |
| term_vectors.<field>.terms.<term>.tokens | Array | An array of token objects for each term, including the token’s position in the text and its character offsets (start_offset and end_offset). |
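
As a rough illustration of how these fields can feed a similarity calculation, the following Python sketch turns the term_freq values of two documents into sparse vectors and computes their cosine similarity. It also derives the average number of tokens per document from field_statistics. All function names are hypothetical, and the sample data mirrors the two example documents on this page.

```python
import math


def term_freqs(doc, field):
    """Map each term in the field to its term_freq."""
    terms = doc["term_vectors"][field]["terms"]
    return {term: info["term_freq"] for term, info in terms.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm_a = math.sqrt(sum(freq * freq for freq in a.values()))
    norm_b = math.sqrt(sum(freq * freq for freq in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_field_length(field_statistics):
    """Average tokens per document: total term occurrences / documents with the field."""
    return field_statistics["sum_ttf"] / field_statistics["doc_count"]


def sample_doc(terms):
    """Build a minimal response-shaped document in which every term occurs once."""
    return {"term_vectors": {"text": {"terms": {t: {"term_freq": 1} for t in terms}}}}


doc1 = sample_doc(["a", "engine", "is", "opensearch", "search"])
doc2 = sample_doc(["features", "opensearch", "powerful", "provides"])

similarity = cosine_similarity(term_freqs(doc1, "text"), term_freqs(doc2, "text"))
print(round(similarity, 4))  # 0.2236, because only "opensearch" is shared
print(average_field_length({"sum_doc_freq": 9, "doc_count": 2, "sum_ttf": 9}))  # 4.5
```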