You're viewing version 3.0 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Term vectors
The _termvectors API retrieves term vector information for a single document. Term vectors provide detailed information about the terms (words) in a document, including term frequency, positions, offsets, and payloads. This can be useful for applications such as relevance scoring, highlighting, or similarity calculations. For more information, see Term vector parameter.
Endpoints
GET /{index}/_termvectors
POST /{index}/_termvectors
GET /{index}/_termvectors/{id}
POST /{index}/_termvectors/{id}
Path parameters
The following table lists the available path parameters.
Parameter | Required | Data type | Description |
---|---|---|---|
index | Required | String | The name of the index containing the document. |
id | Optional | String | The unique identifier of the document. |
Query parameters
The following table lists the available query parameters. All query parameters are optional.
Parameter | Data type | Description |
---|---|---|
field_statistics | Boolean | If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
fields | List or String | A comma-separated list or wildcard expression specifying the fields for which to return term vectors. |
offsets | Boolean | If true, the response includes term offsets. (Default: true) |
payloads | Boolean | If true, the response includes term payloads. (Default: true) |
positions | Boolean | If true, the response includes term positions. (Default: true) |
preference | String | Specifies the node or shard on which the operation should be performed. See preference query parameter for a list of available options. By default, requests are routed randomly to available shard copies (primary or replica), with no guarantee of consistency across repeated queries. |
realtime | Boolean | If true, the request is real time as opposed to near real time. (Default: true) |
routing | List or String | A custom value used to route operations to a specific shard. |
term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
version | Integer | The specific version of the document to retrieve. |
version_type | String | The specific version type. Valid values are external (the version number must be greater than the current version), external_gte (the version number must be greater than or equal to the current version), force (the version number is forced to be the given value), and internal (the version number is managed internally by OpenSearch). |
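As an illustration of combining several of these query parameters, the following sketch requests term vectors for one field while turning off positions, offsets, payloads, and field statistics. The index name my-index, the document ID 1, and the field name text are placeholders, not values required by the API:

```json
GET /my-index/_termvectors/1?fields=text&positions=false&offsets=false&payloads=false&field_statistics=false
```

With these flags set to false, the corresponding parts of the response should be omitted, leaving mainly the per-term frequency information.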
Request body fields
The following table lists the fields that can be specified in the request body.
Field | Data type | Description |
---|---|---|
doc | Object | A document to analyze. If provided, the API does not retrieve an existing document from the index but uses the provided content. |
fields | Array of strings | A list of field names for which to return term vectors. |
offsets | Boolean | If true, the response includes character offsets for each term. (Default: true) |
payloads | Boolean | If true, the response includes payloads for each term. (Default: true) |
positions | Boolean | If true, the response includes token positions. (Default: true) |
field_statistics | Boolean | If true, the response includes statistics such as document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
routing | String | A custom routing value used to identify the shard. Required if custom routing was used during indexing. |
version | Integer | The specific version of the document to retrieve. |
version_type | String | The type of versioning to use. Valid values: internal, external, external_gte, force. |
filter | Object | Allows filtering of tokens returned in the response (for example, by frequency or position). See Filtering terms for available options. |
per_field_analyzer | Object | Specifies a custom analyzer to use per field. Format: { "field_name": "analyzer_name" }. |
preference | String | Specifies shard or node routing preferences. See preference query parameter. |
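The doc and per_field_analyzer fields can be combined to analyze content that is not stored in the index. The following is a minimal sketch of such a request; the index name my-index and the field name text are placeholders, and whitespace is used here only as an example of a built-in analyzer name:

```json
POST /my-index/_termvectors
{
  "doc": {
    "text": "OpenSearch is a search engine."
  },
  "fields": ["text"],
  "per_field_analyzer": {
    "text": "whitespace"
  }
}
```

Because the document is supplied in the request body, the request uses the endpoint without an {id} path parameter.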
Filtering terms
The filter object in the request body allows you to filter the tokens to include in the term vector response. The filter object supports the following fields.
Field | Data type | Description |
---|---|---|
max_num_terms | Integer | The maximum number of terms to return. |
min_term_freq | Integer | The minimum term frequency in the document required for a term to be included. |
max_term_freq | Integer | The maximum term frequency in the document allowed for a term to be included. |
min_doc_freq | Integer | The minimum document frequency across the index required for a term to be included. |
max_doc_freq | Integer | The maximum document frequency across the index allowed for a term to be included. |
min_word_length | Integer | The minimum length a term must have to be included. |
max_word_length | Integer | The maximum length a term can have to be included. |
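As a sketch of how these fields fit into a request, the following body asks for at most three terms, each at least four characters long and appearing at least once in the document. The index name, document ID, field name, and threshold values are arbitrary placeholders chosen for illustration:

```json
POST /my-index/_termvectors/1
{
  "fields": ["text"],
  "term_statistics": true,
  "filter": {
    "max_num_terms": 3,
    "min_term_freq": 1,
    "min_doc_freq": 1,
    "min_word_length": 4
  }
}
```

Here term_statistics is enabled so that document frequencies are returned alongside the filtered terms; whether that suits your case depends on which statistics you need.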
Example
Create an index:
PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}
Index the document:
POST /my-index/_doc/1
{
  "text": "OpenSearch is a search engine."
}
Example request
Retrieve the term vectors:
GET /my-index/_termvectors/1
{
  "fields": ["text"],
  "term_statistics": true
}
Alternatively, you can provide fields and term_statistics as query parameters:
GET /my-index/_termvectors/1?fields=text&term_statistics=true
Example response
The response displays term vector information:
{
  "_index": "my-index",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 5,
        "doc_count": 1,
        "sum_ttf": 5
      },
      "terms": {
        "a": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 14,
              "end_offset": 15
            }
          ]
        },
        "engine": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "start_offset": 23,
              "end_offset": 29
            }
          ]
        },
        "is": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 11,
              "end_offset": 13
            }
          ]
        },
        "opensearch": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 10
            }
          ]
        },
        "search": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 16,
              "end_offset": 22
            }
          ]
        }
      }
    }
  }
}
Response body fields
The following table lists all response body fields.
Field | Data type | Description |
---|---|---|
term_vectors | Object | Contains term vector data for each specified field. |
term_vectors.text | Object | Contains term vector details for the text field. |
term_vectors.text.field_statistics | Object | Contains statistics for the entire field. Present only if field_statistics is true. |
term_vectors.text.field_statistics.doc_count | Integer | The number of documents that contain at least one term in the specified field. |
term_vectors.text.field_statistics.sum_doc_freq | Integer | The sum of document frequencies for all terms in the field. |
term_vectors.text.field_statistics.sum_ttf | Integer | The sum of total term frequencies (including repetitions) for all terms in the field. |
term_vectors.text.terms | Object | A map, in which each key is a term and each value contains details about that term. |
term_vectors.text.terms.<term>.term_freq | Integer | The number of times the term appears in the document. |
term_vectors.text.terms.<term>.doc_freq | Integer | The number of documents containing the term. Present only if term_statistics is true. |
term_vectors.text.terms.<term>.ttf | Integer | The total term frequency across all documents. Present only if term_statistics is true. |
term_vectors.text.terms.<term>.tokens | Array | A list of token objects providing information about individual term instances. |
term_vectors.text.terms.<term>.tokens[].position | Integer | The position of the token within the text. Present only if positions is true. |
term_vectors.text.terms.<term>.tokens[].start_offset | Integer | The start character offset of the token. Present only if offsets is true. |
term_vectors.text.terms.<term>.tokens[].end_offset | Integer | The end character offset of the token. Present only if offsets is true. |
term_vectors.text.terms.<term>.tokens[].payload | String (Base64) | Optional payload data associated with the token. Present only if payloads is true and available. |