Multi term vectors
The _mtermvectors API retrieves term vector information for multiple documents in one request. Term vectors provide detailed information about the terms (words) in a document, including term frequency, positions, offsets, and payloads. This can be useful for applications such as relevance scoring, highlighting, or similarity calculations. For more information, see Term vector parameter.
Endpoints
GET /_mtermvectors
POST /_mtermvectors
GET /{index}/_mtermvectors
POST /{index}/_mtermvectors
Path parameters
The following table lists the available path parameters. All path parameters are optional.
Parameter | Data type | Description |
---|---|---|
index | String | The name of the index that contains the document. |
Query parameters
The following table lists the available query parameters. All query parameters are optional.
Parameter | Data type | Description |
---|---|---|
field_statistics | Boolean | If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
fields | List or String | A comma-separated list or a wildcard expression specifying the fields for which to return term vectors. Used as the default list unless a document in the request body specifies its own fields. |
ids | List | A comma-separated list of document IDs. You must provide either the docs field in the request body or specify ids as a query parameter or in the request body. |
offsets | Boolean | If true, the response includes term offsets. (Default: true) |
payloads | Boolean | If true, the response includes term payloads. (Default: true) |
positions | Boolean | If true, the response includes term positions. (Default: true) |
preference | String | Specifies the node or shard on which the operation should be performed. See preference query parameter for a list of available options. By default, requests are routed randomly to available shard copies (primary or replica), with no guarantee of consistency across repeated queries. |
realtime | Boolean | If true, the request is real time as opposed to near real time. (Default: true) |
routing | List or String | A custom value used to route operations to a specific shard. |
term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
version | Integer | The specific version of the document to retrieve. |
version_type | String | The specific version type. Valid values are external (the version number must be greater than the current version), external_gte (the version number must be greater than or equal to the current version), force (the version number is forced to be the given value), and internal (the version number is managed internally by OpenSearch). |
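As a rough illustration of how several of these query parameters combine, the following request asks for the term vectors of two documents while omitting offsets and payloads and adding term statistics. The index name my-index, the document IDs, and the text field are placeholders assumed for this sketch:

```json
GET /my-index/_mtermvectors?ids=1,2&fields=text&offsets=false&payloads=false&term_statistics=true
```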
Request body fields
The following table lists the fields that can be specified in the request body.
Field | Data type | Description |
---|---|---|
docs | Array | An array of document specifications. |
ids | Array of strings | A list of document IDs to retrieve. Use only when all documents share the same index specified in the request path or query. |
fields | Array of strings | A list of field names for which to return term vectors. |
offsets | Boolean | If true, the response includes character offsets for each term. (Default: true) |
payloads | Boolean | If true, the response includes payloads for each term. (Default: true) |
positions | Boolean | If true, the response includes token positions. (Default: true) |
field_statistics | Boolean | If true, the response includes statistics such as document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
routing | String | A custom routing value used to identify the shard. Required if custom routing was used during indexing. |
version | Integer | The specific version of the document to retrieve. |
version_type | String | The type of versioning to use. Valid values: internal, external, external_gte. |
filter | Object | Filters tokens returned in the response (for example, by frequency or position). For supported fields, see Filtering terms. |
per_field_analyzer | Object | Specifies a custom analyzer to use per field. Format: { "field_name": "analyzer_name" }. |
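The following request is a minimal sketch of how these fields might be set differently for each document inside the docs array. It assumes an index named my-index with a text field (both are placeholders) and uses the built-in whitespace analyzer purely as an example value for per_field_analyzer:

```json
POST /_mtermvectors
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "fields": ["text"],
      "term_statistics": true,
      "offsets": false,
      "payloads": false
    },
    {
      "_index": "my-index",
      "_id": "2",
      "fields": ["text"],
      "per_field_analyzer": {
        "text": "whitespace"
      }
    }
  ]
}
```

In this sketch, the first entry adds term statistics and omits offsets and payloads, while the second entry keeps the defaults but asks for the text field to be analyzed with a different analyzer.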
Filtering terms
The filter object in the request body allows you to filter the tokens to include in the term vector response. The filter object supports the following fields.
Field | Data type | Description |
---|---|---|
max_num_terms | Integer | The maximum number of terms to return. |
min_term_freq | Integer | The minimum term frequency in the document required for a term to be included. |
max_term_freq | Integer | The maximum term frequency in the document allowed for a term to be included. |
min_doc_freq | Integer | The minimum document frequency across the index required for a term to be included. |
max_doc_freq | Integer | The maximum document frequency across the index allowed for a term to be included. |
min_word_length | Integer | The minimum length a term must have to be included. |
max_word_length | Integer | The maximum length a term can have to be included. |
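A filter might be applied as in the following sketch, which places the filter object inside a docs entry. The index name my-index, the document ID, the text field, and the threshold values are all illustrative assumptions. The request keeps at most three terms, each at least four characters long, and also requests term statistics so that document frequencies appear in the response:

```json
POST /_mtermvectors
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "fields": ["text"],
      "term_statistics": true,
      "filter": {
        "max_num_terms": 3,
        "min_term_freq": 1,
        "min_doc_freq": 1,
        "min_word_length": 4
      }
    }
  ]
}
```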
Example
Create an index with term vectors enabled:
PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}
Index the first document:
POST /my-index/_doc/1
{
  "text": "OpenSearch is a search engine."
}
Index the second document:
POST /my-index/_doc/2
{
  "text": "OpenSearch provides powerful features."
}
Example request
Get term vectors for multiple documents:
POST /_mtermvectors
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "fields": ["text"]
    },
    {
      "_index": "my-index",
      "_id": "2",
      "fields": ["text"]
    }
  ]
}
Alternatively, you can specify both ids and fields as query parameters:
GET /my-index/_mtermvectors?ids=1,2&fields=text
You can also provide document IDs in the ids array instead of specifying docs:
GET /my-index/_mtermvectors?fields=text
{
  "ids": [
    "1", "2"
  ]
}
Example response
The response contains term vector information for the two documents:
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "_version": 1,
      "found": true,
      "took": 10,
      "term_vectors": {
        "text": {
          "field_statistics": {
            "sum_doc_freq": 9,
            "doc_count": 2,
            "sum_ttf": 9
          },
          "terms": {
            "a": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 2,
                  "start_offset": 14,
                  "end_offset": 15
                }
              ]
            },
            "engine": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 4,
                  "start_offset": 23,
                  "end_offset": 29
                }
              ]
            },
            "is": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 1,
                  "start_offset": 11,
                  "end_offset": 13
                }
              ]
            },
            "opensearch": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 10
                }
              ]
            },
            "search": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 3,
                  "start_offset": 16,
                  "end_offset": 22
                }
              ]
            }
          }
        }
      }
    },
    {
      "_index": "my-index",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 0,
      "term_vectors": {
        "text": {
          "field_statistics": {
            "sum_doc_freq": 9,
            "doc_count": 2,
            "sum_ttf": 9
          },
          "terms": {
            "features": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 3,
                  "start_offset": 29,
                  "end_offset": 37
                }
              ]
            },
            "opensearch": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 10
                }
              ]
            },
            "powerful": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 2,
                  "start_offset": 20,
                  "end_offset": 28
                }
              ]
            },
            "provides": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 1,
                  "start_offset": 11,
                  "end_offset": 19
                }
              ]
            }
          }
        }
      }
    }
  ]
}
Response body fields
The following table lists all response body fields.
Field | Data type | Description |
---|---|---|
docs | Array | A list of requested documents containing term vectors. |
Each element of the docs array contains the following fields.
Field | Data type | Description |
---|---|---|
term_vectors | Object | Contains term vector data for each field. |
term_vectors.<field>.field_statistics | Object | Contains statistics about the field. |
term_vectors.<field>.field_statistics.doc_count | Integer | The number of documents that contain at least one term in the specified field. |
term_vectors.<field>.field_statistics.sum_doc_freq | Integer | The sum of document frequencies for all terms in the field. |
term_vectors.<field>.field_statistics.sum_ttf | Integer | The sum of total term frequencies for all terms in the field. |
term_vectors.<field>.terms | Object | A map of terms in the field, in which each term includes its frequency (term_freq) and associated token information. |
term_vectors.<field>.terms.<term>.tokens | Array | An array of token objects for each term, including the token's position in the text and its character offsets (start_offset and end_offset). |