Keep words token filter
The keep_words token filter retains only the tokens that appear in a specified list of words and removes all other tokens from the token stream. This filter is useful if you have a large body of text but are only interested in certain keywords or terms.
Parameters
The keep_words token filter can be configured with the following parameters.
| Parameter | Required/Optional | Data type | Description | 
|---|---|---|---|
| keep_words | Required if keep_words_path is not configured | List of strings | The list of words to keep. | 
| keep_words_path | Required if keep_words is not configured | String | The path to the file containing the list of words to keep. | 
| keep_words_case | Optional | Boolean | Whether to lowercase all words during comparison, so that matching ignores case. Kept tokens retain their original case. Default is false. | 
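To illustrate how keep_words and keep_words_case interact, the following Python sketch models the filtering step on an already tokenized list. It is a simplified illustration, not the OpenSearch implementation: the function name keep_words_filter and its signature are hypothetical, and the sketch assumes that keep_words_case only affects the comparison and leaves the emitted tokens unchanged.

```python
def keep_words_filter(tokens, keep_words, keep_words_case=False):
    """Illustrative model only: return the tokens that appear in keep_words.

    When keep_words_case is True, both sides are lowercased for the
    comparison, but each kept token is returned in its original case.
    """
    if keep_words_case:
        allowed = {word.lower() for word in keep_words}
        return [token for token in tokens if token.lower() in allowed]
    allowed = set(keep_words)
    return [token for token in tokens if token in allowed]


tokens = ["Hello", "world", "This", "is", "an", "OpenSearch", "example"]
keep = ["example", "world", "opensearch"]

# Case-insensitive comparison: "OpenSearch" matches "opensearch".
print(keep_words_filter(tokens, keep, keep_words_case=True))
# Case-sensitive comparison: "OpenSearch" no longer matches.
print(keep_words_filter(tokens, keep, keep_words_case=False))
```

Under this model, the first call keeps world, OpenSearch, and example, while the second call keeps only world and example because the comparison is case sensitive.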
Example
The following example request creates a new index named my_index and configures an analyzer with a keep_words filter:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_keep_word": {
          "tokenizer": "standard",
          "filter": [ "keep_words_filter" ]
        }
      },
      "filter": {
        "keep_words_filter": {
          "type": "keep",
          "keep_words": ["example", "world", "opensearch"],
          "keep_words_case": true
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /my_index/_analyze
{
  "analyzer": "custom_keep_word",
  "text": "Hello, world! This is an OpenSearch example."
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "world",
      "start_offset": 7,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "OpenSearch",
      "start_offset": 25,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "example",
      "start_offset": 36,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
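Note that the positions in the response (1, 5, and 6) are not consecutive: removed tokens still occupy a position in the original token stream, and the offsets refer to the original text. The following Python sketch reproduces these values for the example sentence. It is an approximation under stated assumptions: the regular expression \w+ stands in for the standard tokenizer (which is adequate for this sentence but not in general), and the function name analyze_keep_words is hypothetical.

```python
import re


def analyze_keep_words(text, keep_words, keep_words_case=False):
    """Approximate the analyzer from the example above.

    Assumption: \\w+ is used as a rough stand-in for the standard tokenizer.
    Positions are assigned before filtering, so removed tokens leave gaps.
    """
    normalize = str.lower if keep_words_case else (lambda word: word)
    allowed = {normalize(word) for word in keep_words}
    kept = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        if normalize(match.group()) in allowed:
            kept.append({
                "token": match.group(),        # original case is preserved
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            })
    return kept


result = analyze_keep_words(
    "Hello, world! This is an OpenSearch example.",
    ["example", "world", "opensearch"],
    keep_words_case=True,
)
for entry in result:
    print(entry)
```

For the example sentence, the sketch yields the same tokens, offsets, and positions as the response shown above.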