Stop token filter
The stop token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as *a* or *for*. Because these words carry little meaning in search queries, they are often excluded to improve search efficiency and relevance.
The default list of English stopwords includes the following words: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, and with.
Parameters
The stop token filter can be configured with the following parameters.
| Parameter | Required/Optional | Data type | Description |
|---|---|---|---|
| `stopwords` | Optional | String or Array | Specifies either a custom array of stopwords or a predefined stopword set for a language. Default is `_english_`. |
| `stopwords_path` | Optional | String | Specifies the file path (absolute or relative to the config directory) of the file containing custom stopwords. |
| `ignore_case` | Optional | Boolean | If `true`, stopwords are matched regardless of case. Default is `false`. |
| `remove_trailing` | Optional | Boolean | If `true`, trailing stopwords are removed during analysis. Default is `true`. |
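For example, a filter that uses a custom stopword array with case-insensitive matching might be configured as follows (the index and filter names here are illustrative, not part of the default configuration):

```json
PUT /my-custom-stop-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["and", "or", "the"],
          "ignore_case": true
        }
      }
    }
  }
}
```

With `ignore_case` set to `true`, tokens such as `The` and `THE` are removed along with `the`.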
Example
The following example request creates a new index named my-stopword-index and configures an analyzer with a stop filter that uses the predefined stopword list for the English language:
```json
PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}
```
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
```json
GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}
```
The response contains the generated tokens:
```json
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
```
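The filtering behavior shown above can be sketched outside of OpenSearch. The following Python snippet is a minimal illustration (not OpenSearch internals): it applies the default English stopword list to a lowercased, whitespace-split token stream and reproduces the tokens from the example response.

```python
# Default English stopword list from the documentation above.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
}

def stop_filter(tokens, stopwords=ENGLISH_STOPWORDS, ignore_case=False):
    """Remove stopwords from a token stream, mimicking the stop filter.

    With ignore_case=True, tokens are compared case-insensitively,
    like the filter's ignore_case parameter.
    """
    if ignore_case:
        return [t for t in tokens if t.lower() not in stopwords]
    return [t for t in tokens if t not in stopwords]

# The example analyzer lowercases before the stop filter runs.
tokens = "A quick dog jumps over the turtle".lower().split()
print(stop_filter(tokens))  # ['quick', 'dog', 'jumps', 'over', 'turtle']
```

Note that in the real analyzer the standard tokenizer, not `str.split`, produces the token stream, which is why the response also reports offsets and positions.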
Predefined stopword sets by language
The following is a list of all available predefined stopword sets by language:
- `_arabic_`
- `_armenian_`
- `_basque_`
- `_bengali_`
- `_brazilian_` (Brazilian Portuguese)
- `_bulgarian_`
- `_catalan_`
- `_cjk_` (Chinese, Japanese, and Korean)
- `_czech_`
- `_danish_`
- `_dutch_`
- `_english_`
- `_estonian_`
- `_finnish_`
- `_french_`
- `_galician_`
- `_german_`
- `_greek_`
- `_hindi_`
- `_hungarian_`
- `_indonesian_`
- `_irish_`
- `_italian_`
- `_latvian_`
- `_lithuanian_`
- `_norwegian_`
- `_persian_`
- `_portuguese_`
- `_romanian_`
- `_russian_`
- `_sorani_`
- `_spanish_`
- `_swedish_`
- `_thai_`
- `_turkish_`