Stop token filter

The `stop` token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as `a` or `for`. Because these words carry little meaning in search queries, they are often excluded to improve search efficiency and relevance.

The default list of English stopwords includes the following words: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, and with.

Parameters

The stop token filter can be configured with the following parameters.

Parameter	Required/Optional	Data type	Description
`stopwords`	Optional	String or array of strings	Specifies either a custom array of stopwords or the name of a predefined stopword set for a language. Default is `_english_`.
`stopwords_path`	Optional	String	Specifies the file path (absolute or relative to the config directory) of the file containing custom stopwords.
`ignore_case`	Optional	Boolean	If `true`, stopwords will be matched regardless of their case. Default is `false`.
`remove_trailing`	Optional	Boolean	If `true`, trailing stopwords will be removed during analysis. Default is `true`.
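
As an illustration of how these parameters combine, the following request is a minimal sketch that defines a stop filter with a custom stopword array and case-insensitive matching. The index name `my-custom-stopword-index`, the filter name `my_custom_stop_filter`, the analyzer name `my_custom_stop_analyzer`, and the three stopwords are hypothetical and chosen only for this example:

```json
PUT /my-custom-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["and", "is", "the"],
          "ignore_case": true
        }
      },
      "analyzer": {
        "my_custom_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_custom_stop_filter"
          ]
        }
      }
    }
  }
}
```

This sketch leaves out the `lowercase` filter on purpose, so `ignore_case` is what allows a capitalized token such as `The` to match the stopword `the`. To load the words from a file instead, `stopwords_path` would take the place of the `stopwords` array.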

Example

The following example request creates a new index named `my-stopword-index` and configures an analyzer with a `stop` filter that uses the predefined stopword list for the English language:

PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}

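The response contains the generated tokens. The stopwords `a` and `the` are removed, and the remaining tokens keep their original positions, which is why positions 0 and 5 are missing: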

{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
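
To experiment with a stop filter definition without creating an index first, the filter can also be declared inline in an `_analyze` request. The following is a sketch of that approach. It assumes the same `standard` tokenizer and `lowercase` filter as the analyzer above, and it should produce the same tokens for the same input text:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": "_english_"
    }
  ],
  "text": "A quick dog jumps over the turtle"
}
```

An inline definition like this can be a convenient way to compare stopword lists or parameter values before committing them to index settings.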

Predefined stopword sets by language

The following is a list of all available predefined stopword sets by language:
