Stop token filter
The stop token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as a or for. Because these words carry little meaning in search queries, they are often excluded to improve search efficiency and relevance.
The default list of English stopwords includes the following words: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, and with.
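As a rough illustration of what removing stopwords means, the following Python sketch filters a list of tokens against the default English list given above. It is a simplified model rather than the filter's actual implementation, and the `remove_stopwords` helper and the sample tokens are hypothetical.

```python
# Default English stopwords, copied from the list above.
ENGLISH_STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def remove_stopwords(tokens, stopwords=ENGLISH_STOPWORDS):
    """Return the tokens that do not exactly match a stopword."""
    return [token for token in tokens if token not in stopwords]


# Hypothetical tokens, already lowercased by an earlier analysis step.
tokens = ["the", "cat", "is", "on", "a", "mat"]
filtered = remove_stopwords(tokens)
print(filtered)  # ['cat', 'mat']
```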
Parameters
The stop token filter can be configured with the following parameters.
| Parameter | Required/Optional | Data type | Description |
|---|---|---|---|
| stopwords | Optional | String or array of strings | Specifies either a custom array of stopwords or a predefined stopword set for a language. Default is _english_. |
| stopwords_path | Optional | String | Specifies the file path (absolute or relative to the config directory) of the file containing custom stopwords. |
| ignore_case | Optional | Boolean | If true, stopwords will be matched regardless of their case. Default is false. |
| remove_trailing | Optional | Boolean | If true, trailing stopwords will be removed during analysis. Default is true. |
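The interaction between ignore_case and remove_trailing can be easier to see in a small simulation. The following Python sketch is a simplified model under stated assumptions, not the filter's real implementation: the `stop_filter` function is hypothetical, and it assumes that setting remove_trailing to false simply keeps the final token of the stream even when that token is a stopword.

```python
def stop_filter(tokens, stopwords, ignore_case=False, remove_trailing=True):
    """Simplified model of the stop filter's ignore_case and remove_trailing options."""
    # With ignore_case, compare lowercased forms; otherwise compare exactly.
    normalize = str.lower if ignore_case else (lambda s: s)
    stopset = {normalize(word) for word in stopwords}

    kept = []
    for index, token in enumerate(tokens):
        is_last = index == len(tokens) - 1
        # Assumption: with remove_trailing=False the last token is always kept.
        if is_last and not remove_trailing:
            kept.append(token)
            continue
        if normalize(token) not in stopset:
            kept.append(token)
    return kept


tokens = ["The", "green", "a"]
custom_stopwords = ["a", "the"]

print(stop_filter(tokens, custom_stopwords))
# ['The', 'green']  ("The" does not match "the" when case matters)
print(stop_filter(tokens, custom_stopwords, ignore_case=True))
# ['green']
print(stop_filter(tokens, custom_stopwords, ignore_case=True, remove_trailing=False))
# ['green', 'a']  (the trailing stopword survives)
```

In this model, placing a lowercase filter before the stop filter has the same effect on matching as setting ignore_case to true, although the emitted tokens are then lowercased.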
Example
The following example request creates a new index named my-stopword-index and configures an analyzer with a stop filter that uses the predefined stopword list for the English language:
PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}
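The request above uses a predefined stopword set. As a hedged variation, the following Python sketch builds a comparable settings body that supplies a custom stopword array and enables ignore_case instead of using a lowercase filter. The names `my_custom_stop_filter` and `my_custom_stop_analyzer` and the three stopwords are hypothetical, chosen only for illustration; the script only prints the JSON and does not contact a cluster.

```python
import json

# Hypothetical filter and analyzer names; the stopword list is illustrative.
settings_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_custom_stop_filter": {
                    "type": "stop",
                    "stopwords": ["and", "is", "the"],
                    "ignore_case": True,
                }
            },
            "analyzer": {
                "my_custom_stop_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["my_custom_stop_filter"],
                }
            },
        }
    }
}

# Serialize to JSON so it can be sent as the body of an index creation request.
request_json = json.dumps(settings_body, indent=2)
print(request_json)
```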
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
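Note that the response starts at position 1 and skips position 5: the removed tokens (a and the) still occupy their original positions and offsets. The following Python sketch reproduces that behavior under simplifying assumptions. The `analyze` function is hypothetical, whitespace splitting stands in for the standard tokenizer, and only the two stopwords needed for this sentence are listed.

```python
# Subset of the default English stopwords that is sufficient for this sentence.
STOPWORDS = {"a", "the"}


def analyze(text, stopwords):
    """Lowercase and split on whitespace, dropping stopwords but keeping positions."""
    tokens = []
    search_from = 0
    for position, word in enumerate(text.split()):
        start = text.index(word, search_from)
        end = start + len(word)
        search_from = end
        term = word.lower()
        # A removed stopword still consumes its position, leaving a gap.
        if term not in stopwords:
            tokens.append({
                "token": term,
                "start_offset": start,
                "end_offset": end,
                "position": position,
            })
    return tokens


tokens = analyze("A quick dog jumps over the turtle", STOPWORDS)
print([(t["token"], t["position"]) for t in tokens])
# [('quick', 1), ('dog', 2), ('jumps', 3), ('over', 4), ('turtle', 6)]
```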
Predefined stopword sets by language
The following is a list of all available predefined stopword sets by language:
- _arabic_
- _armenian_
- _basque_
- _bengali_
- _brazilian_ (Brazilian Portuguese)
- _bulgarian_
- _catalan_
- _cjk_ (Chinese, Japanese, and Korean)
- _czech_
- _danish_
- _dutch_
- _english_
- _estonian_
- _finnish_
- _french_
- _galician_
- _german_
- _greek_
- _hindi_
- _hungarian_
- _indonesian_
- _irish_
- _italian_
- _latvian_
- _lithuanian_
- _norwegian_
- _persian_
- _portuguese_
- _romanian_
- _russian_
- _sorani_
- _spanish_
- _swedish_
- _thai_
- _turkish_