Stop token filter
The `stop` token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as `a` or `for`. These words carry little meaning in search queries and are often excluded to improve search efficiency and relevance.
The default list of English stopwords includes the following words: `a`, `an`, `and`, `are`, `as`, `at`, `be`, `but`, `by`, `for`, `if`, `in`, `into`, `is`, `it`, `no`, `not`, `of`, `on`, `or`, `such`, `that`, `the`, `their`, `then`, `there`, `these`, `they`, `this`, `to`, `was`, `will`, and `with`.
Parameters
The `stop` token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
stopwords | Optional | String or array of strings | Specifies either a custom array of stopwords or the name of a predefined stopword set for a language. Default is `_english_`. |
stopwords_path | Optional | String | Specifies the path (absolute or relative to the config directory) of the file containing custom stopwords. |
ignore_case | Optional | Boolean | If `true`, stopwords are matched regardless of their case. Default is `false`. |
remove_trailing | Optional | Boolean | If `true`, a trailing stopword (a stopword that is the last token of the stream) is removed during analysis. Default is `true`. |
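As a rough illustration of how these parameters can be combined, the following sketch defines two filters: one with an inline stopword array and case-insensitive matching, and one that reads its list from a file. The index name `my-custom-stop-index`, both filter names, the word list, and the path `analysis/my_stopwords.txt` are hypothetical. The file is assumed to exist under the config directory on every node and to contain one stopword per line; the request is likely to fail if it does not.

```json
PUT /my-custom-stop-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["quick", "over"],
          "ignore_case": true
        },
        "my_file_stop_filter": {
          "type": "stop",
          "stopwords_path": "analysis/my_stopwords.txt"
        }
      }
    }
  }
}
```

With `ignore_case` set to `true`, the first filter should also match `Quick` or `OVER`, which matters mainly when the filter runs before, or without, a `lowercase` filter.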
Example
The following example request creates a new index named `my-stopword-index` and configures an analyzer with a `stop` filter that uses the predefined stopword list for the English language:
PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
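To try a stopword list without creating an index first, the analyze API also accepts an inline filter definition. The following sketch reuses the sample text with a hypothetical custom list (`quick` and `over`), chosen only for illustration:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": ["quick", "over"]
    }
  ],
  "text": "A quick dog jumps over the turtle"
}
```

Because a custom list replaces the default English set rather than extending it, `a` and `the` should remain in the output while `quick` and `over` are removed. The remaining tokens are expected to keep their original positions, leaving gaps where stopwords were dropped, as in the response above.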
Predefined stopword sets by language
The following is a list of all available predefined stopword sets by language:
_arabic_
_armenian_
_basque_
_bengali_
_brazilian_ (Brazilian Portuguese)
_bulgarian_
_catalan_
_cjk_ (Chinese, Japanese, and Korean)
_czech_
_danish_
_dutch_
_english_
_estonian_
_finnish_
_french_
_galician_
_german_
_greek_
_hindi_
_hungarian_
_indonesian_
_irish_
_italian_
_latvian_
_lithuanian_
_norwegian_
_persian_
_portuguese_
_romanian_
_russian_
_sorani_
_spanish_
_swedish_
_thai_
_turkish_
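Any of these names can be passed as the value of the `stopwords` parameter. As a minimal sketch, the following request applies the `_french_` set through an inline filter; the sample sentence is an assumption made for illustration:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": "_french_"
    }
  ],
  "text": "Le chat et la souris"
}
```

The response should omit common French function words from the sentence. The exact contents of each predefined set depend on the underlying analysis library, so it is worth checking the output against your own text before relying on a set.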