Polish analyzer
The Polish language analyzer (polish) provides analysis for Polish text. This analyzer is part of the analysis-stempel plugin, which must be installed before use.
Installing the plugin
Before you can use the Polish analyzer, you must install the analysis-stempel plugin by running the following command:
./bin/opensearch-plugin install analysis-stempel
For the complete list of available OpenSearch plugins, see Additional plugins.
Using the Polish analyzer
To use the Polish analyzer when you map an index, specify the polish value in the analyzer field:
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "polish"
      }
    }
  }
}
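As a minimal sketch of how this mapping behaves, the following requests index a document into my-index and then search it. The document ID, the sample text, and the refresh parameter (used so that the document is searchable right away) are illustrative assumptions, not part of the original example. Because the content field is analyzed with the polish analyzer at both index time and search time, the query term is reduced to the same token as the indexed word:

```json
PUT my-index/_doc/1?refresh=true
{
  "content": "Jestem programistą w Polsce"
}

GET my-index/_search
{
  "query": {
    "match": {
      "content": "Polsce"
    }
  }
}
```

Whether other inflected forms of the same word also match depends on the stems that the Stempel algorithm produces, which you can check with the _analyze API.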
Configuring a custom Polish analyzer
To customize Polish analysis, create a custom analyzer that uses the Polish token filters provided by the plugin. The default Polish analyzer applies the following analysis chain:
- Tokenizer: standard
- Token filters:
  - lowercase
  - polish_stop (removes Polish stop words)
  - polish_stem (applies Polish stemming)
Example: Custom Polish analyzer
PUT my-polish-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_polish": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "polish_stop",
            "polish_stem"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "custom_polish"
      },
      "content": {
        "type": "text",
        "analyzer": "polish"
      }
    }
  }
}
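To check what the custom analyzer produces, you can call the _analyze API on the index that defines it. The following is a sketch that assumes my-polish-index was created with the settings shown in this example; the sample text is an arbitrary choice:

```json
POST my-polish-index/_analyze
{
  "analyzer": "custom_polish",
  "text": "Jestem programistą w Polsce"
}
```

Because custom_polish uses the same tokenizer and token filters as the chain listed for the default analyzer, its output is expected to match that of the polish analyzer for the same text.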
Polish token filters
The analysis-stempel plugin provides the following token filters for Polish language processing.
polish_stop token filter
Removes common Polish stop words from the token stream.
polish_stem token filter
Applies Polish-specific stemming rules to reduce words to their root forms using the Stempel stemming algorithm.
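To see the effect of each filter in isolation, one option is to pass an ad hoc tokenizer and filter list to the _analyze API instead of a named analyzer. This sketch assumes that the plugin's filters can be referenced by name with their default settings; the sample text is illustrative:

```json
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "polish_stop"],
  "text": "Jestem programistą w Polsce i pracuję z OpenSearch"
}
```

Appending polish_stem to the filter list and rerunning the request shows how stemming changes the tokens that remain after stop word removal.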
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST _analyze
{
  "analyzer": "polish",
  "text": "Jestem programistą w Polsce i pracuję z OpenSearch"
}
The response contains the generated tokens:
{
  "tokens": [
    {"token": "jest", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0},
    {"token": "prograć", "start_offset": 7, "end_offset": 18, "type": "<ALPHANUM>", "position": 1},
    {"token": "polsce", "start_offset": 21, "end_offset": 27, "type": "<ALPHANUM>", "position": 3},
    {"token": "pracować", "start_offset": 30, "end_offset": 37, "type": "<ALPHANUM>", "position": 5},
    {"token": "opensearch", "start_offset": 40, "end_offset": 50, "type": "<ALPHANUM>", "position": 7}
  ]
}