ICU collation keyword field type
The icu_collation_keyword field type stores terms as binary-encoded collation keys, enabling language-specific sorting and range queries. Unlike standard string sorting, which compares values in byte order, this field type applies collation rules that respect the linguistic conventions of a specific language or locale.
This field type is particularly useful when you need to sort documents according to language-specific alphabetical order, handle accented characters correctly, or implement culturally appropriate string comparisons.
Installation
The icu_collation_keyword field type requires the analysis-icu plugin. For installation instructions, see ICU analyzer.
How it works
The icu_collation_keyword field encodes terms directly as binary collation keys in doc values and creates a single indexed token (similar to the standard keyword field). This approach provides:
- Language-aware sorting: Applies collation rules specific to a language or locale
- Precomputed keys: Stores binary collation keys at index time, so collation rules do not have to be applied at query time
- Range query support: Enables range queries that respect linguistic ordering
By default, the field uses DUCET (Default Unicode Collation Element Table) collation, which provides a language-neutral best-effort sort order.
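As a minimal sketch of this default behavior, the following mapping declares a collation subfield without any locale parameters, so it relies on the default collation described above. The index name example-default-collation and the field name title are hypothetical and used only for illustration.

```json
PUT /example-default-collation
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword"
          }
        }
      }
    }
  }
}
```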
Parameters
The following table lists the parameters accepted by the icu_collation_keyword field type.
| Parameter | Data type | Description |
|---|---|---|
| language | String | The language code (for example, de for German, fr for French). Optional. |
| country | String | The country code (for example, DE for Germany, FR for France). Optional. |
| variant | String | A variant string for additional collation options (for example, @collation=phonebook for German phonebook order). Optional. |
| strength | String | The collation strength level. Valid values are primary, secondary, tertiary, quaternary, and identical. Default is tertiary. Optional. |
| decomposition | String | How to handle character normalization. Valid values are no and canonical. Default is no. Optional. |
| alternate | String | How to handle whitespace and punctuation. Valid values are shifted and non-ignorable. Optional. |
| case_level | Boolean | Whether to consider case differences when strength is primary. Default is false. Optional. |
| case_first | String | Whether uppercase or lowercase sorts first. Valid values are lower and upper. Optional. |
| numeric | Boolean | Whether to sort numeric substrings by numeric value. For example, item-9 sorts before item-21. Default is false. Optional. |
| variable_top | String | Specifies which characters are considered variable for the alternate option. Optional. |
| hiragana_quaternary_mode | Boolean | Whether to distinguish between Katakana and Hiragana at quaternary strength. Optional. |
| doc_values | Boolean | Whether the field should be stored on disk for sorting and aggregations. Default is true. Optional. |
| index | Boolean | Whether the field should be searchable. Default is true. Optional. |
| null_value | String | A string value to substitute for explicit null values. Default is null (field treated as missing). Optional. |
| store | Boolean | Whether to store the field value separately from _source. Default is false. Optional. |
| fields | Object | Multi-field mappings for indexing the same value in different ways. Optional. |
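To illustrate how these parameters are combined in a mapping, the following sketch enables numeric sorting so that a value such as item-9 sorts before item-21. The index name example-product-codes, the field name code, and the choice of en as the language are assumptions made for this illustration.

```json
PUT /example-product-codes
{
  "mappings": {
    "properties": {
      "code": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "language": "en",
            "numeric": true
          }
        }
      }
    }
  }
}
```

Sorting a search on code.sort then orders digit sequences by their numeric value rather than character by character.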
Example: German phonebook sorting
The following example creates an index with a field that sorts German names using phonebook ordering:
PUT /german-names
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "language": "de",
            "country": "DE",
            "variant": "@collation=phonebook"
          }
        }
      }
    }
  }
}
Index some German names:
POST /german-names/_bulk
{"index":{"_id":"1"}}
{"name":"Müller"}
{"index":{"_id":"2"}}
{"name":"Möller"}
{"index":{"_id":"3"}}
{"name":"Meyer"}
{"index":{"_id":"4"}}
{"name":"Schneider"}
Search and sort using the collation field:
GET /german-names/_search
{
  "query": {
    "match_all": {}
  },
  "sort": "name.sort"
}
The results are sorted according to German phonebook conventions, which treat the umlauts ä, ö, and ü as equivalent to ae, oe, and ue. Möller is therefore compared as Moeller and Müller as Mueller:
{
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "german-names",
        "_id": "3",
        "_score": null,
        "_source": {
          "name": "Meyer"
        }
      },
      {
        "_index": "german-names",
        "_id": "2",
        "_score": null,
        "_source": {
          "name": "Möller"
        }
      },
      {
        "_index": "german-names",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "Müller"
        }
      },
      {
        "_index": "german-names",
        "_id": "4",
        "_score": null,
        "_source": {
          "name": "Schneider"
        }
      }
    ]
  }
}
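Range queries on the collation field follow the same ordering. The following request is a sketch that reuses the german-names index and the name.sort subfield from this example; the bounds Möller and Müller are chosen only for illustration. The bounds are converted to collation keys, so the range is evaluated in phonebook order rather than byte order.

```json
GET /german-names/_search
{
  "query": {
    "range": {
      "name.sort": {
        "gte": "Möller",
        "lte": "Müller"
      }
    }
  },
  "sort": "name.sort"
}
```

With the four documents indexed above, this range should match Möller and Müller and exclude Meyer and Schneider.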
Example: French accented character sorting
The following example demonstrates French collation, which treats accented characters according to French linguistic rules:
PUT /french-words
{
  "mappings": {
    "properties": {
      "word": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "language": "fr",
            "country": "FR",
            "strength": "primary"
          }
        }
      }
    }
  }
}
Index French words with accents:
POST /french-words/_bulk
{"index":{"_id":"1"}}
{"word":"cote"}
{"index":{"_id":"2"}}
{"word":"côte"}
{"index":{"_id":"3"}}
{"word":"coté"}
{"index":{"_id":"4"}}
{"word":"côté"}
Query with sorting:
GET /french-words/_search
{
  "query": {
    "match_all": {}
  },
  "sort": "word.sort"
}
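The results follow French collation rules. Because this mapping sets strength to primary, accents are ignored during comparison, so the four words above compare as equal and their relative order is not determined by collation. To order them by accent, use secondary or tertiary strength.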
Collation strength levels
The strength parameter determines how strictly the collation compares strings:
- primary: Compares base characters only, ignoring accents and case. For example, a, A, á, and Á are considered equal.
- secondary: Compares base characters and accents, but ignores case. For example, a and á are different, but a and A are equal.
- tertiary (default): Compares base characters, accents, and case. For example, a, A, and á are all different.
- quaternary: Adds punctuation and whitespace comparison when alternate is set to shifted.
- identical: Performs character-by-character binary comparison.
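One way to compare these levels is to map the same text under two strengths and sort by each in turn. The following is a sketch only: the index name example-strengths and the subfield names primary and tertiary are hypothetical.

```json
PUT /example-strengths
{
  "mappings": {
    "properties": {
      "word": {
        "type": "text",
        "fields": {
          "primary": {
            "type": "icu_collation_keyword",
            "language": "fr",
            "strength": "primary"
          },
          "tertiary": {
            "type": "icu_collation_keyword",
            "language": "fr",
            "strength": "tertiary"
          }
        }
      }
    }
  }
}
```

Sorting on word.primary treats cote and côté as equal, whereas sorting on word.tertiary distinguishes them by accent and case.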
Performance considerations
The icu_collation_keyword field type uses more disk space than standard keyword fields because it stores binary collation keys. However, this approach provides faster sorting and range queries compared to applying collation at query time.
For optimal performance:
- Use icu_collation_keyword as a multi-field on text fields rather than as the primary field type
- Set index: false if you only need sorting and don’t require range queries on the collation field
- Choose an appropriate strength level: lower strengths produce smaller collation keys
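The three recommendations can be combined in a single mapping. The following sketch assumes a sort-only use case; the index name example-sort-only and the field name name are hypothetical, and secondary strength is chosen only as an example of a level below the default.

```json
PUT /example-sort-only
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "language": "de",
            "strength": "secondary",
            "index": false
          }
        }
      }
    }
  }
}
```

Sorting on name.sort still works in this configuration because doc_values keeps its default value of true, but the subfield can no longer be searched.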