Analyzers
The following sections list all analyzers that OpenSearch supports.
Built-in analyzers
The following table lists the built-in analyzers that OpenSearch provides. The last column of the table shows the tokens produced by applying each analyzer to the string "It’s fun to contribute a brand-new PR or 2 to OpenSearch!".
| Analyzer | Analysis performed | Analyzer output |
|---|---|---|
| Standard (default) | Parses strings into tokens at word boundaries; removes most punctuation; converts tokens to lowercase | [it’s, fun, to, contribute, a, brand, new, pr, or, 2, to, opensearch] |
| Simple | Parses strings into tokens on any non-letter character; removes non-letter characters; converts tokens to lowercase | [it, s, fun, to, contribute, a, brand, new, pr, or, to, opensearch] |
| Whitespace | Parses strings into tokens on whitespace | [It’s, fun, to, contribute, a, brand-new, PR, or, 2, to, OpenSearch!] |
| Stop | Parses strings into tokens on any non-letter character; removes non-letter characters; removes stop words; converts tokens to lowercase | [s, fun, contribute, brand, new, pr, opensearch] |
| Keyword (no-op) | Outputs the entire string unchanged as a single token | [It’s fun to contribute a brand-new PR or 2 to OpenSearch!] |
| Pattern | Parses strings into tokens using a regular expression; supports converting tokens to lowercase; supports removing stop words | [it, s, fun, to, contribute, a, brand, new, pr, or, 2, to, opensearch] |
| Language | Performs analysis specific to a particular language (for example, English) | [fun, contribut, brand, new, pr, 2, opensearch] |
| Fingerprint | Parses strings into tokens at word boundaries; normalizes characters by converting them to ASCII; converts tokens to lowercase; sorts, deduplicates, and concatenates tokens into a single token; supports removing stop words | [2 a brand contribute fun it's new opensearch or pr to] Note that the typographic apostrophe was converted to its ASCII counterpart. |
| DL model | Uses ML model tokenization rules for neural sparse search | Model-based tokens |
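To make the differences between these analyzers concrete, the following Python sketch approximates several of them with regular expressions and reproduces the sample outputs in the table. It is an illustration under stated assumptions, not OpenSearch's implementation: the real analyzers run in Lucene, the standard tokenizer follows Unicode UAX #29 word-boundary rules (approximated here with a simple regex), the stop word list is assumed to be Lucene's default English set, and ASCII folding is simplified. All function names are hypothetical.

```python
import re
import unicodedata

# Assumption: Lucene's default English stop word set, used by the stop analyzer.
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
})


def whitespace_analyzer(text: str) -> list[str]:
    # Split on runs of whitespace only; case and punctuation are kept.
    return text.split()


def simple_analyzer(text: str) -> list[str]:
    # Keep maximal runs of letters ([^\W\d_] = word characters minus digits and
    # underscore), so every non-letter character acts as a separator; then lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text: str) -> list[str]:
    # Same as the simple analyzer, followed by stop word removal.
    return [t for t in simple_analyzer(text) if t not in ENGLISH_STOP_WORDS]


def keyword_analyzer(text: str) -> list[str]:
    # No-op: the whole input becomes one token.
    return [text]


def pattern_analyzer(text: str, pattern: str = r"\W+") -> list[str]:
    # Split on a regular expression (\W+ by default), drop empty pieces, lowercase.
    # Note: Python's \W is Unicode-aware, whereas Java's \W is ASCII-only by default,
    # so results can differ for non-ASCII text.
    return [t.lower() for t in re.split(pattern, text) if t]


def _word_tokens(text: str) -> list[str]:
    # Rough stand-in for the standard tokenizer: runs of word characters,
    # optionally joined by an internal ASCII or typographic apostrophe.
    return re.findall(r"\w+(?:['\u2019]\w+)*", text)


def standard_analyzer(text: str) -> list[str]:
    # Word-boundary tokenization followed by lowercasing.
    return [t.lower() for t in _word_tokens(text)]


def _ascii_fold(token: str) -> str:
    # Map the typographic apostrophe to ASCII, then strip diacritics; characters
    # with no ASCII equivalent are dropped (a simplification of ASCII folding).
    token = token.replace("\u2019", "'")
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def fingerprint_analyzer(text: str, separator: str = " ") -> list[str]:
    # Lowercase, ASCII-fold, deduplicate (set), sort, and join into one token.
    folded = (_ascii_fold(t.lower()) for t in _word_tokens(text))
    tokens = {t for t in folded if t}
    return [separator.join(sorted(tokens))]


if __name__ == "__main__":
    sample = "It\u2019s fun to contribute a brand-new PR or 2 to OpenSearch!"
    for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer,
                     keyword_analyzer, pattern_analyzer, standard_analyzer,
                     fingerprint_analyzer):
        print(analyzer.__name__, analyzer(sample))
```

Because these are approximations, inputs with URLs, numbers containing punctuation, or non-Latin scripts can produce different tokens than OpenSearch does. To see the tokens a real analyzer produces, send the text to the _analyze API on your cluster.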
Language analyzers
OpenSearch supports multiple language analyzers. For more information, see Language analyzers.
Additional analyzers
The following table lists the additional analyzers that OpenSearch supports.
| Analyzer | Analysis performed |
|---|---|
| phone | An index analyzer for parsing phone numbers. |
| phone-search | A search analyzer for parsing phone numbers. |