You're viewing version 2.19 of the OpenSearch documentation. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Character filters
Character filters process text before tokenization to prepare it for further analysis.
Unlike token filters, which operate on tokens (words or terms), character filters operate on the raw input text as a stream of characters. They are especially useful for cleaning or transforming text that contains unwanted characters, such as HTML tags or special symbols. Stripping or replacing these elements ensures that the text is properly formatted before it reaches the tokenizer.
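The following Python sketch illustrates why the ordering matters. It does not use OpenSearch: `strip_html` and `tokenize` are hypothetical stand-ins (a simple regular expression and a naive splitter) that only approximate what a character filter and a tokenizer do.

```python
import re


def strip_html(text: str) -> str:
    """Stand-in character filter: replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: lowercase, then split on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


raw = "<p>Fast <b>search</b></p>"

# Tokenizing the raw text lets tag names leak into the token stream.
without_filter = tokenize(raw)

# Running the character filter first leaves only the visible words.
with_filter = tokenize(strip_html(raw))

print(without_filter)  # ['p', 'fast', 'b', 'search', 'b', 'p']
print(with_filter)     # ['fast', 'search']
```

Because the tags are removed before the text is split, no later stage ever sees them, which is something a token filter could only approximate after the fact.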
Use cases for character filters include:
- HTML stripping: The html_strip character filter removes HTML tags from content so that only the plain text is indexed.
- Pattern replacement: The pattern_replace character filter replaces or removes unwanted characters or patterns in text, for example, converting hyphens to spaces.
- Custom mappings: The mapping character filter substitutes specific characters or sequences with other values, for example, to convert currency symbols into their textual equivalents.
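As a rough illustration of how the three filters might be combined, the sketch below builds an index settings body in Python and prints it as JSON. The names `hyphen_to_space`, `currency_to_text`, and `clean_text_analyzer` are made up for this example, and the specific pattern and mappings are assumptions chosen to mirror the use cases listed above.

```python
import json

# Hypothetical index settings: one custom analyzer that applies all three
# character filters before the standard tokenizer runs.
index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                # pattern_replace: turn hyphens into spaces (assumed pattern).
                "hyphen_to_space": {
                    "type": "pattern_replace",
                    "pattern": "-",
                    "replacement": " ",
                },
                # mapping: spell out currency symbols (assumed mappings).
                "currency_to_text": {
                    "type": "mapping",
                    "mappings": ["$ => dollar", "€ => euro"],
                },
            },
            "analyzer": {
                "clean_text_analyzer": {
                    "type": "custom",
                    # html_strip needs no custom definition when used with defaults.
                    "char_filter": [
                        "html_strip",
                        "hyphen_to_space",
                        "currency_to_text",
                    ],
                    "tokenizer": "standard",
                }
            },
        }
    }
}

print(json.dumps(index_body, indent=2, ensure_ascii=False))
```

The printed JSON could serve as the request body when creating an index. Character filters are applied in the order in which they are listed in the analyzer, so html_strip runs first here.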