Standard analyzer
The `standard` analyzer is the built-in default analyzer used for general-purpose full-text search in OpenSearch. It is designed to provide consistent, language-agnostic text processing by efficiently breaking down text into searchable terms.
The `standard` analyzer performs the following operations:

- Tokenization: Uses the `standard` tokenizer, which splits text into words based on Unicode text segmentation rules, handling spaces, punctuation, and common delimiters.
- Lowercasing: Applies the `lowercase` token filter to convert all tokens to lowercase, ensuring consistent matching regardless of input case.
This combination makes the `standard` analyzer ideal for indexing a wide range of natural language content without needing language-specific customizations.
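As a quick illustration, you can send text through the `standard` analyzer using the `_analyze` API without creating an index first (the sample text below is arbitrary):

POST /_analyze
{
  "analyzer": "standard",
  "text": "OpenSearch is FAST, scalable, and easy-to-use!"
}

The response should contain the lowercased tokens `opensearch`, `is`, `fast`, `scalable`, `and`, `easy`, `to`, and `use`, with the punctuation and hyphens removed.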
Example: Creating an index with the standard analyzer
You can assign the `standard` analyzer to a text field when creating an index:
PUT /my_standard_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
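After the index is created, you can verify which analyzer the field uses by analyzing text against the field itself (this request assumes the `my_standard_index` index above exists):

GET /my_standard_index/_analyze
{
  "field": "my_field",
  "text": "Searching OpenSearch Indexes"
}

The response should contain the tokens `searching`, `opensearch`, and `indexes`.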
Parameters
The `standard` analyzer supports the following optional parameters.
| Parameter | Data type | Default | Description |
| :--- | :--- | :--- | :--- |
| `max_token_length` | Integer | `255` | The maximum length that a token can be before it is split. |
| `stopwords` | String or list of strings | None | A list of stopwords or a predefined stopword set for a language to remove during analysis, for example, `_english_`. |
| `stopwords_path` | String | None | The path to a file containing stopwords to be used during analysis. |
Use only one of the `stopwords` or `stopwords_path` parameters. If both are specified, no error is returned, but only the `stopwords` parameter is applied.
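For example, instead of listing stopwords manually, you can reference the predefined English stopword set mentioned in the preceding table by passing `_english_` as the `stopwords` value (the index and analyzer names here are illustrative):

PUT /english_articles
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_stop_analyzer": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}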
Example: Analyzer with parameters
The following example creates an `animals` index and configures the `max_token_length` and `stopwords` parameters:
PUT /animals
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_manual_stopwords_analyzer": {
          "type": "standard",
          "max_token_length": 10,
          "stopwords": [
            "the", "is", "and", "but", "an", "a", "it"
          ]
        }
      }
    }
  }
}
Use the following `_analyze` API request to see how the `my_manual_stopwords_analyzer` processes text:
POST /animals/_analyze
{
  "analyzer": "my_manual_stopwords_analyzer",
  "text": "The Turtle is Large but it is Slow"
}
The returned tokens:
- Have been split on spaces.
- Have been lowercased.
- Have had stopwords removed.
{
  "tokens": [
    {
      "token": "turtle",
      "start_offset": 4,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "large",
      "start_offset": 14,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "slow",
      "start_offset": 30,
      "end_offset": 34,
      "type": "<ALPHANUM>",
      "position": 7
    }
  ]
}
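Note that the `max_token_length` setting is not exercised by the preceding example because every word in the sample text is shorter than 10 characters. To see it in action, you can analyze a longer word (the sample word is arbitrary):

POST /animals/_analyze
{
  "analyzer": "my_manual_stopwords_analyzer",
  "text": "hippopotamuses"
}

Because the word exceeds the 10-character limit, it should be split at that boundary into the tokens `hippopotam` and `uses`.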