Analyzer

The analyzer mapping parameter specifies the analyzer to use for text analysis when indexing or searching a text field. Unless overridden by the search_analyzer mapping parameter, this analyzer handles both index-time and search-time analysis. For more information about analyzers, see Text analysis.

Only text fields support the analyzer mapping parameter.

The analyzer parameter cannot be updated on existing fields using the Update Mapping API. To change the analyzer for an existing field, you must reindex your data.
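
As a rough sketch of that workflow, assume a source index named my_index and a new index named my_index_v2 that has already been created with the desired analyzer in its mappings (both names are hypothetical). The Reindex API can then copy the documents across:

POST /_reindex
{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "my_index_v2"
  }
}

Documents written to the destination index are analyzed according to the destination index's mappings.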

We recommend testing analyzers before deploying them to production environments.
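
One way to test an analyzer is the Analyze API, which returns the tokens an analyzer produces for a sample string. The following is a minimal sketch that uses the built-in standard analyzer; the sample text is arbitrary:

GET /_analyze
{
  "analyzer": "standard",
  "text": "The Smart Watch Pro"
}

To test a custom analyzer defined in an index's settings, send the request to that index's _analyze endpoint and provide the custom analyzer's name in the analyzer field.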

Search quote analyzer

The search_quote_analyzer parameter specifies a separate analyzer for phrase queries. This is useful when you need to handle stop words differently for phrase searches than for non-phrase searches.

For effective phrase query handling with stop words, configure three analyzer settings:

  1. An analyzer for indexing that preserves all terms, including stop words.
  2. A search_analyzer for regular queries that filters out stop words.
  3. A search_quote_analyzer for phrase queries that retains stop words.

Example

The following example demonstrates how to use the search_quote_analyzer parameter to handle stop words differently in phrase queries than in non-phrase queries.

First, create an index that configures all three analyzer parameters. The custom analyzer named index_analyzer preserves all terms during indexing, including stop words like “the” and “a”. The custom analyzer named search_analyzer removes stop words from non-phrase queries. The search_quote_analyzer parameter is set to index_analyzer, the same analyzer used during document indexing, so that exact phrase matching works correctly:

PUT /product_catalog
{
  "settings": {
    "analysis": {
      "analyzer": {
        "index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        },
        "search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer",
        "search_quote_analyzer": "index_analyzer"
      }
    }
  }
}

Next, add sample documents to the index:

PUT /product_catalog/_doc/1
{
  "product_name": "The Smart Watch Pro"
}

PUT /product_catalog/_doc/2
{
  "product_name": "A Smart Watch Ultra"
}

Search your index for the phrase “the smart watch” (in quotation marks):

GET /product_catalog/_search
{
  "query": {
    "query_string": {
      "query": "\"the smart watch\"",
      "default_field": "product_name"
    }
  }
}

Because the query is enclosed in quotation marks, it becomes a phrase query, which is equivalent to the following query:

GET /product_catalog/_search
{
  "query": {
    "match_phrase": {
      "product_name": "the smart watch"
    }
  }
}

Phrase queries use the search_quote_analyzer, which preserves stop words. As a result, the query “the smart watch” matches only documents containing that exact phrase, so the response includes only the first document:

{
  "took": 263,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.48081374,
    "hits": [
      {
        "_index": "product_catalog",
        "_id": "1",
        "_score": 0.48081374,
        "_source": {
          "product_name": "The Smart Watch Pro"
        }
      }
    ]
  }
}

Now search for the text “the smart watch” (without quotation marks):

GET /product_catalog/_search
{
  "query": {
    "query_string": {
      "query": "the smart watch",
      "default_field": "product_name"
    }
  }
}

Because the query is not enclosed in quotation marks, it is not treated as a phrase query. Instead, it is equivalent to the following match query:

GET /product_catalog/_search
{
  "query": {
    "match": {
      "product_name": "the smart watch"
    }
  }
}

Non-phrase full-text queries such as match use the search_analyzer, which tokenizes the text and removes stop words. As a result, the query is analyzed into the tokens [smart, watch] and matches both documents:

{
  "took": 38,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.16574687,
    "hits": [
      {
        "_index": "product_catalog",
        "_id": "1",
        "_score": 0.16574687,
        "_source": {
          "product_name": "The Smart Watch Pro"
        }
      },
      {
        "_index": "product_catalog",
        "_id": "2",
        "_score": 0.16574687,
        "_source": {
          "product_name": "A Smart Watch Ultra"
        }
      }
    ]
  }
}
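
To see why the two searches behave differently, you can inspect the tokens that each analyzer produces for the query text. The following sketch assumes that the product_catalog index from this example exists and uses the Analyze API with the two custom analyzers defined in its settings:

GET /product_catalog/_analyze
{
  "analyzer": "search_analyzer",
  "text": "the smart watch"
}

GET /product_catalog/_analyze
{
  "analyzer": "index_analyzer",
  "text": "the smart watch"
}

Based on the analyzer definitions in this example, the first request should return the tokens smart and watch, while the second should also return the token the.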