Index phrases

The index_phrases mapping parameter determines whether a field’s text is additionally processed to generate phrase tokens. When enabled, the system indexes extra tokens, each representing a sequence of exactly two consecutive words (a bigram). This can speed up exact phrase queries, that is, phrase queries with no slop. It does not change which documents match. The trade-off is a larger index and more time needed to index documents.
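To make the idea of bigram tokens concrete, the following Python sketch generates two-word sequences from a sentence. It is only an illustration: the `bigrams` function is a hypothetical helper, and it assumes simple lowercasing and whitespace splitting rather than the field's actual analyzer.

```python
def bigrams(text):
    """Return two-word sequences from lowercased, whitespace-split text.

    Illustration only: this is not how the engine tokenizes text internally.
    """
    words = text.lower().split()
    # Pair each word with the word that immediately follows it.
    return [f"{first} {second}" for first, second in zip(words, words[1:])]


phrases = bigrams("The slow green turtle swims past the whale")
print(phrases)
# ['the slow', 'slow green', 'green turtle', 'turtle swims',
#  'swims past', 'past the', 'the whale']
```

An eight-word sentence yields seven bigrams, which gives a rough sense of why enabling the parameter increases index size.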

By default, index_phrases is set to false to maintain a leaner index and faster document ingestion.

Enabling index phrases on a field

The following example creates an index named blog in which the content field is configured with index_phrases:

PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_phrases": true
      }
    }
  }
}

Index a document using the following request:

PUT /blog/_doc/1
{
  "content": "The slow green turtle swims past the whale"
}

Perform a match_phrase query using the following search request:

POST /blog/_search
{
  "query": {
    "match_phrase": {
      "content": "slow green"
    }
  }
}

The query returns the stored document:

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "blog",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "The slow green turtle swims past the whale"
        }
      }
    ]
  }
}

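Although the same hit is returned when you don’t provide the index_phrases mapping parameter, enabling this parameter changes how the query is executed:

The query runs against an internal subfield that stores the bigrams (content._index_phrase in this example) instead of the content field itself.
The query matches pre-indexed bigrams such as “slow green”, “green turtle”, or “turtle swims”.
A two-word phrase becomes a single term lookup, so the query avoids comparing the positions of individual words and is typically faster, especially at scale.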
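The difference between the two execution strategies can be sketched with a toy in-memory index. This is a simplified model under stated assumptions, not the engine's actual implementation: the function names, the data structures, and the second sample document are all hypothetical.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a positional term index and a bigram index over toy documents."""
    positions = defaultdict(lambda: defaultdict(list))  # term -> doc ID -> positions
    bigram_docs = defaultdict(set)                      # "w1 w2" -> doc IDs
    for doc_id, text in docs.items():
        words = text.lower().split()
        for pos, word in enumerate(words):
            positions[word][doc_id].append(pos)
        for first, second in zip(words, words[1:]):
            bigram_docs[f"{first} {second}"].add(doc_id)
    return positions, bigram_docs


def phrase_via_positions(positions, first, second):
    """Find documents in which `second` directly follows `first` by comparing positions."""
    hits = set()
    for doc_id, first_positions in positions[first].items():
        second_positions = set(positions[second].get(doc_id, []))
        if any(pos + 1 in second_positions for pos in first_positions):
            hits.add(doc_id)
    return hits


def phrase_via_bigrams(bigram_docs, first, second):
    """Find the same documents with a single lookup of the pre-indexed bigram."""
    return set(bigram_docs.get(f"{first} {second}", set()))


docs = {
    1: "The slow green turtle swims past the whale",
    2: "A green and slow turtle",  # hypothetical second document
}
positions, bigram_docs = build_indexes(docs)
print(phrase_via_positions(positions, "slow", "green"))  # {1}
print(phrase_via_bigrams(bigram_docs, "slow", "green"))  # {1}
```

Both functions return the same document, but the bigram version performs one dictionary lookup, while the positional version must walk each candidate document's position lists. That extra work is what grows with index size.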