
Doc values

By default, most fields are indexed and searchable through the inverted index. The inverted index stores a sorted list of unique terms and maps each term to the documents that contain it.
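The following Python sketch is a toy illustration of this idea, not the on-disk layout of any real search engine. It assumes a simple lowercase whitespace tokenizer and maps each unique term, in sorted order, to the sorted IDs of the documents that contain it. The function name build_inverted_index is hypothetical:

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each unique term to the sorted list of IDs of documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase, then split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sort the term dictionary and each postings list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick dog",
}
index = build_inverted_index(docs)
print(index["brown"])  # [1, 2]
print(list(index))     # ['brown', 'dog', 'fox', 'lazy', 'quick']
```

Answering "which documents contain brown?" is a single dictionary lookup, which is exactly the direction that searching needs.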

Sorting, aggregations, and field access in scripts, however, work in the opposite direction. Instead of finding the documents that contain a term, these operations need to retrieve a field's values for specific documents.
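To see why the inverted index is a poor fit for this, consider a toy inverted index held as a Python dictionary (a simplifying assumption, not a real index format). Recovering the terms of one document means checking every postings list, and a sort or aggregation would have to repeat that scan for every matching document. The function name terms_for_doc is hypothetical:

```python
def terms_for_doc(inverted_index: dict[str, list[int]], doc_id: int) -> list[str]:
    """Recover a document's terms by scanning every postings list.

    The cost grows with the total number of terms in the index,
    not with the size of the document being looked up.
    """
    return [term for term, doc_ids in inverted_index.items() if doc_id in doc_ids]


# A tiny, already-built inverted index: term -> sorted document IDs.
inverted_index = {
    "200": [1, 3],
    "404": [2],
}
print(terms_for_doc(inverted_index, 2))  # ['404']
```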

Doc values make these operations possible. They are an on-disk, column-oriented data structure created at index time. Although they store the same values as the _source field, their format is optimized for fast sorting and aggregations.
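The sketch below is a simplified, in-memory model of a column-oriented doc values structure for a keyword-like field. It is loosely based on the ordinal approach Lucene uses for sorted doc values, where each document stores an integer ordinal that points into a sorted term dictionary; the class and method names are hypothetical. Because the column is indexed by document ID, a per-document lookup is a direct array access, and sorting or counting needs only one pass over the column:

```python
from collections import Counter


class SortedDocValues:
    """Toy column store: one ordinal per document, pointing into a sorted term dictionary."""

    def __init__(self, values_by_doc: list[str]):
        # Sorted, deduplicated terms; a term's ordinal is its position in this list.
        self.terms = sorted(set(values_by_doc))
        ordinal_of = {term: i for i, term in enumerate(self.terms)}
        # Column: list position = document ID, entry = ordinal of that document's value.
        self.ords = [ordinal_of[v] for v in values_by_doc]

    def value(self, doc_id: int) -> str:
        # Document-to-value lookup: index into the column, then into the terms.
        return self.terms[self.ords[doc_id]]

    def sort_docs(self) -> list[int]:
        # Comparing ordinals yields the same order as comparing the terms themselves.
        return sorted(range(len(self.ords)), key=lambda d: self.ords[d])

    def terms_agg(self) -> dict[str, int]:
        # Count documents per ordinal, then map ordinals back to terms (in term order).
        counts = Counter(self.ords)
        return {self.terms[o]: n for o, n in sorted(counts.items())}


column = SortedDocValues(["200", "404", "200"])  # document IDs 0, 1, 2
print(column.value(1))     # 404
print(column.sort_docs())  # [0, 2, 1]
print(column.terms_agg())  # {'200': 2, '404': 1}
```

Comparing small integer ordinals instead of strings is one reason this layout suits sorting. Note that this toy terms_agg returns buckets in term order, whereas a real terms aggregation orders buckets by document count by default.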

Doc values are enabled by default on nearly all field types except text fields. If you know that a field won't be used for sorting, aggregations, or scripting, you can disable doc values to reduce disk usage.

Example

To understand how doc_values affect fields, create a sample index. In this index, the status_code field has doc_values enabled by default, allowing it to support sorting and aggregations. The session_id field has doc_values disabled, so it does not support sorting or aggregations but can still be queried:

PUT /web_analytics
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword"
      },
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}

Add some sample data to the index:

PUT /web_analytics/_doc/1
{
  "status_code": "200",
  "session_id": "abc123"
}

PUT /web_analytics/_doc/2
{
  "status_code": "404",
  "session_id": "def456"
}

PUT /web_analytics/_doc/3
{
  "status_code": "200",
  "session_id": "ghi789"
}

Perform an aggregation on the status_code field:

GET /web_analytics/_search
{
  "size": 0,
  "aggs": {
    "status_codes": {
      "terms": {
        "field": "status_code"
      }
    }
  }
}

This aggregation returns correct results because status_code has doc_values enabled:

{
  "took": 37,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "status_codes": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "200",
          "doc_count": 2
        },
        {
          "key": "404",
          "doc_count": 1
        }
      ]
    }
  }
}

Attempt to aggregate on the session_id field:

GET /web_analytics/_search
{
  "size": 0,
  "aggs": {
    "session_counts": {
      "terms": {
        "field": "session_id"
      }
    }
  }
}

This aggregation fails because session_id has doc_values disabled, so the document-to-value lookup that aggregations require is not available.
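The following Python sketch is a hypothetical model of this behavior, not actual search engine code. It treats a keyword field as an inverted index that is always built, plus a per-document column that exists only when doc values are enabled. With doc_values set to False, a term query still works through the inverted index, but an aggregation has no column to read from. The class name ToyKeywordField and its error message are invented for illustration:

```python
from collections import Counter, defaultdict


class ToyKeywordField:
    """Hypothetical model of one keyword field: an inverted index plus optional doc values."""

    def __init__(self, values_by_doc: dict[str, str], doc_values: bool = True):
        # Inverted index (term -> document IDs): always built, so the field stays searchable.
        self.postings: dict[str, list[str]] = defaultdict(list)
        for doc_id, value in values_by_doc.items():
            self.postings[value].append(doc_id)
        # Column (document ID -> value): only built when doc values are enabled.
        self.column = dict(values_by_doc) if doc_values else None

    def term_query(self, term: str) -> list[str]:
        # Search direction: term -> documents, served by the inverted index.
        return sorted(self.postings.get(term, []))

    def terms_agg(self) -> dict[str, int]:
        # Aggregation direction: documents -> values, which needs the column.
        if self.column is None:
            # Illustrative message only; a real engine reports its own error.
            raise ValueError("terms aggregation needs doc values for this field")
        return dict(Counter(self.column.values()))


session_id = ToyKeywordField({"1": "abc123", "2": "def456", "3": "ghi789"}, doc_values=False)
print(session_id.term_query("def456"))  # ['2']
try:
    session_id.terms_agg()
except ValueError as err:
    print(err)  # terms aggregation needs doc values for this field
```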
