Pipeline aggregations

Pipeline aggregations chain together multiple aggregations by using the output of one aggregation as the input for another. They compute complex statistical and mathematical measures like derivatives, moving averages, and cumulative sums. Some pipeline aggregations duplicate the functionality of metric and bucket aggregations but, in many cases, are more intuitive to use.

Pipeline aggregations are executed after all other sibling aggregations. This has performance implications. For example, using the bucket_selector pipeline aggregation to narrow a list of buckets does not reduce the number of computations performed on omitted buckets.

Pipeline aggregations cannot be sub-aggregated but can be chained to other pipeline aggregations. For example, you can calculate a second derivative by chaining two consecutive derivative aggregations. Keep in mind that pipeline aggregations append to existing output. For example, computing a second derivative by chaining derivative aggregations outputs both the first and second derivatives.

Pipeline aggregation types

Pipeline aggregations are of two types: sibling and parent.

Sibling aggregations

A sibling pipeline aggregation takes the output of a nested aggregation and produces new buckets or new aggregations at the same level as the nested buckets.

A sibling aggregation must be a multi-bucket aggregation (have multiple grouped values for a certain field), and the metric must be a numeric value.

Parent aggregations

A parent aggregation takes the output of an outer aggregation and produces new buckets or new aggregations at the same level as the existing buckets. Unlike sibling pipeline aggregations, which operate across all buckets and produce a single output, parent pipeline aggregations process each bucket individually and write the result back into each bucket.

The specified metric for a parent aggregation must be a numeric value.

We strongly recommend setting min_doc_count to 0 (the default for histogram aggregations) for parent aggregations. If min_doc_count is greater than 0, then the aggregation omits buckets, which might lead to incorrect results.

Supported pipeline aggregations

OpenSearch supports the following pipeline aggregations.

Name	Type	Description
`avg_bucket`	Sibling	Calculates the average of a metric in each bucket of a previous aggregation.
`bucket_script`	Parent	Executes a script to perform per-bucket numeric computations across a set of buckets.
`bucket_selector`	Parent	Evaluates a script to determine whether buckets returned by a `histogram` (or `date_histogram`) aggregation should be included in the final result.
`bucket_sort`	Parent	Sorts or truncates the buckets produced by its parent multi-bucket aggregation.
`cumulative_sum`	Parent	Calculates the cumulative sum across the buckets of a previous aggregation.
`derivative`	Parent	Calculates first-order and second-order derivatives of each bucket of an aggregation.
`extended_stats`	Sibling	A more comprehensive version of the `stats_bucket` aggregation that provides additional metrics.
`max_bucket`	Sibling	Calculates the maximum of a metric in each bucket of a previous aggregation.
`min_bucket`	Sibling	Calculates the minimum of a metric in each bucket of a previous aggregation.
`moving_avg` (Deprecated)	Parent	Calculates a sequence of averages of a metric contained in windows (adjacent subsets) of an ordered dataset.
`moving_fn`	Parent	Executes a script over a sliding window.
`percentiles_bucket`	Sibling	Calculates the percentile placement of bucketed metrics.
`serial_diff`	Parent	Calculates the difference between metric values in the current bucket and a previous bucket. It stores the result in the current bucket.
`stats_bucket`	Sibling	Returns a variety of stats (`count`, `min`, `max`, `avg`, and `sum`) for the buckets of a previous aggregation.
`sum_bucket`	Sibling	Calculates the sum of a metric in each bucket of a previous aggregation.

Buckets path

A pipeline aggregation uses the buckets_path parameter to reference the output of other aggregations. The buckets_path parameter has the following syntax:

buckets_path = <agg_name>[ > <agg_name> ... ][ .<metric_name> ]

This syntax uses the following elements.

Element	Description
`<agg_name>`	The name of the aggregation.
`>`	A child selector used to navigate from one aggregation (parent) to another nested aggregation (child).
`.<metric_name>`	Specifies a metric to retrieve from a multi-value aggregation. Required only if the target aggregation produces multiple metrics.

To visualize the buckets path, suppose you have the following aggregation structure:

"aggs": {
  "parent_agg": {
    "terms": {
      "field": "category"
    },
    "aggs": {
      "child_agg": {
        "stats": {
          "field": "price"
        }
      }
    }
  }
}

To reference the average price from the child_agg, which is nested in the parent_agg, use parent_agg>child_agg.avg.

Examples:

my_sum.sum: Refers to the sum metric from the my_sum aggregation.
popular_tags>my_sum.sum: Refers to the sum metric from the my_sum aggregation, which is nested under the popular_tags aggregation.

For multi-value metric aggregations like stats or percentiles, you must include the metric name (for example, .min) in the path. For single-value metrics like sum or avg, the metric name is optional if unambiguous.

Buckets path example

The following example operates on the OpenSearch Dashboards logs sample data. It creates a histogram of values in the bytes field, sums the phpmemory fields in each histogram bucket, and finally sums the buckets using the sum_bucket pipeline aggregation. The buckets_path follows the number_of_bytes>sum_total_memory path from the number_of_bytes parent aggregation to the sum_total_memory subaggregation:

GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "number_of_bytes": {
      "histogram": {
        "field": "bytes",
        "interval": 10000
      },
      "aggs": {
        "sum_total_memory": {
          "sum": {
            "field": "phpmemory"
          }
        }
      }
    },
    "sum_copies": {
      "sum_bucket": {
        "buckets_path": "number_of_bytes>sum_total_memory"
      }
    }
  }
}

Note that the buckets_path contains the names of the component aggregations. Paths are directed, meaning that they cascade one way, downward from parents to children.

The pipeline aggregation returns the total memory summed from all the buckets:

{
  ...
  "aggregations": {
    "number_of_bytes": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 13372,
          "sum_total_memory": {
            "value": 91266400
          }
        },
        {
          "key": 10000,
          "doc_count": 702,
          "sum_total_memory": {
            "value": 0
          }
        }
      ]
    },
    "sum_copies": {
      "value": 91266400
    }
  }
}

Count paths

You can direct the buckets_path to use a count rather than a value as its input. To do so, use the _count buckets path variable.

The following example computes basic stats on a histogram of the number of bytes from the OpenSearch Dashboards logs sample data. It creates a histogram of values in the bytes field and then computes the stats on the counts in the histogram buckets.

GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "number_of_bytes": {
      "histogram": {
        "field": "bytes",
        "interval": 10000
      }
    },
    "count_stats": {
      "stats_bucket": {
        "buckets_path": "number_of_bytes>_count"
      }
    }
  }
}

The results show stats about the document counts of the buckets:

{
...
  "aggregations": {
    "number_of_bytes": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 13372
        },
        {
          "key": 10000,
          "doc_count": 702
        }
      ]
    },
    "count_stats": {
      "count": 2,
      "min": 702,
      "max": 13372,
      "avg": 7037,
      "sum": 14074
    }
  }
}

Data gaps

Real-world data can be missing from nested aggregations for a number of reasons, including:

Missing values in documents.
Empty buckets anywhere in the chain of aggregations.
Missing data needed to calculate a bucket value (for example, rolling functions such as derivative require one or more previous values to start).

You can specify a policy to handle missing data using the gap_policy property: either skip the missing data or replace the missing data with zeros.

The gap_policy parameter is valid for all pipeline aggregations.

Parameter	Required/Optional	Data type	Description
`gap_policy`	Optional	String	The policy to apply to missing data. Valid values are `skip` and `insert_zeros`. Default is `skip`.
`format`	Optional	String	A DecimalFormat formatting string. Returns the formatted output in the aggregation’s `value_as_string` property.

Pipeline aggregation types
- Sibling aggregations
- Parent aggregations
Supported pipeline aggregations
Buckets path
- Buckets path example
- Count paths
Data gaps

WAS THIS PAGE HELPFUL?

✔ Yes ✖ No

Tell us why

350 characters left

Have a question? Ask us on the OpenSearch forum.

Want to contribute? Edit this page or create an issue.

Pipeline aggregations

Pipeline aggregation types

Sibling aggregations

Parent aggregations

Supported pipeline aggregations

Buckets path

Buckets path example

Count paths

Data gaps

OpenSearch Links

Get Involved

Resources

Contact Us