stats
The stats command calculates aggregations on the search results.
Comparing stats, eventstats, and streamstats
For a comprehensive comparison of stats, eventstats, and streamstats commands, including their differences in transformation behavior, output format, aggregation scope, and use cases, see Comparing stats, eventstats, and streamstats.
Syntax
The stats command has the following syntax:
stats [bucket_nullable=bool] <aggregation>... [by-clause]
Parameters
The stats command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
<aggregation> | Required | An aggregation function. |
<by-clause> | Optional | Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... If no by-clause is specified, the stats command returns only one row, which is the aggregation over the entire search results. |
bucket_nullable | Optional | Controls whether to include null buckets in group-by aggregations. When false, ignores records in which the group-by field is null, resulting in faster performance. Default is the value of plugins.ppl.syntax.legacy.preferred. |
<span-expression> | Optional | Splits a field into buckets by intervals (maximum of one). Syntax: span(field_expr, interval_expr). By default, the interval uses the field’s default unit. For date/time fields, aggregation results ignore null values. Examples: span(age, 10) creates 10-year age buckets, and span(timestamp, 1h) creates hourly buckets. Valid time units are millisecond (ms), second (s), minute (m), hour (h), day (d), week (w), month (M), quarter (q), year (y). |
Aggregation functions
The stats command supports the following aggregation functions:
COUNT/C– Count of valuesSUM– Sum of numeric valuesAVG– Average of numeric valuesMAX– Maximum valueMIN– Minimum valueVAR_SAMP– Sample varianceVAR_POP– Population varianceSTDDEV_SAMP– Sample standard deviationSTDDEV_POP– Population standard deviationDISTINCT_COUNT_APPROX– Approximate distinct countTAKE– List of original valuesPERCENTILE/PERCENTILE_APPROX– Percentile calculationsPERC<percent>/P<percent>– Percentile shortcut functionsMEDIAN– 50th percentileEARLIEST– Earliest value by timestampLATEST– Latest value by timestampFIRST– First non-null valueLAST– Last non-null valueLIST– Collect all values into arrayVALUES– Collect unique values into sorted array
Example 1: Calculate the count of events
The following query calculates the count of events in the accounts index:
source=accounts
| stats count()
The query returns the following results:
| count() |
|---|
| 4 |
Example 2: Calculate the average of a field
The following query calculates the average age for all accounts:
source=accounts
| stats avg(age)
The query returns the following results:
| avg(age) |
|---|
| 32.25 |
Example 3: Calculate the average of a field by group
The following query calculates the average age for all accounts, grouped by gender:
source=accounts
| stats avg(age) by gender
The query returns the following results:
| avg(age) | gender |
|---|---|
| 28.0 | F |
| 33.666666666666664 | M |
Example 4: Calculate the average, sum, and count of a field by group
The following query calculates the average age, sum of ages, and count of events for all accounts, grouped by gender:
source=accounts
| stats avg(age), sum(age), count() by gender
The query returns the following results:
| avg(age) | sum(age) | count() | gender |
|---|---|---|---|
| 28.0 | 28 | 1 | F |
| 33.666666666666664 | 101 | 3 | M |
Example 5: Calculate the maximum of a field
The following query calculates the maximum age for all accounts:
source=accounts
| stats max(age)
The query returns the following results:
| max(age) |
|---|
| 36 |
Example 6: Calculate the maximum and minimum of a field by group
The following query calculates the maximum and minimum ages for all accounts, grouped by gender:
source=accounts
| stats max(age), min(age) by gender
The query returns the following results:
| max(age) | min(age) | gender |
|---|---|---|
| 28 | 28 | F |
| 36 | 32 | M |
Example 7: Calculate the distinct count of a field
To retrieve the count of distinct values of a field, you can use the DISTINCT_COUNT (or DC) function instead of COUNT. The following query calculates both the count and the distinct count of the gender field for all accounts:
source=accounts
| stats count(gender), distinct_count(gender)
The query returns the following results:
| count(gender) | distinct_count(gender) |
|---|---|
| 4 | 2 |
Example 8: Calculate the count by a span
The following query retrieves the count of age values grouped into 10-year intervals:
source=accounts
| stats count(age) by span(age, 10) as age_span
The query returns the following results:
| count(age) | age_span |
|---|---|
| 1 | 20 |
| 3 | 30 |
Example 9: Calculate the count by a gender and span
The following query retrieves the count of age grouped into 5-year intervals and broken down by gender:
source=accounts
| stats count() as cnt by span(age, 5) as age_span, gender
The query returns the following results:
| cnt | age_span | gender |
|---|---|---|
| 1 | 25 | F |
| 2 | 30 | M |
| 1 | 35 | M |
The span expression is always treated as the first grouping key, regardless of its position in the by clause:
source=accounts
| stats count() as cnt by gender, span(age, 5) as age_span
The query returns the following results:
| cnt | age_span | gender |
|---|---|---|
| 1 | 25 | F |
| 2 | 30 | M |
| 1 | 35 | M |
Example 10: Count and retrieve an email list by gender and age span
The following query calculates the count of age values grouped into 5-year intervals as well as by gender and also returns a list of up to 5 emails for each group:
source=accounts
| stats count() as cnt, take(email, 5) by span(age, 5) as age_span, gender
The query returns the following results:
| cnt | take(email, 5) | age_span | gender |
|---|---|---|---|
| 1 | [] | 25 | F |
| 2 | [amberduke@pyrami.com,daleadams@boink.com] | 30 | M |
| 1 | [hattiebond@netagy.com] | 35 | M |
Example 11: Calculate the percentile of a field
The following query calculates the 90th percentile of age for all accounts:
source=accounts
| stats percentile(age, 90)
The query returns the following results:
| percentile(age, 90) |
|---|
| 36 |
Example 12: Calculate the percentile of a field by group
The following query calculates the 90th percentile of age for all accounts, grouped by gender:
source=accounts
| stats percentile(age, 90) by gender
The query returns the following results:
| percentile(age, 90) | gender |
|---|---|
| 28 | F |
| 36 | M |
Example 13: Calculate the percentile by a gender and span
The following query calculates the 90th percentile of age, grouped into 10-year intervals as well as by gender:
source=accounts
| stats percentile(age, 90) as p90 by span(age, 10) as age_span, gender
The query returns the following results:
| p90 | age_span | gender |
|---|---|---|
| 28 | 20 | F |
| 36 | 30 | M |
Example 14: Collect all values in a field using LIST
The following query collects all firstname values, preserving duplicates and order:
source=accounts
| stats list(firstname)
The query returns the following results:
| list(firstname) |
|---|
| [Amber,Hattie,Nanette,Dale] |
Example 15: Ignore a null bucket
The following query excludes null values from grouping by setting bucket_nullable=false:
source=accounts
| stats bucket_nullable=false count() as cnt by email
The query returns the following results:
| cnt | |
|---|---|
| 1 | amberduke@pyrami.com |
| 1 | daleadams@boink.com |
| 1 | hattiebond@netagy.com |
Example 16: Collect unique values in a field using VALUES
The following query collects all unique firstname values, sorted lexicographically with duplicates removed:
source=accounts
| stats values(firstname)
The query returns the following results:
| values(firstname) |
|---|
| [Amber,Dale,Hattie,Nanette] |
Example 17: Date span grouping with null handling
The following example uses this sample index data:
| Name | DEPTNO | birthday |
|---|---|---|
| Alice | 1 | 2024-04-21 |
| Bob | 2 | 2025-08-21 |
| Jeff | null | 2025-04-22 |
| Adam | 2 | null |
The following query groups data by yearly spans of the birthday field, automatically excluding null values:
source=example
| stats count() as cnt by span(birthday, 1y) as year
The query returns the following results:
| cnt | year |
|---|---|
| 1 | 2024-01-01 |
| 2 | 2025-01-01 |
Group by both yearly spans and department number (by default, null DEPTNO values are included in the results):
source=example
| stats count() as cnt by span(birthday, 1y) as year, DEPTNO
The query returns the following results:
| cnt | year | DEPTNO |
|---|---|---|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
| 1 | 2025-01-01 | null |
Use bucket_nullable=false to exclude null DEPTNO values from the grouping:
source=example
| stats bucket_nullable=false count() as cnt by span(birthday, 1y) as year, DEPTNO
The query returns the following results:
| cnt | year | DEPTNO |
|---|---|---|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
Example 18: Calculate the count by the implicit @timestamp field
If you omit the field parameter in the span function, it automatically uses the implicit @timestamp field:
source=big5
| stats count() by span(1month)
The query returns the following results:
| count() | span(1month) |
|---|---|
| 1 | 2023-01-01 00:00:00 |
Limitations
The following limitations apply to the stats command.
Bucket aggregation results may be approximate for high-cardinality fields
In OpenSearch, doc_count values for a terms bucket aggregation can be approximate. Thus, any aggregations (such as sum or avg) performed on those buckets may also be approximate.
For example, the following query retrieves the top 10 URLs:
source=hits
| stats bucket_nullable=false count() as c by URL
| sort - c
| head 10
This query is translated into a terms aggregation in OpenSearch with "order": { "_count": "desc" }. For fields with high cardinality, some buckets may be discarded, so the results may only be approximate.
Sorting by doc_count in ascending order may produce inaccurate results
When retrieving the least frequent terms for high-cardinality fields, results may be inaccurate. Shard-level aggregations can miss globally rare terms or misrepresent their frequency, causing errors in the overall results.
For example, the following query retrieves the 10 least frequent URLs:
source=hits
| stats bucket_nullable=false count() as c by URL
| sort + c
| head 10
A globally rare term might not appear as rare on every shard or could be entirely absent from some shard results. Conversely, a term that is infrequent on one shard might be common on another. In both cases, shard-level approximations can cause rare terms to be missed, leading to inaccurate overall results.