stats

The stats command calculates aggregations on the search results.

Comparing stats, eventstats, and streamstats

For a comprehensive comparison of stats, eventstats, and streamstats commands, including their differences in transformation behavior, output format, aggregation scope, and use cases, see Comparing stats, eventstats, and streamstats.

Syntax

The stats command has the following syntax:

stats [bucket_nullable=bool] <aggregation>... [by-clause]

Parameters

The stats command supports the following parameters.

Parameter	Required/Optional	Description
`<aggregation>`	Required	An aggregation function.
`<by-clause>`	Optional	Groups results by specified fields or expressions. Syntax: `by [span-expression,] [field,]...` If no `by-clause` is specified, the stats command returns only one row, which is the aggregation over the entire search results.
`bucket_nullable`	Optional	Controls whether to include `null` buckets in group-by aggregations. When `false`, ignores records in which the `group-by` field is null, resulting in faster performance. Default is the value of `plugins.ppl.syntax.legacy.preferred`.
`<span-expression>`	Optional	Splits a field into buckets by intervals (maximum of one). Syntax: `span(field_expr, interval_expr)`. By default, the interval uses the field’s default unit. For date/time fields, aggregation results ignore null values. Examples: `span(age, 10)` creates 10-year age buckets, and `span(timestamp, 1h)` creates hourly buckets. Valid time units are millisecond (`ms`), second (`s`), minute (`m`), hour (`h`), day (`d`), week (`w`), month (`M`), quarter (`q`), year (`y`).

Aggregation functions

The stats command supports the following aggregation functions:

COUNT/C – Count of values
SUM – Sum of numeric values
AVG – Average of numeric values
MAX – Maximum value
MIN – Minimum value
VAR_SAMP – Sample variance
VAR_POP – Population variance
STDDEV_SAMP – Sample standard deviation
STDDEV_POP – Population standard deviation
DISTINCT_COUNT_APPROX – Approximate distinct count
TAKE – List of original values
PERCENTILE/PERCENTILE_APPROX – Percentile calculations
PERC<percent>/P<percent> – Percentile shortcut functions
MEDIAN – 50th percentile
EARLIEST – Earliest value by timestamp
LATEST – Latest value by timestamp
FIRST – First non-null value
LAST – Last non-null value
LIST – Collect all values into array
VALUES – Collect unique values into sorted array

For detailed documentation of each function, see Aggregation Functions.

Example 1: Calculate the count of events

The following query calculates the count of events in the accounts index:

source=accounts
| stats count()

The query returns the following results:

count()
4

Example 2: Calculate the average of a field

The following query calculates the average age for all accounts:

source=accounts
| stats avg(age)

The query returns the following results:

avg(age)
32.25

Example 3: Calculate the average of a field by group

The following query calculates the average age for all accounts, grouped by gender:

source=accounts
| stats avg(age) by gender

The query returns the following results:

avg(age)	gender
28.0	F
33.666666666666664	M

Example 4: Calculate the average, sum, and count of a field by group

The following query calculates the average age, sum of ages, and count of events for all accounts, grouped by gender:

source=accounts
| stats avg(age), sum(age), count() by gender

The query returns the following results:

avg(age)	sum(age)	count()	gender
28.0	28	1	F
33.666666666666664	101	3	M

Example 5: Calculate the maximum of a field

The following query calculates the maximum age for all accounts:

source=accounts
| stats max(age)

The query returns the following results:

max(age)
36

Example 6: Calculate the maximum and minimum of a field by group

The following query calculates the maximum and minimum ages for all accounts, grouped by gender:

source=accounts
| stats max(age), min(age) by gender

The query returns the following results:

max(age)	min(age)	gender
28	28	F
36	32	M

Example 7: Calculate the distinct count of a field

To retrieve the count of distinct values of a field, you can use the DISTINCT_COUNT (or DC) function instead of COUNT. The following query calculates both the count and the distinct count of the gender field for all accounts:

source=accounts
| stats count(gender), distinct_count(gender)

The query returns the following results:

count(gender)	distinct_count(gender)
4	2

Example 8: Calculate the count by a span

The following query retrieves the count of age values grouped into 10-year intervals:

source=accounts
| stats count(age) by span(age, 10) as age_span

The query returns the following results:

count(age)	age_span
1	20
3	30

Example 9: Calculate the count by a gender and span

The following query retrieves the count of age grouped into 5-year intervals and broken down by gender:

source=accounts
| stats count() as cnt by span(age, 5) as age_span, gender

The query returns the following results:

cnt	age_span	gender
1	25	F
2	30	M
1	35	M

The span expression is always treated as the first grouping key, regardless of its position in the by clause:

source=accounts
| stats count() as cnt by gender, span(age, 5) as age_span

The query returns the following results:

cnt	age_span	gender
1	25	F
2	30	M
1	35	M

Example 10: Count and retrieve an email list by gender and age span

The following query calculates the count of age values grouped into 5-year intervals as well as by gender and also returns a list of up to 5 emails for each group:

source=accounts
| stats count() as cnt, take(email, 5) by span(age, 5) as age_span, gender

The query returns the following results:

cnt	take(email, 5)	age_span	gender
1	[]	25	F
2	[amberduke@pyrami.com,daleadams@boink.com]	30	M
1	[hattiebond@netagy.com]	35	M

Example 11: Calculate the percentile of a field

The following query calculates the 90th percentile of age for all accounts:

source=accounts
| stats percentile(age, 90)

The query returns the following results:

percentile(age, 90)
36

Example 12: Calculate the percentile of a field by group

The following query calculates the 90th percentile of age for all accounts, grouped by gender:

source=accounts
| stats percentile(age, 90) by gender

The query returns the following results:

percentile(age, 90)	gender
28	F
36	M

Example 13: Calculate the percentile by a gender and span

The following query calculates the 90th percentile of age, grouped into 10-year intervals as well as by gender:

source=accounts
| stats percentile(age, 90) as p90 by span(age, 10) as age_span, gender

The query returns the following results:

p90	age_span	gender
28	20	F
36	30	M

Example 14: Collect all values in a field using LIST

The following query collects all firstname values, preserving duplicates and order:

source=accounts
| stats list(firstname)

The query returns the following results:

list(firstname)
[Amber,Hattie,Nanette,Dale]

Example 15: Ignore a null bucket

The following query excludes null values from grouping by setting bucket_nullable=false:

source=accounts
| stats bucket_nullable=false count() as cnt by email

The query returns the following results:

cnt	email
1	amberduke@pyrami.com
1	daleadams@boink.com
1	hattiebond@netagy.com

Example 16: Collect unique values in a field using VALUES

The following query collects all unique firstname values, sorted lexicographically with duplicates removed:

source=accounts
| stats values(firstname)

The query returns the following results:

values(firstname)
[Amber,Dale,Hattie,Nanette]

Example 17: Date span grouping with null handling

The following example uses this sample index data:

Name	DEPTNO	birthday
Alice	1	2024-04-21
Bob	2	2025-08-21
Jeff	null	2025-04-22
Adam	2	null

The following query groups data by yearly spans of the birthday field, automatically excluding null values:

source=example
| stats count() as cnt by span(birthday, 1y) as year

The query returns the following results:

cnt	year
1	2024-01-01
2	2025-01-01

Group by both yearly spans and department number (by default, null DEPTNO values are included in the results):

source=example
| stats count() as cnt by span(birthday, 1y) as year, DEPTNO

The query returns the following results:

cnt	year	DEPTNO
1	2024-01-01	1
1	2025-01-01	2
1	2025-01-01	null

Use bucket_nullable=false to exclude null DEPTNO values from the grouping:

source=example
| stats bucket_nullable=false count() as cnt by span(birthday, 1y) as year, DEPTNO

The query returns the following results:

cnt	year	DEPTNO
1	2024-01-01	1
1	2025-01-01	2

Example 18: Calculate the count by the implicit @timestamp field

If you omit the field parameter in the span function, it automatically uses the implicit @timestamp field:

source=big5
| stats count() by span(1month)

The query returns the following results:

count()	span(1month)
1	2023-01-01 00:00:00

Limitations

The following limitations apply to the stats command.

Bucket aggregation results may be approximate for high-cardinality fields

In OpenSearch, doc_count values for a terms bucket aggregation can be approximate. Thus, any aggregations (such as sum or avg) performed on those buckets may also be approximate.

For example, the following query retrieves the top 10 URLs:

source=hits
| stats bucket_nullable=false count() as c by URL
| sort - c
| head 10

This query is translated into a terms aggregation in OpenSearch with "order": { "_count": "desc" }. For fields with high cardinality, some buckets may be discarded, so the results may only be approximate.

Sorting by doc_count in ascending order may produce inaccurate results

When retrieving the least frequent terms for high-cardinality fields, results may be inaccurate. Shard-level aggregations can miss globally rare terms or misrepresent their frequency, causing errors in the overall results.

For example, the following query retrieves the 10 least frequent URLs:

source=hits
| stats bucket_nullable=false count() as c by URL
| sort + c
| head 10

A globally rare term might not appear as rare on every shard or could be entirely absent from some shard results. Conversely, a term that is infrequent on one shard might be common on another. In both cases, shard-level approximations can cause rare terms to be missed, leading to inaccurate overall results.

Comparing stats, eventstats, and streamstats
Syntax
Parameters
Aggregation functions
Example 1: Calculate the count of events
Example 2: Calculate the average of a field
Example 3: Calculate the average of a field by group
Example 4: Calculate the average, sum, and count of a field by group
Example 5: Calculate the maximum of a field
Example 6: Calculate the maximum and minimum of a field by group
Example 7: Calculate the distinct count of a field
Example 8: Calculate the count by a span
Example 9: Calculate the count by a gender and span
Example 10: Count and retrieve an email list by gender and age span
Example 11: Calculate the percentile of a field
Example 12: Calculate the percentile of a field by group
Example 13: Calculate the percentile by a gender and span
Example 14: Collect all values in a field using LIST
Example 15: Ignore a null bucket
Example 16: Collect unique values in a field using VALUES
Example 17: Date span grouping with null handling
Example 18: Calculate the count by the implicit @timestamp field
Limitations
- Bucket aggregation results may be approximate for high-cardinality fields
- Sorting by doc_count in ascending order may produce inaccurate results

WAS THIS PAGE HELPFUL?

✔ Yes ✖ No

Tell us why

350 characters left

Have a question? Ask us on the OpenSearch forum.

Want to contribute? Edit this page or create an issue.

stats

Comparing stats, eventstats, and streamstats

Syntax

Parameters

Aggregation functions

Example 1: Calculate the count of events

Example 2: Calculate the average of a field

Example 3: Calculate the average of a field by group

Example 4: Calculate the average, sum, and count of a field by group

Example 5: Calculate the maximum of a field

Example 6: Calculate the maximum and minimum of a field by group

Example 7: Calculate the distinct count of a field

Example 8: Calculate the count by a span

Example 9: Calculate the count by a gender and span

Example 10: Count and retrieve an email list by gender and age span

Example 11: Calculate the percentile of a field

Example 12: Calculate the percentile of a field by group

Example 13: Calculate the percentile by a gender and span

Example 14: Collect all values in a field using LIST

Example 15: Ignore a null bucket

Example 16: Collect unique values in a field using VALUES

Example 17: Date span grouping with null handling

Example 18: Calculate the count by the implicit @timestamp field

Limitations

Bucket aggregation results may be approximate for high-cardinality fields

Sorting by doc_count in ascending order may produce inaccurate results

OpenSearch Links

Get Involved

Resources

Contact Us