stats
The stats command calculates aggregations on the search results.
Comparing stats, eventstats, and streamstats
For a comprehensive comparison of stats, eventstats, and streamstats commands, including their differences in transformation behavior, output format, aggregation scope, and use cases, see Comparing stats, eventstats, and streamstats.
Syntax
The stats command has the following syntax:
stats [bucket_nullable=bool] <aggregation>... [by-clause]
Parameters
The stats command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
| `<aggregation>` | Required | An aggregation function. |
| `<by-clause>` | Optional | Groups results by specified fields or expressions. Syntax: `by [span-expression,] [field,]...`. If no by-clause is specified, the stats command returns only one row, aggregated over the entire search results. |
| `bucket_nullable` | Optional | Controls whether to include null buckets in group-by aggregations. When `false`, ignores records in which the group-by field is null, resulting in faster performance. Default is the value of `plugins.ppl.syntax.legacy.preferred`. |
| `<span-expression>` | Optional | Splits a field into buckets by intervals (maximum of one). Syntax: `span(field_expr, interval_expr)`. By default, the interval uses the field's default unit. For date/time fields, aggregation results ignore null values. Examples: `span(age, 10)` creates 10-year age buckets, and `span(timestamp, 1h)` creates hourly buckets. Valid time units are millisecond (ms), second (s), minute (m), hour (h), day (d), week (w), month (M), quarter (q), and year (y). |
Aggregation functions
The stats command supports the following aggregation functions:
- `COUNT`/`C` – Count of values
- `SUM` – Sum of numeric values
- `AVG` – Average of numeric values
- `MAX` – Maximum value
- `MIN` – Minimum value
- `VAR_SAMP` – Sample variance
- `VAR_POP` – Population variance
- `STDDEV_SAMP` – Sample standard deviation
- `STDDEV_POP` – Population standard deviation
- `DISTINCT_COUNT_APPROX` – Approximate distinct count
- `TAKE` – List of original values
- `PERCENTILE`/`PERCENTILE_APPROX` – Percentile calculations
- `PERC<percent>`/`P<percent>` – Percentile shortcut functions
- `MEDIAN` – 50th percentile
- `EARLIEST` – Earliest value by timestamp
- `LATEST` – Latest value by timestamp
- `FIRST` – First non-null value
- `LAST` – Last non-null value
- `LIST` – Collect all values into an array
- `VALUES` – Collect unique values into a sorted array
For detailed documentation of each function, see Aggregation Functions.
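Several of these functions, such as earliest, latest, and take, are not shown in the examples that follow. The following rough sketch combines them on the sample log data used below; it assumes the log documents carry a default timestamp field that earliest and latest can order by, and that take accepts an optional size argument:
source=otellogs
| stats earliest(severityText) as first_severity, latest(severityText) as last_severity, take(severityText, 3) as sample_severities by `resource.attributes.service.name`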
Example 1: Calculating the count of events
The following query counts the total number of log entries, a basic health check for log ingestion:
source=otellogs
| stats count() as total_logs
The query returns the following results:
| total_logs |
|---|
| 20 |
Example 2: Calculating the average of a field
The following query calculates the average severity number across all logs. A rising average over time may indicate increasing system instability:
source=otellogs
| stats avg(severityNumber) as avg_severity
The query returns the following results:
| avg_severity |
|---|
| 12.0 |
Example 3: Calculating the count by group
The following query counts logs by severity level, giving you a breakdown of your system’s health at a glance:
source=otellogs
| stats count() as log_count by severityText
| sort - log_count
The query returns the following results:
| log_count | severityText |
|---|---|
| 7 | ERROR |
| 6 | INFO |
| 4 | WARN |
| 3 | DEBUG |
Example 4: Calculating multiple aggregations by group
The following query calculates the total log count and severity range per service, helping you identify which services are most active and most problematic:
source=otellogs
| stats count() as total, min(severityNumber) as min_sev, max(severityNumber) as max_sev by `resource.attributes.service.name`
| sort - total
| head 5
The query returns the following results:
| total | min_sev | max_sev | resource.attributes.service.name |
|---|---|---|---|
| 4 | 9 | 9 | frontend |
| 4 | 5 | 17 | product-catalog |
| 3 | 5 | 9 | cart |
| 3 | 9 | 17 | checkout |
| 3 | 13 | 17 | frontend-proxy |
Example 5: Calculating the count by a span
The following query groups logs into severity buckets of 10, showing the distribution across low (0-9), medium (10-19), and high (20+) severity ranges:
source=otellogs
| stats count() as log_count by span(severityNumber, 10)
The query returns the following results:
| log_count | span(severityNumber,10) |
|---|---|
| 9 | 0 |
| 11 | 10 |
Example 6: Calculating the count by a field and span
The following query counts logs by severity level within severity number ranges, showing how severity text maps to numeric ranges:
source=otellogs
| stats count() as cnt by span(severityNumber, 10) as sev_range, severityText
| sort sev_range
The query returns the following results:
| cnt | sev_range | severityText |
|---|---|---|
| 3 | 0 | DEBUG |
| 6 | 0 | INFO |
| 7 | 10 | ERROR |
| 4 | 10 | WARN |
Example 7: Calculating the distinct count of a field
The following query counts the total number of log entries that have a service name and the number of distinct services reporting logs, useful for verifying that all expected services are reporting:
source=otellogs
| stats count(`resource.attributes.service.name`) as total_entries, distinct_count(`resource.attributes.service.name`) as unique_services
The query returns the following results:
| total_entries | unique_services |
|---|---|
| 20 | 7 |
Example 8: Collecting unique values using VALUES by group
The following query collects the unique service names for each severity level, useful for quickly seeing which services are affected at each level:
source=otellogs
| stats values(`resource.attributes.service.name`) as services by severityText
| sort severityText
The query returns the following results:
| services | severityText |
|---|---|
| [cart,product-catalog] | DEBUG |
| [checkout,frontend-proxy,payment,product-catalog,recommendation] | ERROR |
| [cart,checkout,frontend] | INFO |
| [frontend-proxy,product-catalog] | WARN |
Example 9: Calculating the percentile of a field
The following query calculates the 90th percentile of severity numbers, helping you understand the severity distribution:
source=otellogs
| stats percentile(severityNumber, 90) as p90_severity
The query returns the following results:
| p90_severity |
|---|
| 17 |
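The PERC<percent> and P<percent> shortcuts listed in the aggregation functions section express the same calculation more compactly. The following sketch assumes the shortcut accepts a single field argument and should be equivalent to the query above:
source=otellogs
| stats perc90(severityNumber) as p90_severity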
Example 10: Collecting unique values using VALUES
The following query collects all unique severity levels present in the logs:
source=otellogs
| stats values(severityText) as severity_levels
The query returns the following results:
| severity_levels |
|---|
| [DEBUG,ERROR,INFO,WARN] |
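In contrast to values, the list function described in the aggregation functions section collects all values, including duplicates, rather than a sorted set of unique values. The following sketch shows the same field collected with list (the output order and number of duplicates depend on your data):
source=otellogs
| stats list(severityText) as all_severity_values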
Example 11: Ignoring a null bucket
The following query excludes null values from grouping by setting bucket_nullable=false, useful when you only want to count log entries that have a defined instrumentation scope name:
source=otellogs
| stats bucket_nullable=false count() as cnt by instrumentationScope.name
The query returns the following results:
| cnt | instrumentationScope.name |
|---|---|
| 2 | @opentelemetry/instrumentation-http |
| 1 | Microsoft.Extensions.Hosting |
| 1 | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
Example 12: Date span grouping with null handling
The following example uses this sample index data:
| Name | DEPTNO | birthday |
|---|---|---|
| Alice | 1 | 2024-04-21 |
| Bob | 2 | 2025-08-21 |
| Jeff | null | 2025-04-22 |
| Adam | 2 | null |
The following query groups data by yearly spans of the birthday field, automatically excluding null values:
source=example
| stats count() as cnt by span(birthday, 1y) as year
The query returns the following results:
| cnt | year |
|---|---|
| 1 | 2024-01-01 |
| 2 | 2025-01-01 |
The following query groups data by both yearly spans and department number. By default, null DEPTNO values are included in the results:
source=example
| stats count() as cnt by span(birthday, 1y) as year, DEPTNO
The query returns the following results:
| cnt | year | DEPTNO |
|---|---|---|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
| 1 | 2025-01-01 | null |
The following query uses bucket_nullable=false to exclude null DEPTNO values from the grouping:
source=example
| stats bucket_nullable=false count() as cnt by span(birthday, 1y) as year, DEPTNO
The query returns the following results:
| cnt | year | DEPTNO |
|---|---|---|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
Example 13: Calculating the count by the implicit @timestamp field
If you omit the field parameter in the span function, it automatically uses the implicit @timestamp field:
source=big5
| stats count() by span(1month)
The query returns the following results:
| count() | span(1month) |
|---|---|
| 1 | 2023-01-01 00:00:00 |
Limitations
The following limitations apply to the stats command.
Bucket aggregation results may be approximate for high-cardinality fields
In OpenSearch, doc_count values for a terms bucket aggregation can be approximate. Thus, any aggregations (such as sum or avg) performed on those buckets may also be approximate.
For example, the following query retrieves the top 10 URLs:
source=hits
| stats bucket_nullable=false count() as c by URL
| sort - c
| head 10
This query is translated into a terms aggregation in OpenSearch with "order": { "_count": "desc" }. For fields with high cardinality, some buckets may be discarded, so the results may only be approximate.
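Conceptually, the pushed-down request resembles the following terms aggregation (an illustrative sketch; the bucket name is arbitrary, and the DSL actually generated by the PPL engine may differ):
{
  "size": 0,
  "aggs": {
    "by_url": {
      "terms": {
        "field": "URL",
        "size": 10,
        "order": { "_count": "desc" }
      }
    }
  }
}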
Sorting by doc_count in ascending order may produce inaccurate results
When retrieving the least frequent terms for high-cardinality fields, results may be inaccurate. Shard-level aggregations can miss globally rare terms or misrepresent their frequency, causing errors in the overall results.
For example, the following query retrieves the 10 least frequent URLs:
source=hits
| stats bucket_nullable=false count() as c by URL
| sort + c
| head 10
A globally rare term might not appear as rare on every shard or could be entirely absent from some shard results. Conversely, a term that is infrequent on one shard might be common on another. In both cases, shard-level approximations can cause rare terms to be missed, leading to inaccurate overall results.