stats
The stats command calculates aggregations on the search results.
Comparing stats, eventstats, and streamstats
For a comprehensive comparison of stats, eventstats, and streamstats commands, including their differences in transformation behavior, output format, aggregation scope, and use cases, see Comparing stats, eventstats, and streamstats.
Syntax
The stats command has the following syntax:
stats [bucket_nullable=bool] <aggregation>... [by-clause]
Parameters
The stats command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
| `<aggregation>` | Required | An aggregation function. |
| `<by-clause>` | Optional | Groups results by specified fields or expressions. Syntax: `by [span-expression,] [field,]...`. If no by-clause is specified, the stats command returns only one row, aggregated over the entire search results. |
| `bucket_nullable` | Optional | Controls whether to include null buckets in group-by aggregations. When `false`, ignores records in which the group-by field is null, resulting in faster performance. Default is the value of `plugins.ppl.syntax.legacy.preferred`. |
| `<span-expression>` | Optional | Splits a field into buckets by intervals (maximum of one). Syntax: `span(field_expr, interval_expr)`. By default, the interval uses the field's default unit. For date/time fields, aggregation results ignore null values. Examples: `span(age, 10)` creates 10-year age buckets, and `span(timestamp, 1h)` creates hourly buckets. Valid time units are millisecond (ms), second (s), minute (m), hour (h), day (d), week (w), month (M), quarter (q), and year (y). |
Aggregation functions
The stats command supports the following aggregation functions:
- `COUNT`/`C` – Count of values
- `SUM` – Sum of numeric values
- `AVG` – Average of numeric values
- `MAX` – Maximum value
- `MIN` – Minimum value
- `VAR_SAMP` – Sample variance
- `VAR_POP` – Population variance
- `STDDEV_SAMP` – Sample standard deviation
- `STDDEV_POP` – Population standard deviation
- `DISTINCT_COUNT_APPROX` – Approximate distinct count
- `TAKE` – List of original values
- `PERCENTILE`/`PERCENTILE_APPROX` – Percentile calculations
- `PERC<percent>`/`P<percent>` – Percentile shortcut functions
- `MEDIAN` – 50th percentile
- `EARLIEST` – Earliest value by timestamp
- `LATEST` – Latest value by timestamp
- `FIRST` – First non-null value
- `LAST` – Last non-null value
- `LIST` – Collect all values into an array
- `VALUES` – Collect unique values into a sorted array
For detailed documentation of each function, see Aggregation Functions.
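Several of these functions, such as earliest, latest, and take, are not shown in the examples that follow. The following rough sketch combines them on the sample log data used below; it assumes the log documents carry a default timestamp field that earliest and latest can order by, and that take accepts an optional size argument:
source=otellogs
| stats earliest(severityText) as first_severity, latest(severityText) as last_severity, take(severityText, 3) as sample_severities by `resource.attributes.service.name`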
Example 1: Calculating the count of events
The following query counts the total number of log entries, a basic health check for log ingestion:
source=otellogs
| stats count() as total_logs
The query returns the following results:
| total_logs |
|---|
| 20 |
Example 2: Calculating the average of a field
The following query calculates the average severity number across all logs. A rising average over time may indicate increasing system instability:
source=otellogs
| stats avg(severityNumber) as avg_severity
The query returns the following results:
| avg_severity |
|---|
| 12.0 |
Example 3: Calculating the count by group
The following query counts logs by severity level, giving you a breakdown of your system’s health at a glance:
source=otellogs
| stats count() as log_count by severityText
| sort - log_count
The query returns the following results:
| log_count | severityText |
|---|---|
| 7 | ERROR |
| 6 | INFO |
| 4 | WARN |
| 3 | DEBUG |
Example 4: Calculating multiple aggregations by group
The following query calculates the total log count and severity range per service, helping you identify which services are most active and most problematic:
source=otellogs
| stats count() as total, min(severityNumber) as min_sev, max(severityNumber) as max_sev by `resource.attributes.service.name`
| sort - total
| head 5
The query returns the following results:
| total | min_sev | max_sev | resource.attributes.service.name |
|---|---|---|---|
| 4 | 9 | 9 | frontend |
| 4 | 5 | 17 | product-catalog |
| 3 | 5 | 9 | cart |
| 3 | 9 | 17 | checkout |
| 3 | 13 | 17 | frontend-proxy |
Example 5: Calculating the count by a span
The following query groups logs into severity buckets of 10, showing the distribution across low (0-9), medium (10-19), and high (20+) severity ranges:
source=otellogs
| stats count() as log_count by span(severityNumber, 10)
The query returns the following results:
| log_count | span(severityNumber,10) |
|---|---|
| 9 | 0 |
| 11 | 10 |
Example 6: Calculating the count by a field and span
The following query counts logs by severity level within severity number ranges, showing how severity text maps to numeric ranges:
source=otellogs
| stats count() as cnt by span(severityNumber, 10) as sev_range, severityText
| sort sev_range
The query returns the following results:
| cnt | sev_range | severityText |
|---|---|---|
| 3 | 0 | DEBUG |
| 6 | 0 | INFO |
| 7 | 10 | ERROR |
| 4 | 10 | WARN |
Example 7: Calculating the distinct count of a field
The following query counts the total number of log entries that have a service name and the number of distinct services reporting logs, useful for verifying that all expected services are reporting:
source=otellogs
| stats count(`resource.attributes.service.name`) as total_entries, distinct_count(`resource.attributes.service.name`) as unique_services
The query returns the following results:
| total_entries | unique_services |
|---|---|
| 20 | 7 |
Example 8: Collecting unique values using VALUES by group
The following query collects the unique service names for each severity level, useful for quickly seeing which services are affected at each level:
source=otellogs
| stats values(`resource.attributes.service.name`) as services by severityText
| sort severityText
The query returns the following results:
| services | severityText |
|---|---|
| [cart,product-catalog] | DEBUG |
| [checkout,frontend-proxy,payment,product-catalog,recommendation] | ERROR |
| [cart,checkout,frontend] | INFO |
| [frontend-proxy,product-catalog] | WARN |
Example 9: Calculating the percentile of a field
The following query calculates the 90th percentile of severity numbers, helping you understand the severity distribution:
source=otellogs
| stats percentile(severityNumber, 90) as p90_severity
The query returns the following results:
| p90_severity |
|---|
| 17 |
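The PERC<percent> and P<percent> shortcuts listed in the aggregation functions section express the same calculation more compactly. The following sketch assumes the shortcut accepts a single field argument and should be equivalent to the query above:
source=otellogs
| stats perc90(severityNumber) as p90_severity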
Example 10: Collecting unique values using VALUES
The following query collects all unique severity levels present in the logs:
source=otellogs
| stats values(severityText) as severity_levels
The query returns the following results:
| severity_levels |
|---|
| [DEBUG,ERROR,INFO,WARN] |
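In contrast to values, the list function described in the aggregation functions section collects all values, including duplicates, rather than a sorted set of unique values. The following sketch shows the same field collected with list (the output order and number of duplicates depend on your data):
source=otellogs
| stats list(severityText) as all_severity_values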
Example 11: Ignoring a null bucket
The following query excludes null values from grouping by setting bucket_nullable=false, useful when you only want to count log entries that have a defined instrumentation scope name:
source=otellogs
| stats bucket_nullable=false count() as cnt by instrumentationScope.name
The query returns the following results:
| cnt | instrumentationScope.name |
|---|---|
| 2 | @opentelemetry/instrumentation-http |
| 1 | Microsoft.Extensions.Hosting |
| 1 | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
Example 12: Date span grouping with null handling
The following example uses this sample index data:
| Name | DEPTNO | birthday |
|---|---|---|
| Alice | 1 | 2024-04-21 |
| Bob | 2 | 2025-08-21 |
| Jeff | null | 2025-04-22 |
| Adam | 2 | null |
The following query groups data by yearly spans of the birthday field, automatically excluding null values:
source=example
| stats count() as cnt by span(birthday, 1y) as year
The query returns the following results:
| cnt | year |
|---|---|
| 1 | 2024-01-01 |
| 2 | 2025-01-01 |
The following query groups data by both yearly spans and department number. By default, null DEPTNO values are included in the results:
source=example
| stats count() as cnt by span(birthday, 1y) as year, DEPTNO
The query returns the following results:
| cnt | year | DEPTNO |
|---|---|---|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
| 1 | 2025-01-01 | null |
The following query uses bucket_nullable=false to exclude null DEPTNO values from the grouping:
source=example
| stats bucket_nullable=false count() as cnt by span(birthday, 1y) as year, DEPTNO
The query returns the following results:
| cnt | year | DEPTNO |
|---|---|---|
| 1 | 2024-01-01 | 1 |
| 1 | 2025-01-01 | 2 |
Example 13: Calculating the count by the implicit @timestamp field
If you omit the field parameter in the span function, it automatically uses the implicit @timestamp field:
source=big5
| stats count() by span(1month)
The query returns the following results:
| count() | span(1month) |
|---|---|
| 1 | 2023-01-01 00:00:00 |
Limitations
The following limitations apply to the stats command.
Bucket aggregation results may be approximate for high-cardinality fields
In OpenSearch, doc_count values for a terms bucket aggregation can be approximate. Thus, any aggregations (such as sum or avg) performed on those buckets may also be approximate.
For example, the following query retrieves the top 10 URLs:
source=hits
| stats bucket_nullable=false count() as c by URL
| sort - c
| head 10
This query is translated into a terms aggregation in OpenSearch with "order": { "_count": "desc" }. For fields with high cardinality, some buckets may be discarded, so the results may only be approximate.
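Conceptually, the pushed-down request resembles the following terms aggregation (an illustrative sketch; the bucket name is arbitrary, and the DSL actually generated by the PPL engine may differ):
{
  "size": 0,
  "aggs": {
    "by_url": {
      "terms": {
        "field": "URL",
        "size": 10,
        "order": { "_count": "desc" }
      }
    }
  }
}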
Sorting by doc_count in ascending order may produce inaccurate results
When retrieving the least frequent terms for high-cardinality fields, results may be inaccurate. Shard-level aggregations can miss globally rare terms or misrepresent their frequency, causing errors in the overall results.
For example, the following query retrieves the 10 least frequent URLs:
source=hits
| stats bucket_nullable=false count() as c by URL
| sort + c
| head 10
A globally rare term might not appear as rare on every shard or could be entirely absent from some shard results. Conversely, a term that is infrequent on one shard might be common on another. In both cases, shard-level approximations can cause rare terms to be missed, leading to inaccurate overall results.