Commands
Table of contents
PPL supports most common SQL functions, including relevance search, but also introduces several more functions (called commands), which are available in PPL only.
ad
Syntax and examples
The ad command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin on the search result returned by a PPL command. Based on the input, the plugin uses two types of RCF algorithms: fixed-in-time RCF for processing time-series data and batch RCF for processing non-time-series data.
Syntax: Fixed-in-time RCF for time-series data command
ad <shingle_size> <time_decay> <time_field>
The following table describes the parameters for the ad command when using fixed-in-time RCF for time-series data.
| Field | Description | Required |
|---|---|---|
shingle_size | A consecutive sequence of the most recent records. The default value is 8. | No |
time_decay | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No |
time_field | Specifies the time field for RCF to use as time-series data. Must be either a long value, such as the timestamp in milliseconds, or a string value in “yyyy-MM-dd HH:mm:ss”. | Yes |
Syntax: Batch RCF for non-time-series data command
ad <shingle_size> <time_decay>
The following table describes the parameters for the ad command when using batch RCF for non-time-series data.
| Field | Description | Required |
|---|---|---|
shingle_size | A consecutive sequence of the most recent records. The default value is 8. | No |
time_decay | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No |
Example 1: Detecting events in New York City from taxi ridership data with time-series data
The following example trains an RCF model and uses the model to detect anomalies in the time-series ridership data:
source=nyc_taxi | fields value, timestamp | AD time_field='timestamp' | where value=10844.0
The command returns the following results.
| value | timestamp | score | anomaly_grade |
|---|---|---|---|
| 10844.0 | 1404172800000 | 0.0 | 0.0 |
Example 2: Detecting events in New York City from taxi ridership data with non-time-series data
The following example uses batch RCF to detect anomalies in non-time-series data:
source=nyc_taxi | fields value | AD | where value=10844.0
The command returns the following results.
| value | score | anomalous |
|---|---|---|
| 10844.0 | 0.0 | false |
bin
Syntax and examples
The bin command groups numeric values into buckets of equal intervals, making it useful for creating histograms and analyzing data distribution. It takes a numeric or time-based field and generates a new field with values that represent the lower bound of each bucket.
Syntax
bin <field> [span=<interval>] [minspan=<interval>] [bins=<count>] [aligntime=(earliest | latest | <time-specifier>)] [start=<value>] [end=<value>]
The following table describes the parameters for the bin command.
| Field | Description | Required | Default |
|---|---|---|---|
field | The field to bin. Accepts numeric or time-based fields. | Yes | N/A |
span | The interval size for each bin. Cannot be used with bins or minspan parameters. | No | N/A |
minspan | The minimum interval size for automatic span calculation. Cannot be used with span or bins parameters. | No | N/A |
bins | The maximum number of equal-width bins to create. Cannot be used with span or minspan parameters. The bins parameter must be between 2 and 50000 (inclusive). | No | N/A |
aligntime | Align the bin times for time-based fields. Valid only for time-based discretization. | No | N/A |
start | The starting value for the bin range. | No | Minimum field value |
end | The ending value for the bin range. | No | Maximum field value |
Example 1: Basic numeric span
source=accounts | bin age span=10 | fields age, account_number | head 3;
The command returns the following results.
| age | account_number |
|---|---|
| 30-40 | 1 |
| 30-40 | 6 |
| 20-30 | 13 |
Example 2: Logarithmic span (log10)
source=accounts | bin balance span=log10 | fields balance | head 2;
The command returns the following results.
| balance |
|---|
| 10000.0-1000000.0 |
| 1000.0-10000.0 |
Example 3: Basic bins parameter
source=time_test | bin value bins=5 | fields value | head 3;
The command returns the following results.
| value |
|---|
| 8000-9000 |
| 7000-8000 |
| 9000-10000 |
Example 4: High bin count
source=accounts | bin age bins=21 | fields age, account_number | head 3;
The command returns the following results.
| age | account_number |
|---|---|
| 32-33 | 1 |
| 36-37 | 6 |
| 28-29 | 13 |
Example 5: Basic minspan
source=accounts | bin age minspan=5 | fields age, account_number | head 3;
The command returns the following results.
| age | account_number |
|---|---|
| 30-40 | 1 |
| 30-40 | 6 |
| 20-30 | 13 |
Example 6: Span with start/end
source=accounts | bin age span=1 start=25 end=35 | fields age | head 6;
The command returns the following results.
| age |
|---|
| 32-33 |
| 36-37 |
| 28-29 |
| 33-34 |
Example 7: Hour span
source=time_test | bin @timestamp span=1h | fields @timestamp, value | head 3;
The command returns the following results.
| @timestamp | value |
|---|---|
| 2025-07-28 00:00:00 | 8945 |
| 2025-07-28 01:00:00 | 7623 |
| 2025-07-28 02:00:00 | 9187 |
Example 8: Default behavior (no parameters)
source=accounts | bin age | fields age, account_number | head 3;
The command returns the following results.
| age | account_number |
|---|---|
| 32.0-33.0 | 1 |
| 36.0-37.0 | 6 |
| 28.0-29.0 | 13 |
Example 9: Using the bin command with string fields
source=accounts | eval age_str = CAST(age AS STRING) | bin age_str bins=3 | stats count() by age_str | sort age_str;
The command returns the following results.
| count() | age_str |
|---|---|
| 1 | 20-30 |
| 3 | 30-40 |
dedup
Syntax and examples
The dedup (data deduplication) command removes duplicate documents defined by a field from the search result.
Syntax
dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]
The following table describes the parameters for the dedup command.
| Field | Description | Type | Required | Default |
|---|---|---|---|---|
int | Retain the specified number of duplicate events for each combination. The number must be greater than 0. If you do not specify a number, only the first occurring event is kept and all other duplicates are removed from the results. | integer | No | 1 |
keepempty | If true, keep the document if any field in the field list has a null value or a field missing. | boolean | No | False |
consecutive | If true, remove only consecutive events with duplicate combinations of values. | boolean | No | False |
field-list | Specify a comma-delimited field list. At least one field is required. | string | Yes | N/A |
Example 1: Dedup by one field
To remove duplicate documents with the same gender, use the following command:
search source=accounts | dedup gender | fields account_number, gender;
The command returns the following results.
| account_number | gender |
|---|---|
| 1 | M |
| 13 | F |
Example 2: Keep two duplicate documents
To keep two duplicate documents with the same gender, use the following command:
search source=accounts | dedup 2 gender | fields account_number, gender;
The command returns the following results.
| account_number | gender |
|---|---|
| 1 | M |
| 6 | M |
| 13 | F |
Example 3: Keep or ignore an empty field by default
To keep two duplicate documents with a null field value, use the following command:
search source=accounts | dedup email keepempty=true | fields account_number, email;
The command returns the following results.
| account_number | |
|---|---|
| 1 | amberduke@pyrami.com |
| 6 | hattiebond@netagy.com |
| 13 | null |
| 18 | daleadams@boink.com |
To remove duplicate documents with the null field value, use the following command:
search source=accounts | dedup email | fields account_number, email;
| account_number | |
|---|---|
| 1 | amberduke@pyrami.com |
| 6 | hattiebond@netagy.com |
| 18 | daleadams@boink.com |
Example 4: Dedup of consecutive documents
To remove duplicates of consecutive documents, use the following command:
search source=accounts | dedup gender consecutive=true | fields account_number, gender;
The command returns the following results.
| account_number | gender |
|---|---|
| 1 | M |
| 13 | F |
| 18 | M |
Limitations
The dedup command is not rewritten to OpenSearch query domain-specific language (DSL); it is only executed on the coordinating node.
eval
Syntax and examples
The eval command evaluates an expression and appends its result to the search result.
Syntax
eval <field>=<expression> ["," <field>=<expression> ]...
The following table describes the parameters for the eval command.
| Field | Description | Required |
|---|---|---|
field | If a field name does not exist, a new field is added. If the field name already exists, it’s overwritten. | Yes |
expression | Specify any supported expression. | Yes |
Example 1: Create a new field
To create a new doubleAge field for each document where doubleAge is the result of age multiplied by 2, use the following command:
search source=accounts | eval doubleAge = age * 2 | fields age, doubleAge;
The command returns the following results.
| age | doubleAge |
|---|---|
| 32 | 64 |
| 36 | 72 |
| 28 | 56 |
| 33 | 66 |
Example 2: Overwrite the existing field
To overwrite the age field with age plus 1, use the following command:
search source=accounts | eval age = age + 1 | fields age;
The command returns the following results.
| age |
|---|
| 33 |
| 37 |
| 29 |
| 34 |
Example 3: Create a new field with a field defined with the eval command
To create a new field ddAge where ddAge is the result of doubleAge multiplied by 2 and doubleAge is defined in the eval command, use the following command:
search source=accounts | eval doubleAge = age * 2, ddAge = doubleAge * 2 | fields age, doubleAge, ddAge;
The command returns the following results.
| age | doubleAge | ddAge |
|---|---|---|
| 32 | 64 | 128 |
| 36 | 72 | 144 |
| 28 | 56 | 112 |
| 33 | 66 | 132 |
Limitations
The eval command is not rewritten to OpenSearch query DSL; it is only executed on the coordinating node.
fields
Syntax and examples
Use the fields command to keep or remove fields from a search result.
Syntax
fields [+|-] <field-list>
The following table describes the parameters for the fields command.
| Field | Description | Required | Default |
|---|---|---|---|
index | Plus (+) keeps only fields specified in the field list. Minus (-) removes all fields specified in the field list. | No | + |
field-list | Specify a comma-delimited list of fields. | Yes | No default |
Example 1: Select specified fields from result
To get account_number, firstname, and lastname fields from a search result, use the following command:
search source=accounts | fields account_number, firstname, lastname;
The command returns the following results.
| account_number | firstname | lastname |
|---|---|---|
| 1 | Amber | Duke |
| 6 | Hattie | Bond |
| 13 | Nanette | Bates |
| 18 | Dale | Adams |
Example 2: Remove specified fields from a search result
To remove the account_number field from the search results, use the following command:
search source=accounts | fields account_number, firstname, lastname | fields - account_number;
The command returns the following results.
| firstname | lastname |
|---|---|
| Amber | Duke |
| Hattie | Bond |
| Nanette | Bates |
| Dale | Adams |
head
Syntax and examples
Use the head command to return the first N number of results in a specified search order.
Syntax
head [N]
The following table describes the parameters for the head command.
| Field | Description | Required | Default |
|---|---|---|---|
N | Specify the number of results to return. | No | 10 |
Example 1: Get the first 10 results
To get the first 10 results, use the following command:
search source=accounts | fields firstname, age | head;
The command returns the following results.
| firstname | age |
|---|---|
| Amber | 32 |
| Hattie | 36 |
| Nanette | 28 |
Example 2: Get the first N results
To get the first two results, use the following command:
search source=accounts | fields firstname, age | head 2;
The command returns the following results.
| firstname | age |
|---|---|
| Amber | 32 |
| Hattie | 36 |
Limitations
The head command is not rewritten to OpenSearch query DSL; it is only executed on the coordinating node.
join
Syntax and examples
You can combine two datasets using the join command. The left side can be an index or results from piped commands, while the right side can be either an index or a subquery.
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.
Syntax
[join-type] join [left-alias] [right-alias] on <join-criteria> <right-dataset>
The following table describes additional requirements for the join command.
| Parameter | Description | Required |
|---|---|---|
join-criteria | Any comparison expression. | Yes |
right-dataset | Either an index or a subquery with or without an alias. | Yes |
The following table describes the parameters for the join command.
| Field | Description | Type | Required | Default |
|---|---|---|---|---|
join-type | The type of join to perform. Valid values are inner, left, right, full, cross, semi, and anti. | String | No | inner |
left-alias | The subquery alias to use with the left join side in order to avoid ambiguous naming. Fixed pattern: left = <left-alias> | String | No | N/A |
right-alias | The subquery alias to use with the right join side in order to avoid ambiguous naming. Fixed pattern: right = <right-alias> | String | No | N/A |
join-criteria | Any comparison expression. | String | Yes | N/A |
right-dataset | Either an index or a subquery with/without an alias. | String | Yes | N/A |
The following examples use the state_country and occupation indexes.
The state_country index contains the following data.
| Name | Age | State | Country |
|---|---|---|---|
| Jake | 70 | California | USA |
| Hello | 30 | New York | USA |
| John | 25 | Ontario | Canada |
| Jane | 20 | Quebec | Canada |
| Jim | 27 | B.C. | Canada |
| Peter | 57 | B.C. | Canada |
| Rick | 70 | B.C. | Canada |
| David | 40 | Washington | USA |
The occupation index contains the following data.
| Name | Occupation | Country | Salary |
|---|---|---|---|
| Jake | Engineer | England | 100000 |
| Hello | Artist | USA | 70000 |
| John | Doctor | Canada | 120000 |
| David | Doctor | USA | 120000 |
| David | Unemployed | Canada | 0 |
| Jane | Scientist | Canada | 90000 |
Example 1: Join two indexes
The following example performs an inner join between two indexes:
search source = state_country
| inner join left=a right=b ON a.name = b.name occupation
| stats avg(salary) by span(age, 10) as age_span, b.country
The command returns the following results.
| avg(salary) | age_span | b.country |
|---|---|---|
| 120000.0 | 40 | USA |
| 105000.0 | 20 | Canada |
| 0.0 | 40 | Canada |
| 70000.0 | 30 | USA |
| 100000.0 | 70 | England |
Example 2: Join with a subsearch
The following example performs a left join with a subsearch:
search source = state_country as a
| where country = 'USA' OR country = 'England'
| left join on a.name = b.name [
source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3
] as b
| stats avg(salary) by span(age, 10) as age_span, b.country
The command returns the following results.
| avg(salary) | age_span | b.country |
|---|---|---|
| null | 40 | null |
| 70000.0 | 30 | USA |
| 100000.0 | 70 | England |
Limitations
The join command works only when plugins.calcite.enabled is set to true.
kmeans
Syntax and examples
The kmeans command applies the ML Commons plugin’s k-means algorithm to the provided PPL command’s search results.
Syntax
kmeans <cluster-number>
The following table describes the parameters for the kmeans command.
| Field | Description | Required |
|---|---|---|
cluster-number | The number of clusters you want to group your data points into. | Yes |
Example: Group Iris data
This example shows how to classify three Iris species (Iris setosa, Iris virginica, and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals:
source=iris_data | fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm | kmeans 3
The command returns the following results.
| sepal_length_in_cm | sepal_width_in_cm | petal_length_in_cm | petal_width_in_cm | ClusterID | |
|---|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | 1 | |
| 5.6 | 3.0 | 4.1 | 1.3 | 0 | |
| 6.7 | 2.5 | 5.8 | 1.8 | 2 |
lookup
Syntax and examples
The lookup command enriches your search data by adding or replacing data from a lookup index (dimension table). You can extend index fields with values from a dimension table or append/replace values when a lookup condition is matched. As an alternative to the join command, the lookup command is more suitable for enriching the source data with a static dataset.
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.
Syntax
lookup <lookup-index> (<lookup-mapping-field> [as <source-mapping-field>])... [(replace | append) (<input-field> [AS <output-field>])...]
The following table describes the parameters for the lookup command.
| Field | Description | Required | Default |
|---|---|---|---|
lookup-index | The name of lookup index (dimension table). | Yes | N/A |
lookup-mapping-field | A mapping key in the lookup-index, analogous to a join key from the right table. You can specify multiple lookup-mapping-field values with commas. | Yes | N/A |
source-mapping-field | A mapping key from the source (left side), analogous to a join key from the left side. | No | lookup-mapping-field |
replace | append | The output strategies. When specifying replace, matched values in the lookup-index field overwrite the values in the results. If you specify append, matched values in the lookup-index field only append to the missing values in the results. | No | replace |
input-field | A field in lookup-index where matched values are applied to the result output. You can specify multiple input-field values with commas. If you don’t specify any input-field, all fields except lookup-mapping-field from lookup-index are matched values that are applied to the result output. | No | N/A |
output-field | A field of output. You can specify zero or multiple output-field values. If you specify output-field with an existing field name in the source query, its values are replaced or appended by the matched values from input-field. If the field specified in output-field is a new field, an extended new field is applied to the results. | No | input-field |
The following examples use the workers and work_information indexes.
The workers index contains the following data.
| ID | Name | Occupation | Country | Salary |
|---|---|---|---|---|
| 1000 | Jake | Engineer | England | 100000 |
| 1001 | Hello | Artist | USA | 70000 |
| 1002 | John | Doctor | Canada | 120000 |
| 1003 | David | Doctor | N/A | 120000 |
| 1004 | David | N/A | Canada | 0 |
| 1005 | Jane | Scientist | Canada | 90000 |
The work_information index contains the following data.
| UID | Name | Department | Occupation |
|---|---|---|---|
| 1000 | Jake | IT | Engineer |
| 1002 | John | DATA | Scientist |
| 1003 | David | HR | Doctor |
| 1005 | Jane | DATA | Engineer |
| 1006 | Tom | SALES | Artist |
Example 1: Look up workers and return the corresponding department
The following example looks up workers and returns the corresponding department:
source = workers | lookup work_information uid as id append department
The command returns the following results.
| id | name | occupation | country | salary | department |
|---|---|---|---|---|---|
| 1000 | Jake | Engineer | England | 100000 | IT |
| 1001 | Hello | Artist | USA | 70000 | Null |
| 1002 | John | Doctor | Canada | 120000 | DATA |
| 1003 | David | Doctor | Null | 120000 | HR |
| 1004 | David | Null | Canada | 0 | Null |
| 1005 | Jane | Scientist | Canada | 90000 | DATA |
Example 2: Look up workers and replace their occupation and department
The following example looks up workers and replaces their occupation and department using their work_information:
source = workers | lookup work_information uid as id, name
The command returns the following results.
| id | name | occupation | country | salary | department |
|---|---|---|---|---|---|
| 1000 | Jake | Engineer | England | 100000 | IT |
| 1001 | Hello | null | USA | 70000 | null |
| 1002 | John | Scientist | Canada | 120000 | DATA |
| 1003 | David | Doctor | null | 120000 | HR |
| 1004 | David | null | Canada | 0 | null |
| 1005 | Jane | Engineer | Canada | 90000 | DATA |
Example 3: Look up workers and create a new occupation field
The following example looks up workers and appends their occupation from work_information as a new field:
source = workers | lookup work_information name replace occupation as new_occupation
The command returns the following results.
| id | name | occupation | country | salary | new_occupation |
|---|---|---|---|---|---|
| 1000 | Jake | Engineer | England | 100000 | Engineer |
| 1001 | Hello | Artist | USA | 70000 | null |
| 1002 | John | Doctor | Canada | 120000 | Scientist |
| 1003 | David | Doctor | null | 120000 | Doctor |
| 1004 | David | null | Canada | 0 | Doctor |
| 1005 | Jane | Scientist | Canada | 90000 | Engineer |
Limitations
The lookup command works only when plugins.calcite.enabled is set to true.
parse
Syntax and examples
Use the parse command to parse a text field using a regular expression and append the result to the search result.
Syntax
parse <field> <regular-expression>
The following table describes the parameters for the parse command.
| Field | Description | Required |
|---|---|---|
field | A text field. | Yes |
regular-expression | The regular expression used to extract new fields from the given text field. If a new field name exists, it replaces the original field. | Yes |
The regular expression is used to match the whole text field of each document with the Java regex engine. Each named capture group in the expression becomes a new STRING field.
Example 1: Create a new field
The following example shows how to create a new field host for each document. host is the hostname after @ in the email field. Parsing a null field returns an empty string:
source=accounts | parse email '.+@(?<host>.+)' | fields email, host ;
The command returns the following results.
| host | |
|---|---|
| amberduke@pyrami.com | pyrami.com |
| hattiebond@netagy.com | netagy.com |
| null | null |
| daleadams@boink.com | boink.com |
Example 2: Override the existing field
The following example shows how to override the existing address field with the street number removed:
source=accounts | parse address '\d+ (?<address>.+)' | fields address ;
The command returns the following results.
| address |
|---|
| Holmes Lane |
| Bristol Street |
| Madison Street |
| Hutchinson Court |
Example 3: Filter and sort by a cast-parsed field
The following example shows how to sort street numbers that are higher than 500 in the address field:
source=accounts | parse address '(?<streetNumber>\d+) (?<street>.+)' | where cast(streetNumber as int) > 500 | sort num(streetNumber) | fields streetNumber, street ;
The command returns the following results.
| streetNumber | street |
|---|---|
| 671 | Bristol Street |
| 789 | Madison Street |
| 880 | Holmes Lane |
Limitations
A few limitations exist when using the parse command:
- Fields defined by
parsecannot be parsed again. For example,source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;fails to return any expressions. - Fields defined by
parsecannot be overridden with other commands. For example, when enteringsource=accounts | parse address '\d+ (?<street>.+)' | eval street='1' | where street='1' ;wheredoes not match any documents sincestreetcannot be overridden. - The text field used by
parsecannot be overridden. For example, when enteringsource=accounts | parse address '\d+ (?<street>.+)' | eval address='1' ;,streetis not parsed since the address is overridden. - Fields defined by
parsecannot be filtered/sorted after using them in thestatscommand. For example,source=accounts | parse email '.+@(?<host>.+)' | stats avg(age) by host | where host=pyrami.com ;wheredoes not match the domain listed.
rare
Syntax and examples
Use the rare command to find the least common values of all fields in a field list. A maximum of 10 results are returned for each distinct set of values of the group-by fields.
Syntax
rare <field-list> [by-clause]
The following table describes the parameters for the rare command.
| Field | Description | Required |
|---|---|---|
field-list | Specify a comma-delimited list of field names. | No |
by-clause | Specify one or more fields to group the results by. | No |
Example 1: Find the least common values in a field
To find the least common values of gender, use the following command:
search source=accounts | rare gender;
The command returns the following results.
| gender |
|---|
| F |
| M |
Example 2: Find the least common values grouped by gender
To find the least common age grouped by gender, use the following command:
search source=accounts | rare age by gender;
The command returns the following results.
| gender | age |
|---|---|
| F | 28 |
| M | 32 |
| M | 33 |
Limitations
The rare command is not rewritten to OpenSearch query DSL; it is only executed on the coordinating node.
regex
Syntax and examples
The regex command filters search results by matching field values against a regular expression pattern. Only documents in which the specified field matches the pattern are included in the results.
Syntax
regex <field> = <pattern>
regex <field> != <pattern>
The following table describes the parameters for the regex command.
| Field | Description | Required |
|---|---|---|
field | The field name to match against. | Yes |
pattern | The regular expression pattern to match. Supports Java regex syntax, including named groups, lookahead/lookbehind, and character classes. | Yes |
Example 1: Basic pattern matching
The following example shows how to filter documents where the lastname field matches names starting with uppercase letters:
source=accounts | regex lastname="^[A-Z][a-z]+$" | fields account_number, firstname, lastname;
The command returns the following results.
| account_number | firstname | lastname |
|---|---|---|
| 1 | Amber | Duke |
| 6 | Hattie | Bond |
| 13 | Nanette | Bates |
| 18 | Dale | Adams |
Example 2: Negative matching
The following example shows how to exclude documents where the lastname field ends with “son”:
source=accounts | regex lastname!=".*son$" | fields account_number, lastname;
The command returns the following results.
| account_number | lastname |
|---|---|
| 1 | Duke |
| 6 | Bond |
| 13 | Bates |
| 18 | Adams |
Example 3: Email domain matching
The following example shows how to filter documents by email domain patterns:
source=accounts | regex email="@pyrami\.com$" | fields account_number, email;
The command returns the following results.
| account_number | |
|---|---|
| 1 | amberduke@pyrami.com |
Example 4: Complex patterns with character classes
The following example shows how to use complex regex patterns with character classes and quantifiers:
source=accounts | regex address="\d{3,4}\s+[A-Z][a-z]+\s+(Street|Lane|Court)" | fields account_number, address;
The command returns the following results.
| account_number | address |
|---|---|
| 1 | 880 Holmes Lane |
| 6 | 671 Bristol Street |
| 13 | 789 Madison Street |
| 18 | 467 Hutchinson Court |
Example 5: Case-sensitive matching
The following example demonstrates that regex matching is case-sensitive by default:
source=accounts | regex state="va" | fields account_number, state;
The command returns the following results.
| account_number | state | :— | :—
source=accounts | regex state="VA" | fields account_number, state;
The command returns the following results.
| account_number | state |
|---|---|
| 13 | VA |
Limitations
- Field specification required: A field name must be specified in the
regexcommand. Pattern-only syntax (for example,regex "pattern") is not currently supported. - String fields only: The
regexcommand currently only supports string fields. Using it on numeric or Boolean fields results in an error.
rename
Syntax and examples
Use the rename command to rename one or more fields in the search result.
Syntax
rename <source-field> AS <target-field>["," <source-field> AS <target-field>]...
The following table describes the parameters for the rename command.
| Field | Description | Required |
|---|---|---|
source-field | The name of the field that you want to rename. | Yes |
target-field | The name you want to rename to. | Yes |
Example 1: Rename one field
To rename the account_number field as an, use the following command:
search source=accounts | rename account_number as an | fields an;
The command returns the following results.
| an |
|---|
| 1 |
| 6 |
| 13 |
| 18 |
Example 2: Rename multiple fields
To rename the account_number field as an and employer as emp, use the following command:
search source=accounts | rename account_number as an, employer as emp | fields an, emp;
The command returns the following results.
| an | emp |
|---|---|
| 1 | Pyrami |
| 6 | Netagy |
| 13 | Quility |
| 18 | null |
Limitations
The rename command is not rewritten to OpenSearch query DSL; it is only executed on the coordinating node.
rex
Syntax and examples
The rex command extracts fields from a raw text field using regular expression named capture groups.
Syntax
rex [mode=<mode>] field=<field> <pattern> [max_match=<int>] [offset_field=<string>]
The following table describes the parameters for the rex command.
| Field | Description | Required | Default |
|---|---|---|---|
field | The field must be a string field to extract data from. | Yes | N/A |
pattern | The regular expression pattern with named capture groups used to extract new fields. The pattern must contain at least one named capture group using (?<name>pattern) syntax. | Yes | N/A |
mode | Either extract, which creates new fields from regular expression named capture groups, or sed, which performs text substitution on the field using sed-style patterns. | No | extract |
max_match | The maximum number of matches to extract. If greater than 1, extracted fields become arrays. The value 0 means unlimited matches, but is automatically capped to the configured limit (default: 10, configurable using plugins.ppl.rex.max_match.limit). | No | 1 |
offset_field | The field name used to store the character offset positions of matches. Only available in extract mode. | No | N/A |
Example 1: Basic field extraction
The following example shows how to extract the username and domain from email addresses using named capture groups. Both extracted fields are returned as a string type:
source=accounts | rex field=email "(?<username>[^@]+)@(?<domain>[^.]+)" | fields email, username, domain | head 2;
The command returns the following results.
| username | domain | |
|---|---|---|
| amberduke@pyrami.com | amberduke | pyrami |
| hattiebond@netagy.com | hattiebond | netagy |
Example 2: Handling non-matching patterns
The following example shows the rex command returning all events, setting extracted fields to null for non-matching patterns. Extracted fields are of a string type when matches are found:
source=accounts | rex field=email "(?<user>[^@]+)@(?<domain>gmail\\.com)" | fields email, user, domain | head 2;
The command returns the following results.
| user | domain | |
|---|---|---|
| amberduke@pyrami.com | null | null |
| hattiebond@netagy.com | null | null |
Example 3: Multiple matches with max_match
The following example shows how to extract multiple words from the address field using the max_match parameter. The extracted field is returned as an array type containing string elements:
source=accounts | rex field=address "(?<words>[A-Za-z]+)" max_match=2 | fields address, words | head 3;
The command returns the following results.
| address | words |
|---|---|
| 880 Holmes Lane | [Holmes,Lane] |
| 671 Bristol Street | [Bristol,Street] |
| 789 Madison Street | [Madison,Street] |
Example 4: Text replacement with mode=sed
The following example shows how to replace email domains using sed mode for text substitution. The extracted field is returned as a string type:
source=accounts | rex field=email mode=sed "s/@.*/@company.com/" | fields email | head 2;
The command returns the following results.
| amberduke@company.com |
| hattiebond@company.com |
Example 5: Using offset_field
The following example shows how to track the character positions where matches occur. Extracted fields are of a string type, and the offset_field is also of a string type:
source=accounts | rex field=email "(?<username>[^@]+)@(?<domain>[^.]+)" offset_field=matchpos | fields email, username, domain, matchpos | head 2;
The command returns the following results.
| username | domain | matchpos | |
|---|---|---|---|
| amberduke@pyrami.com. | amberduke | pyrami | domain=10-15&username=0-8 |
| hattiebond@netagy.com | hattiebond | netagy | domain=11-16&username=0-9 |
Limitations
Named Capture Group Naming:
- Group names must start with a letter and contain only letters and digits.
- For detailed Java regex pattern syntax and usage, refer to the official Java Pattern documentation.
Pattern requirements:
- The pattern must contain at least one named capture group.
- Regular capture groups
(...)without names are not allowed.
Max match limit:
- The
max_matchparameter is subject to a configurable system limit to prevent memory exhaustion. - When
max_match=0(unlimited) is specified, it is automatically capped at the configured limit (default: 10). - User-specified values exceeding the configured limit result in an error.
- Users can adjust the limit using the
plugins.ppl.rex.max_match.limitcluster setting. Setting this limit to a large value is not recommended because it can lead to excessive memory consumption, especially with patterns that match empty strings (for example,\d*,\w*).
sort
Syntax and examples
Use the sort command to sort search results by a specified field.
Syntax
sort [count] <[+|-] sort-field>...
The following table describes the parameters for the sort command.
| Field | Description | Required | Default |
|---|---|---|---|
count | The maximum number of results to return from the sorted result. If count=0, all results are returned. | No | 1000 |
[+|-] | Use plus [+] to sort by ascending order and minus [-] to sort by descending order. | No | Ascending order |
sort-field | Specify the field that you want to sort by. | Yes | N/A |
Example 1: Sort by one field
To sort all documents by the age field in ascending order, use the following command:
search source=accounts | sort age | fields account_number, age;
The command returns the following results.
| account_number | age |
|---|---|
| 13 | 28 |
| 1 | 32 |
| 18 | 33 |
| 6 | 36 |
Example 2: Sort by one field and return all results
To sort all documents by the age field in ascending order and specify count as 0 to return all results, use the following command:
search source=accounts | sort 0 age | fields account_number, age;
The command returns the following results.
| account_number | age |
|---|---|
| 13 | 28 |
| 1 | 32 |
| 18 | 33 |
| 6 | 36 |
Example 3: Sort by one field in descending order
To sort all documents by the age field in descending order, use the following command:
search source=accounts | sort - age | fields account_number, age;
| account_number | age |
|---|---|
| 6 | 36 |
| 18 | 33 |
| 1 | 32 |
| 13 | 28 |
Example 4: Specify the number of sorted documents to return
To sort all documents by the age field in ascending order and specify count as 2 to return two results, use the following command:
search source=accounts | sort 2 age | fields account_number, age;
The command returns the following results.
| account_number | age |
|---|---|
| 13 | 28 |
| 1 | 32 |
Example 5: Sort by multiple fields
To sort all documents by the gender field in ascending order and the age field in descending order, use the following command:
search source=accounts | sort + gender, - age | fields account_number, gender, age;
The command returns the following results.
| account_number | gender | age |
|---|---|---|
| 13 | F | 28 |
| 6 | M | 36 |
| 18 | M | 33 |
| 1 | M | 32 |
spath
Syntax and examples
The spath command allows you to extract fields from structured text data. It currently allows selecting from JSON data with JSON paths.
Syntax
spath input=<field> [output=<field>] [path=]<path>
The following table describes the parameters for the spath command.
| Field | Description | Required | Default |
|---|---|---|---|
input | The field to scan for JSON data. | Yes | N/A |
output | The destination field that the data is loaded to. | No | Value of path |
path | The path of the data to load for the object. | Yes | N/A |
Example 1: Simple field extraction
The simplest spath is to extract a single field. The following example extracts n from the doc field of type text:
source=structured | spath input=doc_n n | fields doc_n n;
The command returns the following results.
| doc_n | n |
|---|---|
| {“n”: 1} | 1 |
| {“n”: 2} | 2 |
| {“n”: 3} | 3 |
Example 2: Lists and nesting
The following example demonstrates additional JSON path use cases, such as traversing nested fields and extracting list elements:
source=structured | spath input=doc_list output=first_element list{0} | spath input=doc_list output=all_elements list{} | spath input=doc_list output=nested nest_out.nest_in | fields doc_list first_element all_elements nested;
| doc_list | first_element | all_elements | nested |
|---|---|---|---|
| {“list”: [1, 2, 3, 4], “nest_out”: {“nest_in”: “a”}} | 1 | [1,2,3,4] | a |
| {“list”: [], “nest_out”: {“nest_in”: “a”}} | null | [] | a |
| {“list”: [5, 6], “nest_out”: {“nest_in”: “a”}} | 5 | [5,6] | a |
Example 3: Sum of inner elements
The following example shows how to extract an inner field and generate statistics on it, using the documents from the first example. It also demonstrates that spath always returns strings for inner types:
source=structured | spath input=doc_n n | eval n=cast(n as int) | stats sum(n) | fields `sum(n)`;
The command returns the following results.
| sum(n) |
|---|
| 6 |
Example 4: Escaped paths
spath can escape paths with strings to accept any path that json_extract does. This includes escaping complex field names as array components:
source=structured | spath output=a input=doc_escape "['a fancy field name']" | spath output=b input=doc_escape "['a.b.c']" | fields a b;
The command returns the following results.
| a | b |
|---|---|
| true | 0 |
| true | 1 |
| false | 2 |
stats
Syntax and examples
Use the stats command to aggregate data from search results.
The following table lists the aggregation functions and also indicates how each one handles null or missing values.
| Function | NULL | MISSING |
|---|---|---|
COUNT | Not counted | Not counted |
SUM | Ignore | Ignore |
AVG | Ignore | Ignore |
MAX | Ignore | Ignore |
MIN | Ignore | Ignore |
Syntax
stats <aggregation>... [by-clause]...
The following table describes the parameters for the stats command.
| Field | Description | Required | Default |
|---|---|---|---|
aggregation | Specify a statistical aggregation function. The argument of this function must be a field. | Yes | N/A |
by-clause | Specify one or more fields to group the results by. If not specified, the stats command returns only one row, which is the aggregation over the entire result set. | No | N/A |
Example 1: Calculate the average value of a field
To calculate the average age of all documents, use the following command:
search source=accounts | stats avg(age);
The command returns the following results.
| avg(age) |
|---|
| 32.25 |
Example 2: Calculate the average value of a field by group
To calculate the average age grouped by gender, use the following command:
search source=accounts | stats avg(age) by gender;
The command returns the following results.
| gender | avg(age) |
|---|---|
| F | 28.0 |
| M | 33.666666666666664 |
Example 3: Calculate the average and sum of a field by group
To calculate the average and sum of age grouped by gender, use the following command:
search source=accounts | stats avg(age), sum(age) by gender;
The command returns the following results.
| gender | avg(age) | sum(age) |
|---|---|---|
| F | 28 | 28 |
| M | 33.666666666666664 | 101 |
Example 4: Calculate the maximum value of a field
To calculate the maximum age, use the following command:
search source=accounts | stats max(age);
| max(age) |
|---|
| 36 |
Example 5: Calculate the maximum and minimum value of a field by group
To calculate the maximum and minimum age values grouped by gender, use the following command:
search source=accounts | stats max(age), min(age) by gender;
The command returns the following results.
| gender | min(age) | max(age) |
|---|---|---|
| F | 28 | 28 |
| M | 32 | 36 |
timechart
Syntax and examples
The timechart command creates a time-based aggregation of data. It groups data by time intervals and optionally by a field, then applies an aggregation function to each group. The results are returned in an unpivoted format with separate rows for each time-field combination.
Syntax
timechart [timefield=<field_name>] [span=<time_interval>] [limit=<number>] [useother=<boolean>] <aggregation_function> [by <field>]
The following table describes the parameters for the timechart command.
| Field | Description | Required | Default |
|---|---|---|---|
timefield | The field to use for time-based grouping. Must be a timestamp field. | No | @timestamp |
span | Specifies the time interval for grouping data. | No | 1m |
limit | Specifies the maximum number of distinct values to display when using the “by” clause. | No | 10 |
useother | Controls whether to create an “OTHER” category for values beyond the limit. | No | true |
aggregation_function | The aggregation function to apply to each time bucket. | Yes | N/A |
by | Groups the results by the specified field in addition to time intervals. If not specified, the aggregation is performed across all documents in each time interval. | No | N/A |
Example 1: Count events by hour
The following example counts events for each hour and groups them by host:
source=events | timechart span=1h count() by host;
The command returns the following results.
| @timestamp | host | count() |
|---|---|---|
| 2023-01-01 10:00:00 | server1 | 4 |
| 2023-01-01 10:00:00 | server2 | 4 |
Example 2: Calculate average number of packets by minute
The following example calculates the average packets for each minute without grouping by any field:
source=events | timechart span=1m avg(packets);
The command returns the following results.
| @timestamp | avg(packets) |
|---|---|
| 2023-01-01 10:00:00 | 60.0 |
| 2023-01-01 10:05:00 | 30.0 |
| 2023-01-01 10:10:00 | 60.0 |
| 2023-01-01 10:15:00 | 30.0 |
| 2023-01-01 10:20:00 | 60.0 |
| 2023-01-01 10:25:00 | 30.0 |
| 2023-01-01 10:30:00 | 180.0 |
| 2023-01-01 10:35:00 | 90.0 |
Example 3: Calculate average number of packets by every 20 minutes and status
The following example calculates the average number of packets for every 20 minutes and groups them by status:
source=events | timechart span=20m avg(packets) by status;
The command returns the following results.
| @timestamp | status | avg(packets) |
|---|---|---|
| 2023-01-01 10:00:00 | active | 30.0 |
| 2023-01-01 10:05:00 | inactive | 30.0 |
| 2023-01-01 10:10:00 | pending | 60.0 |
| 2023-01-01 10:15:00 | processing | 60.0 |
| 2023-01-01 10:20:00 | cancelled | 180.0 |
| 2023-01-01 10:25:00 | completed | 60.0 |
| 2023-01-01 10:30:00 | inactive | 90.0 |
| 2023-01-01 10:35:00 | pending | 30.0 |
Example 4: Using the limit parameter with the count() function
When there are many distinct values in the “by” field, the timechart command displays the top values based on the limit parameter and groups the rest into an “OTHER” category. The following query displays the top 2 hosts with the highest count values and groups the remaining hosts into an “OTHER” category:
source=events | timechart span=1m limit=2 count() by host;
| @timestamp | host | count() |
|---|---|---|
| 2023-01-01 10:00:00 | server1 | 1 |
| 2023-01-01 10:05:00 | server2 | 1 |
| 2023-01-01 10:10:00 | server1 | 1 |
| 2023-01-01 10:15:00 | server2 | 1 |
| 2023-01-01 10:20:00 | server1 | 1 |
| 2023-01-01 10:25:00 | server2 | 1 |
| 2023-01-01 10:30:00 | server1 | 1 |
| 2023-01-01 10:35:00 | server2 | 1 |
Example 5: Using limit=0 with count() to show all values
To display all distinct values without any limit, set limit=0 and use the following command:
source=events_many_hosts | timechart span=1h limit=0 count() by host;
The command returns the following results.
| @timestamp | host | count() |
|---|---|---|
| 2024-07-01 00:00:00 | web-01 | 1 |
| 2024-07-01 00:00:00 | web-02 | 1 |
| 2024-07-01 00:00:00 | web-03 | 1 |
| 2024-07-01 00:00:00 | web-04 | 1 |
| 2024-07-01 00:00:00 | web-05 | 1 |
| 2024-07-01 00:00:00 | web-06 | 1 |
| 2024-07-01 00:00:00 | web-07 | 1 |
| 2024-07-01 00:00:00 | web-08 | 1 |
| 2024-07-01 00:00:00 | web-09 | 1 |
| 2024-07-01 00:00:00 | web-10 | 1 |
| 2024-07-01 00:00:00 | web-11 | 1 |
Example 6: Using useother=false with the count() function
The following example displays the top 10 hosts without the OTHER category (useother=false):
source=events_many_hosts | timechart span=1h useother=false count() by host;
The command returns the following results.
| @timestamp | host | count() |
|---|---|---|
| 2024-07-01 00:00:00 | web-01 | 1 |
| 2024-07-01 00:00:00 | web-02 | 1 |
| 2024-07-01 00:00:00 | web-03 | 1 |
| 2024-07-01 00:00:00 | web-04 | 1 |
| 2024-07-01 00:00:00 | web-05 | 1 |
| 2024-07-01 00:00:00 | web-06 | 1 |
| 2024-07-01 00:00:00 | web-07 | 1 |
| 2024-07-01 00:00:00 | web-08 | 1 |
| 2024-07-01 00:00:00 | web-09 | 1 |
| 2024-07-01 00:00:00 | web-10 | 1 |
Example 7: Using the limit parameter with the useother parameter and the avg() function
The following example displays the top 3 hosts with the OTHER category (default is useother=true):
source=events_many_hosts | timechart span=1h limit=3 avg(cpu_usage) by host;
The command returns the following results.
| @timestamp | host | avg(cpu_usage) |
|---|---|---|
| 2024-07-01 00:00:00 | OTHER | 41.3 |
| 2024-07-01 00:00:00 | web-03 | 55.3 |
| 2024-07-01 00:00:00 | web-07 | 48.6 |
| 2024-07-01 00:00:00 | web-09 | 67.8 |
Example 8: Handling null values in the “by” field
The following example shows how null values in the “by” field are treated as a separate category. The dataset events_null has 1 entry that does not have a host field. It is put into a separate “NULL” category because the defaults for usenull and nullstr are true and "NULL", respectively:
source=events_null | timechart span=1h count() by host;
The command returns the following results.
| @timestamp | host | count() |
|---|---|---|
| 2024-07-01 00:00:00 | NULL | 1 |
| 2024-07-01 00:00:00 | db-01 | 1 |
| 2024-07-01 00:00:00 | web-01 | 2 |
| 2024-07-01 00:00:00 | web-02 | 2 |
Example 9: Calculate packets per second rate
The following example calculates the per-second packet rate for network traffic data using the per_second() function:
source=events | timechart span=30m per_second(packets) by host;
The command returns the following results.
| @timestamp | host | per_second(packets) |
|---|---|---|
| 2024-07-01 00:00:00 | server1 | 0.1 |
| 2024-07-01 00:00:00 | server2 | 0.05 |
| 2024-07-01 00:00:00 | server1 | 0.1 |
| 2024-07-01 00:00:00 | server2 | 0.05 |
Limitations
- Only a single aggregation function is supported per
timechartcommand. - The
binsparameter and other bin options are not supported in thetimechartcommand. Use thespanparameter to control time intervals.
top
Syntax and examples
Use the top command to find the most common values of all fields in the field list.
Syntax
top [N] <field-list> [by-clause]
The following table describes the parameters for the top command.
| Field | Description | Required | Default |
|---|---|---|---|
N | Specify the number of results to return. | No | 10 |
field-list | Specify a comma-delimited list of field names. | Yes | N/A |
by-clause | Specify one or more fields to group the results by. | No | N/A |
Example 1: Find the most common values in a field
To find the most common genders, use the following command:
search source=accounts | top gender;
The command returns the following results.
| gender |
|---|
| M |
| F |
Example 2: Find the most common value in a field
To find the most common gender, use the following command:
search source=accounts | top 1 gender;
The command returns the following results.
| gender |
|---|
| M |
Example 3: Find the most common values grouped by gender
To find the most common age grouped by gender, use the following command:
search source=accounts | top 1 age by gender;
The command returns the following results.
| gender | age |
|---|---|
| F | 28 |
| M | 32 |
Limitations
The top command is not rewritten to OpenSearch query DSL; it is only executed on the coordinating node.
where
Syntax and examples
Use the where command with a Boolean expression to filter the search result. The where command only returns the result when the Boolean expression evaluates to true.
Syntax
where <boolean-expression>
The following table describes the parameters for the where command.
| Field | Description | Required |
|---|---|---|
boolean-expression | An expression that evaluates to a Boolean value. | No |
Example: Filter the result set with a condition
To get all documents from the accounts index where account_number is 1 or gender is F, use the following command:
search source=accounts | where account_number=1 or gender="F" | fields account_number, gender;
The command returns the following results.
| account_number | gender |
|---|---|
| 1 | M |
| 13 | F |