You're viewing version 3.4 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
join
The join command combines two datasets. The left side can be an index or the results of piped commands, while the right side can be either an index or a subsearch.
Syntax
The join command supports basic and extended syntax options.
Basic syntax
[joinType] join [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria> <right-dataset>
When using aliases, left must appear before right.
The following are examples of the basic join command syntax:
source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | inner join left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | left join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | right join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | full left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | cross join left = l right = r on 1=1 table2
source = table1 | left semi join left = l right = r on l.a = r.a table2
source = table1 | left anti join left = l right = r on l.a = r.a table2
source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ]
source = table1 | inner join on table1.a = table2.a table2 | fields table1.a, table2.a, table1.b, table1.c
source = table1 | inner join on a = c table2 | fields a, b, c, d
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields l.a, r.a
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields t1.a, t2.a
source = table1 | join left = l right = r on l.a = r.a [ source = table2 ] as s | fields l.a, s.a
Basic syntax parameters
The basic join syntax supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
<joinCriteria> | Required | A comparison expression specifying how to join the datasets. Must be placed after the on or where keyword in the query. |
<right-dataset> | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |
joinType | Optional | The type of join to perform. Valid values are left, semi, anti, and performance-sensitive types (right, full, and cross). Default is inner. |
left | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as left = <leftAlias>. |
right | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as right = <rightAlias>. |
Extended syntax
join [type=<joinType>] [overwrite=<bool>] [max=n] (<join-field-list> | [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria>) <right-dataset>
The following are examples of the extended join command syntax:
source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join type=left left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join type=inner max=1 left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join a table2 | fields a, b, c
source = table1 | join a, b table2 | fields a, b, c
source = table1 | join type=outer a b table2 | fields a, b, c
source = table1 | join type=inner max=1 a, b table2 | fields a, b, c
source = table1 | join type=left overwrite=false max=0 a, b [source=table2 | rename d as b] | fields a, b, c
Extended syntax parameters
The extended join syntax supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
<joinCriteria> | Required | A comparison expression specifying how to join the datasets. Must be placed after the on or where keyword in the query. |
<right-dataset> | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |
type | Optional | The join type when using extended syntax. Valid values are left, outer (same as left), semi, anti, and performance-sensitive types (right, full, and cross). Default is inner. |
<join-field-list> | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. |
overwrite | Optional | Applicable only when join-field-list is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is true. |
max | Optional | The maximum number of subsearch results to join with each row in the main search. Default is 0 (unlimited). |
left | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as left = <leftAlias>. |
right | Optional | An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as right = <rightAlias>. |
Configuration
The join command behavior is configured using the plugins.ppl.join.subsearch_maxout setting, which specifies the maximum number of rows from the subsearch to join against. Default is 50000. A value of 0 indicates that the restriction is unlimited.
To update the setting, send the following request:
PUT /_plugins/_query/settings
{
"persistent": {
"plugins.ppl.join.subsearch_maxout": "5000"
}
}
Example 1: Join two indexes
The following query uses the basic join syntax to join two indexes:
source = state_country
| inner join left=a right=b ON a.name = b.name occupation
| stats avg(salary) by span(age, 10) as age_span, b.country
The query returns the following results:
| avg(salary) | age_span | b.country |
|---|---|---|
| 120000.0 | 40 | USA |
| 105000.0 | 20 | Canada |
| 0.0 | 40 | Canada |
| 70000.0 | 30 | USA |
| 100000.0 | 70 | England |
Example 2: Join with a subsearch
The following query combines a dataset with a subsearch using the basic join syntax:
source = state_country as a
| where country = 'USA' OR country = 'England'
| left join ON a.name = b.name [ source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3 ] as b
| stats avg(salary) by span(age, 10) as age_span, b.country
The query returns the following results:
| avg(salary) | age_span | b.country |
|---|---|---|
| null | 40 | null |
| 70000.0 | 30 | USA |
| 100000.0 | 70 | England |
Example 3: Join using a field list
The following query uses the extended syntax and specifies a list of fields for the join criteria:
source = state_country
| where country = 'USA' OR country = 'England'
| join type=left overwrite=true name [ source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3 ]
| stats avg(salary) by span(age, 10) as age_span, country
The query returns the following results:
| avg(salary) | age_span | country |
|---|---|---|
| null | 40 | null |
| 70000.0 | 30 | USA |
| 100000.0 | 70 | England |
Example 4: Join with additional options
The following query uses the extended syntax and optional parameters for more control over the join operation:
source = state_country
| join type=inner overwrite=false max=1 name occupation
| stats avg(salary) by span(age, 10) as age_span, country
The query returns the following results:
| avg(salary) | age_span | country |
|---|---|---|
| 120000.0 | 40 | USA |
| 100000.0 | 70 | USA |
| 105000.0 | 20 | Canada |
| 70000.0 | 30 | USA |
Limitations
The join command has the following limitations:
-
Field name ambiguity in basic syntax – When fields from the left and right datasets share the same name, the field names in the output are ambiguous. To resolve this, conflicting fields are renamed to
<alias>.id(or<tableName>.idif no alias is specified).The following table demonstrates how field name conflicts are resolved when both
table1andtable2contain a field namedid.Query Output source=table1 \| join left=t1 right=t2 on t1.id=t2.id table2 \| eval a = 1t1.id, t2.id, asource=table1 \| join on table1.id=table2.id table2 \| eval a = 1table1.id, table2.id, asource=table1 \| join on table1.id=t2.id table2 as t2 \| eval a = 1table1.id, t2.id, asource=table1 \| join right=tt on table1.id=t2.id [ source=table2 as t2 \| eval b = id ] \| eval a = 1table1.id, tt.id, tt.b, a -
Field deduplication in extended syntax – When using the extended syntax with a field list, duplicate field names in the output are deduplicated according to the
overwriteoption. -
Join type availability – The join types
inner,left,outer(alias ofleft),semi, andantiare enabled by default. The performance-sensitive join typesright,full, andcrossare disabled by default. To enable these types, setplugins.calcite.all_join_types.allowedtotrue.