join

The join command combines two datasets. The left side can be an index or the results of piped commands, while the right side can be either an index or a subsearch.

Syntax

The join command supports basic and extended syntax options.

Basic syntax

[joinType] join [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria> <right-dataset>

When using aliases, left must appear before right.

The following are examples of the basic join command syntax:

source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | inner join left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | left join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | right join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | full left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | cross join left = l right = r on 1=1 table2
source = table1 | left semi join left = l right = r on l.a = r.a table2
source = table1 | left anti join left = l right = r on l.a = r.a table2
source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ]
source = table1 | inner join on table1.a = table2.a table2 | fields table1.a, table2.a, table1.b, table1.c
source = table1 | inner join on a = c table2 | fields a, b, c, d
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields l.a, r.a
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields t1.a, t2.a
source = table1 | join left = l right = r on l.a = r.a [ source = table2 ] as s | fields l.a, s.a

Basic syntax parameters

The basic join syntax supports the following parameters.

Parameter	Required/Optional	Description
`<joinCriteria>`	Required	A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query.
`<right-dataset>`	Required	The right dataset, which can be an index or a subsearch, with or without an alias.
`joinType`	Optional	The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`.
`left`	Optional	An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`.
`right`	Optional	An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`.

Extended syntax

join [type=<joinType>] [overwrite=<bool>] [max=n] (<join-field-list> | [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria>) <right-dataset>

The following are examples of the extended join command syntax:

source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join type=left left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join type=inner max=1 left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join a table2 | fields a, b, c
source = table1 | join a, b table2 | fields a, b, c
source = table1 | join type=outer a b table2 | fields a, b, c
source = table1 | join type=inner max=1 a, b table2 | fields a, b, c
source = table1 | join type=left overwrite=false max=0 a, b [source=table2 | rename d as b] | fields a, b, c

Extended syntax parameters

The extended join syntax supports the following parameters.

Parameter	Required/Optional	Description
`<joinCriteria>`	Required	A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query.
`<right-dataset>`	Required	The right dataset, which can be an index or a subsearch, with or without an alias.
`type`	Optional	The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`.
`<join-field-list>`	Optional	A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys.
`overwrite`	Optional	Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`.
`max`	Optional	The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited).
`left`	Optional	An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`.
`right`	Optional	An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`.

Configuration

The join command behavior is configured using the plugins.ppl.join.subsearch_maxout setting, which specifies the maximum number of rows from the subsearch to join against. Default is 50000. A value of 0 indicates that the restriction is unlimited.

To update the setting, send the following request:

PUT /_plugins/_query/settings
{
  "persistent": {
    "plugins.ppl.join.subsearch_maxout": "5000"
  }
}

Example 1: Join two indexes

The following query uses the basic join syntax to join two indexes:

source = state_country
| inner join left=a right=b ON a.name = b.name occupation
| stats avg(salary) by span(age, 10) as age_span, b.country

The query returns the following results:

avg(salary)	age_span	b.country
120000.0	40	USA
105000.0	20	Canada
0.0	40	Canada
70000.0	30	USA
100000.0	70	England

Example 2: Join with a subsearch

The following query combines a dataset with a subsearch using the basic join syntax:

source = state_country as a
| where country = 'USA' OR country = 'England'
| left join ON a.name = b.name [ source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3 ] as b
| stats avg(salary) by span(age, 10) as age_span, b.country

The query returns the following results:

avg(salary)	age_span	b.country
null	40	null
70000.0	30	USA
100000.0	70	England

Example 3: Join using a field list

The following query uses the extended syntax and specifies a list of fields for the join criteria:

source = state_country
| where country = 'USA' OR country = 'England'
| join type=left overwrite=true name [ source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3 ]
| stats avg(salary) by span(age, 10) as age_span, country

The query returns the following results:

avg(salary)	age_span	country
null	40	null
70000.0	30	USA
100000.0	70	England

Example 4: Join with additional options

The following query uses the extended syntax and optional parameters for more control over the join operation:

source = state_country
| join type=inner overwrite=false max=1 name occupation
| stats avg(salary) by span(age, 10) as age_span, country

The query returns the following results:

avg(salary)	age_span	country
120000.0	40	USA
100000.0	70	USA
105000.0	20	Canada
70000.0	30	USA

Limitations

The join command has the following limitations:

Field name ambiguity in basic syntax – When fields from the left and right datasets share the same name, the field names in the output are ambiguous. To resolve this, conflicting fields are renamed to <alias>.id (or <tableName>.id if no alias is specified).

The following table demonstrates how field name conflicts are resolved when both table1 and table2 contain a field named id.

Query	Output
`source=table1 \\| join left=t1 right=t2 on t1.id=t2.id table2 \\| eval a = 1`	`t1.id, t2.id, a`
`source=table1 \\| join on table1.id=table2.id table2 \\| eval a = 1`	`table1.id, table2.id, a`
`source=table1 \\| join on table1.id=t2.id table2 as t2 \\| eval a = 1`	`table1.id, t2.id, a`
`source=table1 \\| join right=tt on table1.id=t2.id [ source=table2 as t2 \\| eval b = id ] \\| eval a = 1`	`table1.id, tt.id, tt.b, a`

Field deduplication in extended syntax – When using the extended syntax with a field list, duplicate field names in the output are deduplicated according to the overwrite option.
Join type availability – The join types inner, left, outer (alias of left), semi, and anti are enabled by default. The performance-sensitive join types right, full, and cross are disabled by default. To enable these types, set plugins.calcite.all_join_types.allowed to true.

Syntax
- Basic syntax
- Extended syntax
Configuration
Example 1: Join two indexes
Example 2: Join with a subsearch
Example 3: Join using a field list
Example 4: Join with additional options
Limitations

WAS THIS PAGE HELPFUL?

✔ Yes ✖ No

Tell us why

350 characters left

Have a question? Ask us on the OpenSearch forum.

Want to contribute? Edit this page or create an issue.

join

Syntax

Basic syntax

Basic syntax parameters

Extended syntax

Extended syntax parameters

Configuration

Example 1: Join two indexes

Example 2: Join with a subsearch

Example 3: Join using a field list

Example 4: Join with additional options

Limitations

OpenSearch Links

Get Involved

Resources

Contact Us