You're viewing version 3.4 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

join

The join command combines two datasets. The left side can be an index or the results of piped commands, while the right side can be either an index or a subsearch.

Syntax

The join command supports basic and extended syntax options.

Basic syntax

[joinType] join [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria> <right-dataset>

When using aliases, left must appear before right.

The following are examples of the basic join command syntax:

source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | inner join left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | left join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | right join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | full left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | cross join left = l right = r on 1=1 table2
source = table1 | left semi join left = l right = r on l.a = r.a table2
source = table1 | left anti join left = l right = r on l.a = r.a table2
source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ]
source = table1 | inner join on table1.a = table2.a table2 | fields table1.a, table2.a, table1.b, table1.c
source = table1 | inner join on a = c table2 | fields a, b, c, d
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields l.a, r.a
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields t1.a, t2.a
source = table1 | join left = l right = r on l.a = r.a [ source = table2 ] as s | fields l.a, s.a

Basic syntax parameters

The basic join syntax supports the following parameters.

Parameter	Required/Optional	Description
`<joinCriteria>`	Required	A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query.
`<right-dataset>`	Required	The right dataset, which can be an index or a subsearch, with or without an alias.
`joinType`	Optional	The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`.
`left`	Optional	An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`.
`right`	Optional	An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`.

Extended syntax

join [type=<joinType>] [overwrite=<bool>] [max=n] (<join-field-list> | [left = <leftAlias>] [right = <rightAlias>] (on | where) <joinCriteria>) <right-dataset>

The following are examples of the extended join command syntax:

source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join type=left left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join type=inner max=1 left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c
source = table1 | join a table2 | fields a, b, c
source = table1 | join a, b table2 | fields a, b, c
source = table1 | join type=outer a b table2 | fields a, b, c
source = table1 | join type=inner max=1 a, b table2 | fields a, b, c
source = table1 | join type=left overwrite=false max=0 a, b [source=table2 | rename d as b] | fields a, b, c

Extended syntax parameters

The extended join syntax supports the following parameters.

Parameter	Required/Optional	Description
`<joinCriteria>`	Required	A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query.
`<right-dataset>`	Required	The right dataset, which can be an index or a subsearch, with or without an alias.
`type`	Optional	The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`.
`<join-field-list>`	Optional	A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys.
`overwrite`	Optional	Applicable only when `join-field-list` is specified. Specifies whether fields from the right dataset with duplicate names should replace corresponding fields in the main search results. Default is `true`.
`max`	Optional	The maximum number of subsearch results to join with each row in the main search. Default is `0` (unlimited).
`left`	Optional	An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`.
`right`	Optional	An alias for the right dataset (typically, a subsearch) used to avoid ambiguous field names. Specify as `right = <rightAlias>`.

Configuration

The join command behavior is configured using the plugins.ppl.join.subsearch_maxout setting, which specifies the maximum number of rows from the subsearch to join against. Default is 50000. A value of 0 indicates that the restriction is unlimited.

To update the setting, send the following request:

PUT /_plugins/_query/settings
{
  "persistent": {
    "plugins.ppl.join.subsearch_maxout": "5000"
  }
}

Example 1: Join two indexes

The following query uses the basic join syntax to join two indexes:

source = state_country
| inner join left=a right=b ON a.name = b.name occupation
| stats avg(salary) by span(age, 10) as age_span, b.country

The query returns the following results:

avg(salary)	age_span	b.country
120000.0	40	USA
105000.0	20	Canada
0.0	40	Canada
70000.0	30	USA
100000.0	70	England

Example 2: Join with a subsearch

The following query combines a dataset with a subsearch using the basic join syntax:

source = state_country as a
| where country = 'USA' OR country = 'England'
| left join ON a.name = b.name [ source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3 ] as b
| stats avg(salary) by span(age, 10) as age_span, b.country

The query returns the following results:

avg(salary)	age_span	b.country
null	40	null
70000.0	30	USA
100000.0	70	England

Example 3: Join using a field list

The following query uses the extended syntax and specifies a list of fields for the join criteria:

source = state_country
| where country = 'USA' OR country = 'England'
| join type=left overwrite=true name [ source = occupation
| where salary > 0
| fields name, country, salary
| sort salary
| head 3 ]
| stats avg(salary) by span(age, 10) as age_span, country

The query returns the following results:

avg(salary)	age_span	country
null	40	null
70000.0	30	USA
100000.0	70	England

Example 4: Join with additional options

The following query uses the extended syntax and optional parameters for more control over the join operation:

source = state_country
| join type=inner overwrite=false max=1 name occupation
| stats avg(salary) by span(age, 10) as age_span, country

The query returns the following results:

avg(salary)	age_span	country
120000.0	40	USA
100000.0	70	USA
105000.0	20	Canada
70000.0	30	USA

Limitations

The join command has the following limitations:

Field name ambiguity in basic syntax – When fields from the left and right datasets share the same name, the field names in the output are ambiguous. To resolve this, conflicting fields are renamed to <alias>.id (or <tableName>.id if no alias is specified).

The following table demonstrates how field name conflicts are resolved when both table1 and table2 contain a field named id.

Query	Output
`source=table1 \\| join left=t1 right=t2 on t1.id=t2.id table2 \\| eval a = 1`	`t1.id, t2.id, a`
`source=table1 \\| join on table1.id=table2.id table2 \\| eval a = 1`	`table1.id, table2.id, a`
`source=table1 \\| join on table1.id=t2.id table2 as t2 \\| eval a = 1`	`table1.id, t2.id, a`
`source=table1 \\| join right=tt on table1.id=t2.id [ source=table2 as t2 \\| eval b = id ] \\| eval a = 1`	`table1.id, tt.id, tt.b, a`

Field deduplication in extended syntax – When using the extended syntax with a field list, duplicate field names in the output are deduplicated according to the overwrite option.
Join type availability – The join types inner, left, outer (alias of left), semi, and anti are enabled by default. The performance-sensitive join types right, full, and cross are disabled by default. To enable these types, set plugins.calcite.all_join_types.allowed to true.

Syntax
- Basic syntax
- Extended syntax
Configuration
Example 1: Join two indexes
Example 2: Join with a subsearch
Example 3: Join using a field list
Example 4: Join with additional options
Limitations

WAS THIS PAGE HELPFUL?

✔ Yes ✖ No

Tell us why

350 characters left

Have a question? Ask us on the OpenSearch forum.

Want to contribute? Edit this page or create an issue.

join

Syntax

Basic syntax

Basic syntax parameters

Extended syntax

Extended syntax parameters

Configuration

Example 1: Join two indexes

Example 2: Join with a subsearch

Example 3: Join using a field list

Example 4: Join with additional options

Limitations

OpenSearch Links

Get Involved

Resources

Contact Us