Link Search Menu Expand Document Documentation Menu

You're viewing version 3.4 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

subquery

The subquery command allows you to embed one PPL query within another, enabling advanced filtering and data retrieval. A subquery is executed first, and its results are used by the outer query for filtering, comparison, or joining.

Common use cases for subqueries include:

  • Filtering data based on the results of another query.
  • Checking for the existence of related data.
  • Performing calculations that rely on aggregated values from other tables.
  • Creating complex joins with dynamic conditions.

Syntax

The subquery command has the following syntax:

subquery: [ source=... | ... | ... ]

Subqueries use the same syntax as regular PPL queries but must be enclosed in square brackets. There are four main subquery types:

IN subquery

Tests whether a field value exists in the results of a subquery:

where <field> [not] in [ source=... | ... | ... ]

The following are examples of the IN subquery syntax:

source = outer | where a in [ source = inner | fields b ]
source = outer | where (a) in [ source = inner | fields b ]
source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
source = outer | where a not in [ source = inner | fields b ]
source = outer | where (a) not in [ source = inner | fields b ]
source = outer | where (a,b,c) not in [ source = inner | fields d,e,f ]
source = outer a in [ source = inner | fields b ] // search filtering with subquery
source = outer a not in [ source = inner | fields b ] // search filtering with subquery
source = outer | where a in [ source = inner1 | where b not in [ source = inner2 | fields c ] | fields b ] // nested
source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source = inner | fields d ] | fields l.a, r.a, b, c //as join filter

EXISTS subquery

Tests whether a subquery returns any results:

where [not] exists [ source=... | ... | ... ]

The following are examples of the EXISTS subquery syntax:

// Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table inner,  `e`, `f` are fields of table nested
source = outer | where exists [ source = inner | where a = c ]
source = outer | where not exists [ source = inner | where a = c ]
source = outer | where exists [ source = inner | where a = c and b = d ]
source = outer | where not exists [ source = inner | where a = c and b = d ]
source = outer exists [ source = inner | where a = c ] // search filtering with subquery
source = outer not exists [ source = inner | where a = c ] // search filtering with subquery
source = table as t1 exists [ source = table as t2 | where t1.a = t2.a ] //table alias is useful in exists subquery
source = outer | where exists [ source = inner1 | where a = c and exists [ source = nested | where c = e ] ] //nested
source = outer | where exists [ source = inner1 | where a = c | where exists [ source = nested | where c = e ] ] //nested
source = outer | where exists [ source = inner | where c > 10 ] //uncorrelated exists
source = outer | where not exists [ source = inner | where c > 10 ] //uncorrelated exists
source = outer | where exists [ source = inner ] | eval l = "nonEmpty" | fields l //special uncorrelated exists

Scalar subquery

Returns a single value that can be used in comparisons or calculations:

where <field> = [ source=... | ... | ... ]

The following are examples of the scalar subquery syntax:

//Uncorrelated scalar subquery in Select
source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | stats max(c) ] + b | fields m, a
//Uncorrelated scalar subquery in Where**
source = outer | where a > [ source = inner | stats min(c) ] | fields a
//Uncorrelated scalar subquery in Search filter
source = outer a > [ source = inner | stats min(c) ] | fields a
//Correlated scalar subquery in Select
source = outer | eval m = [ source = inner | where outer.b = inner.d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where b = d | stats max(c) ] | fields m, a
source = outer | eval m = [ source = inner | where outer.b > inner.d | stats max(c) ] | fields m, a
//Correlated scalar subquery in Where
source = outer | where a = [ source = inner | where outer.b = inner.d | stats max(c) ]
source = outer | where a = [ source = inner | where b = d | stats max(c) ]
source = outer | where [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
//Correlated scalar subquery in Search filter
source = outer a = [ source = inner | where b = d | stats max(c) ]
source = outer [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
//Nested scalar subquery
source = outer | where a = [ source = inner | stats max(c) | sort c ] OR b = [ source = inner | where c = 1 | stats min(d) | sort d ]
source = outer | where a = [ source = inner | where c =  [ source = nested | stats max(e) by f | sort f ] | stats max(d) by c | sort c | head 1 ]

Relation subquery

Used in join operations to provide dynamic right-side data:

| join ON condition [ source=... | ... | ... ]

The following are examples of the relation subquery syntax:

source = table1 | join left = l right = r on condition [ source = table2 | where d > 10 | head 5 ] //subquery in join right side
source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1

Configuration

The subquery command behavior is configured using the plugins.ppl.subsearch.maxout setting, which specifies the maximum number of rows to return from the subsearch. Default is 10000. A value of 0 indicates that the restriction is unlimited.

To update the setting, send the following request:

PUT /_plugins/_query/settings
{
  "persistent": {
    "plugins.ppl.subsearch.maxout": "0"
  }
}

Example 1: TPC-H q20

The following query demonstrates a complex TPC-H query 20 implementation using nested subqueries:

source = supplier
| join ON s_nationkey = n_nationkey nation
| where n_name = 'CANADA'
  and s_suppkey in [
    source = partsupp
    | where ps_partkey in [
        source = part
        | where like(p_name, 'forest%')
        | fields p_partkey
      ]
      and ps_availqty > [
        source = lineitem
        | where l_partkey = ps_partkey
          and l_suppkey = ps_suppkey
          and l_shipdate >= date('1994-01-01')
          and l_shipdate < date_add(date('1994-01-01'), interval 1 year)
        | stats sum(l_quantity) as sum_l_quantity
        | eval half_sum_l_quantity = 0.5 * sum_l_quantity // Stats and Eval commands can combine when issues/819 resolved
        | fields half_sum_l_quantity
      ]
    | fields ps_suppkey
  ]

Example 2: TPC-H q22

The following query demonstrates a TPC-H query 22 implementation using EXISTS and scalar subqueries:

source = [
  source = customer
    | where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
      and c_acctbal > [
          source = customer
          | where c_acctbal > 0.00
            and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
          | stats avg(c_acctbal)
        ]
      and not exists [
          source = orders
          | where o_custkey = c_custkey
        ]
    | eval cntrycode = substring(c_phone, 1, 2)
    | fields cntrycode, c_acctbal
  ] as custsale
| stats count() as numcust, sum(c_acctbal) as totacctbal by cntrycode
| sort cntrycode

350 characters left

Have a question? .

Want to contribute? or .