You're viewing version 3.4 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

dedup

The dedup command removes duplicate documents from the search results. Documents are considered duplicates when they have the same values in the specified fields.

Syntax

The dedup command has the following syntax:

dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]

Parameters

The dedup command supports the following parameters.

Parameter	Required/Optional	Description
`<field-list>`	Required	A comma-delimited list of fields to use for deduplication. At least one field is required.
`<int>`	Optional	The number of duplicate documents to retain for each combination of field values. Must be greater than `0`. Default is `1`.
`keepempty`	Optional	When set to `true`, keeps documents in which any field in the field list has a `NULL` value or is missing. Default is `false`.
`consecutive`	Optional	When set to `true`, removes only consecutive duplicate documents. Default is `false`. Requires the legacy SQL engine (`plugins.calcite.enabled=false`).

Example 1: Remove duplicates based on a single field

The following query deduplicates documents based on the gender field:

source=accounts
| dedup gender
| fields account_number, gender
| sort account_number

The query returns the following results:

account_number	gender
1	M
13	F

Example 2: Retain multiple duplicate documents

The following query deduplicates documents based on the gender field while retaining up to two documents for each gender value:

source=accounts
| dedup 2 gender
| fields account_number, gender
| sort account_number

The query returns the following results:

account_number	gender
1	M
6	M
13	F
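
The retention count can be modeled outside of PPL. The following Python sketch is an illustration only, not the engine's implementation: the `dedup` function and its `keep` argument are hypothetical names, the sample rows are inferred from the example outputs on this page, and the sketch assumes that the first documents encountered for each value are the ones retained:

```python
from collections import defaultdict


def dedup(rows, fields, keep=1):
    """Keep at most `keep` rows for each combination of values in `fields`.

    Hypothetical helper: assumes rows are processed in input order and
    that the earliest rows for each combination are the ones retained.
    """
    if keep < 1:
        raise ValueError("keep must be greater than 0")
    counts = defaultdict(int)
    kept = []
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if counts[key] < keep:
            counts[key] += 1
            kept.append(row)
    return kept


# Sample rows inferred from the example results on this page.
accounts = [
    {"account_number": 1, "gender": "M"},
    {"account_number": 6, "gender": "M"},
    {"account_number": 13, "gender": "F"},
    {"account_number": 18, "gender": "M"},
]

two_per_gender = [r["account_number"] for r in dedup(accounts, ["gender"], keep=2)]
one_per_gender = [r["account_number"] for r in dedup(accounts, ["gender"])]
```

Under these assumptions, `two_per_gender` contains accounts 1, 6, and 13, and `one_per_gender` contains accounts 1 and 13.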

Example 3: Handle documents with empty field values

The following query removes duplicate documents while keeping documents with null values in the specified field:

source=accounts
| dedup email keepempty=true
| fields account_number, email
| sort account_number

The query returns the following results:

account_number	email
1	amberduke@pyrami.com
6	hattiebond@netagy.com
13	null
18	daleadams@boink.com

The following query removes duplicate documents and, because keepempty defaults to false, also drops documents with null or missing values in the specified field:

source=accounts
| dedup email
| fields account_number, email
| sort account_number

The query returns the following results:

account_number	email
1	amberduke@pyrami.com
6	hattiebond@netagy.com
18	daleadams@boink.com
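
The effect of keepempty on the two result sets can be sketched in Python. This is a simplified model rather than the actual engine logic: `dedup_keepempty` is a hypothetical helper, `None` stands in for both null and missing values, and the sample rows are inferred from the example outputs:

```python
def dedup_keepempty(rows, fields, keepempty=False):
    """Deduplicate on `fields`, handling rows with empty values separately.

    Hypothetical helper: a row with None in any listed field is never
    treated as a duplicate. It is kept when keepempty is True and
    dropped otherwise.
    """
    seen = set()
    kept = []
    for row in rows:
        values = tuple(row.get(f) for f in fields)
        if any(v is None for v in values):
            if keepempty:
                kept.append(row)
            continue
        if values not in seen:
            seen.add(values)
            kept.append(row)
    return kept


# Sample rows inferred from the example results on this page.
accounts = [
    {"account_number": 1, "email": "amberduke@pyrami.com"},
    {"account_number": 6, "email": "hattiebond@netagy.com"},
    {"account_number": 13, "email": None},
    {"account_number": 18, "email": "daleadams@boink.com"},
]

with_empty = [r["account_number"] for r in dedup_keepempty(accounts, ["email"], keepempty=True)]
without_empty = [r["account_number"] for r in dedup_keepempty(accounts, ["email"])]
```

In this model, account 13 appears in `with_empty` but not in `without_empty`, which matches the two result tables above.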

Example 4: Deduplicate consecutive documents

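The following query removes only consecutive duplicate documents, so a later document with a repeated gender value is retained when it does not directly follow a document with the same value: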

source=accounts
| dedup gender consecutive=true
| fields account_number, gender
| sort account_number

The query returns the following results:

account_number	gender
1	M
13	F
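
The difference between consecutive and global deduplication can be illustrated with a short Python sketch. It is a model under stated assumptions, not the legacy engine's code: `dedup_consecutive` is a hypothetical helper, and the input order of the sample rows is inferred from the example outputs on this page:

```python
def dedup_consecutive(rows, fields):
    """Drop a row only when it matches the row immediately before it.

    Hypothetical helper: compares each row's values in `fields` with
    those of the previous row, in input order.
    """
    kept = []
    previous = None  # keys are tuples, so None never matches a real key
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if key != previous:
            kept.append(row)
        previous = key
    return kept


# Sample rows inferred from the example results on this page.
accounts = [
    {"account_number": 1, "gender": "M"},
    {"account_number": 6, "gender": "M"},
    {"account_number": 13, "gender": "F"},
    {"account_number": 18, "gender": "M"},
]

consecutive_result = [r["account_number"] for r in dedup_consecutive(accounts, ["gender"])]
```

Here account 6 is dropped because it directly follows another M document, while account 18 is retained because an F document separates it from the earlier M documents.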
18	M
