You're viewing version 3.4 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
dedup
The dedup command removes duplicate documents defined by specified fields from the search result.
Syntax
The dedup command has the following syntax:
dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]
Parameters
The dedup command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
<field-list> | Required | A comma-delimited list of fields to use for deduplication. At least one field is required. |
<int> | Optional | The number of duplicate documents to retain for each combination. Must be greater than 0. Default is 1. |
keepempty | Optional | When set to true, keeps documents in which any field in the field list has a NULL value or is missing. Default is false. |
consecutive | Optional | When set to true, removes only consecutive duplicate documents. Default is false. Requires the legacy SQL engine (plugins.calcite.enabled=false). |
Example 1: Remove duplicates based on a single field
The following query deduplicates documents based on the gender field:
source=accounts
| dedup gender
| fields account_number, gender
| sort account_number
The query returns the following results:
| account_number | gender |
|---|---|
| 1 | M |
| 13 | F |
Example 2: Retain multiple duplicate documents
The following query removes duplicate documents based on the gender field while keeping two duplicate documents:
source=accounts
| dedup 2 gender
| fields account_number, gender
| sort account_number
The query returns the following results:
| account_number | gender |
|---|---|
| 1 | M |
| 6 | M |
| 13 | F |
Example 3: Handle documents with empty field values
The following query removes duplicate documents while keeping documents with null values in the specified field:
source=accounts
| dedup email keepempty=true
| fields account_number, email
| sort account_number
The query returns the following results:
| account_number | |
|---|---|
| 1 | amberduke@pyrami.com |
| 6 | hattiebond@netagy.com |
| 13 | null |
| 18 | daleadams@boink.com |
The following query removes duplicate documents while ignoring documents with empty values in the specified field:
source=accounts
| dedup email
| fields account_number, email
| sort account_number
The query returns the following results:
| account_number | |
|---|---|
| 1 | amberduke@pyrami.com |
| 6 | hattiebond@netagy.com |
| 18 | daleadams@boink.com |
Example 4: Deduplicate consecutive documents
The following query removes duplicate consecutive documents:
source=accounts
| dedup gender consecutive=true
| fields account_number, gender
| sort account_number
The query returns the following results:
| account_number | gender |
|---|---|
| 1 | M |
| 13 | F |
| 18 | M |