dedup
The dedup command removes documents that have duplicate values in the specified fields from the search results.
Syntax
The dedup command has the following syntax:
dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]
Parameters
The dedup command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
| `<field-list>` | Required | A comma-delimited list of fields to use for deduplication. At least one field is required. |
| `<int>` | Optional | The number of duplicate documents to retain for each combination of field values. Must be greater than 0. Default is 1. |
| `keepempty` | Optional | When set to `true`, keeps documents in which any field in the field list has a NULL value or is missing. Default is `false`. |
| `consecutive` | Optional | When set to `true`, removes only consecutive duplicate documents. Default is `false`. Requires the legacy SQL engine (`plugins.calcite.enabled=false`). |
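As a minimal sketch of how the parameters combine, the following query passes two fields in the field list and retains up to two documents per combination, so documents count as duplicates only when both field values match. It reuses the otellogs index and field names from the examples on this page. The output is omitted because it depends on the data:

```
source=otellogs
| dedup 2 severityText, `resource.attributes.service.name`
| fields severityText, `resource.attributes.service.name`
```

The optional count comes before the field list, and the keepempty and consecutive options follow it, matching the order shown in the syntax.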
Example 1: Removing duplicates based on a single field
The following query deduplicates by service name to get one sample error or warning per service, giving you a quick view of what’s failing across your system:
source=otellogs
| where severityText IN ('ERROR', 'WARN')
| dedup `resource.attributes.service.name`
| sort `resource.attributes.service.name`
| fields `resource.attributes.service.name`, severityText, body
The query returns the following results:
| resource.attributes.service.name | severityText | body |
|---|---|---|
| checkout | ERROR | NullPointerException in CheckoutService.placeOrder at line 142 |
| frontend-proxy | ERROR | [2024-02-01T09:20:00.456Z] "POST /api/checkout HTTP/1.1" 503 - 0 30000 checkout-8d4f7b-mk2p9 |
| payment | ERROR | Payment failed: connection timeout to payment gateway after 30000ms |
| product-catalog | WARN | Slow query detected: SELECT * FROM products WHERE category = 'electronics' took 3200ms |
| recommendation | ERROR | Failed to process recommendation request: invalid product ID from 203.0.113.50 |
Example 2: Retaining multiple duplicate documents
The following query keeps up to two logs per severity level, giving you a broader sample of each level to understand the variety of issues:
source=otellogs
| dedup 2 severityText
| sort severityNumber
| fields severityText, severityNumber
| head 6
The query returns the following results:
| severityText | severityNumber |
|---|---|
| DEBUG | 5 |
| DEBUG | 5 |
| INFO | 9 |
| INFO | 9 |
| WARN | 13 |
| WARN | 13 |
Example 3: Handling documents with empty field values
The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. Because keepempty defaults to false, documents with a null or missing value in this field are dropped:
source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
The query returns the following results:
| instrumentationScope.name |
|---|
| @opentelemetry/instrumentation-http |
| Microsoft.Extensions.Hosting |
| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
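To retain documents that have no value for the field, set keepempty to true. The following is a minimal sketch that reuses the preceding query and changes only that option. The output is not shown because it depends on how many documents in the index lack the field:

```
source=otellogs
| dedup instrumentationScope.name keepempty=true
| fields instrumentationScope.name
| sort instrumentationScope.name
```

With this setting, documents in which instrumentationScope.name is null or missing are kept in the result rather than dropped.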
Example 4: Deduplicating consecutive documents
The following query removes duplicate consecutive documents. When logs are sorted by severity, this shows the transitions between severity levels, helping you see the pattern of escalation:
source=otellogs
| sort severityNumber, `resource.attributes.service.name`
| dedup severityText consecutive=true
| fields severityText, `resource.attributes.service.name`
The query returns the following results:
| severityText | resource.attributes.service.name |
|---|---|
| DEBUG | cart |
| INFO | cart |
| WARN | frontend-proxy |
| ERROR | checkout |