
Decompress processor

The decompress processor decompresses any Base64-encoded, compressed fields in an event.

Configuration

| Option | Required | Type | Description |
| :--- | :--- | :--- | :--- |
| keys | Yes | List | The fields in the event that will be decompressed. |
| type | Yes | Enum | The type of decompression to use for the keys in the event. Only gzip is supported. |
| decompress_when | No | String | A Data Prepper conditional expression that determines when the decompress processor will run on certain events. |
| tags_on_failure | No | List | A list of strings with which to tag events when the processor fails to decompress the keys inside an event. Defaults to _decompression_failure. |
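
For example, a minimal processor entry might combine these options as follows. This is a sketch for illustration only: the key name, the marker field, and the condition are placeholder values, not settings required by the processor.

processor:
  - decompress:
      # Hypothetical field assumed to hold Base64-encoded gzip data.
      keys: ["compressed_payload"]
      type: gzip
      # Hypothetical condition: run the processor only when the event marks its encoding as gzip.
      decompress_when: '/encoding == "gzip"'
      tags_on_failure: ["decompression_failure"]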

Usage

This example demonstrates a complete pipeline that receives compressed log data, decompresses it, and stores it in OpenSearch:

decompress-logs-pipeline:
  source:
    http:
      path: /events
      ssl: false

  processor:
    - decompress:
        keys: ["compressed_log", "compressed_metadata"]
        type: gzip
        tags_on_failure: ["decompression_failed", "gzip_error"]

    # Only parse if decompression succeeded
    - parse_json:
        source: compressed_log
        destination: parsed_log
        parse_when: not hasTags("decompression_failed")

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: admin_password
        index_type: custom
        index: decompressed-logs-%{yyyy.MM.dd}
        tags_target_key: "tags"

You can test the pipeline with the following two requests. The first sends an event in which both fields contain valid gzip-compressed, Base64-encoded JSON. The second sends an event in which compressed_log is Base64 encoded but not gzip compressed, so decompression fails and the tags defined in tags_on_failure are added to the event.


GOOD_COMPRESSED_LOG=$(echo '{"message":"Application OK","level":"INFO","service":"api"}' | gzip | base64 | tr -d '\n')
GOOD_COMPRESSED_METADATA=$(echo '{"source_ip":"192.168.1.100","ok":true}' | gzip | base64 | tr -d '\n')
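
If you want to confirm that a payload round-trips correctly before sending it, you can decode and decompress it locally (on some systems the Base64 decode flag is --decode or -D instead of -d):

echo "$GOOD_COMPRESSED_LOG" | base64 -d | gunzip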

curl -sS -X POST "http://localhost:2021/events" \
  -H "Content-Type: application/json" \
  -d "[
    {
      \"compressed_log\": \"${GOOD_COMPRESSED_LOG}\",
      \"compressed_metadata\": \"${GOOD_COMPRESSED_METADATA}\",
      \"host\": \"web-server-01\",
      \"timestamp\": \"2023-10-13T14:30:45Z\"
    }
  ]"

BAD_COMPRESSED_LOG=$(echo -n 'this is not gzipped' | base64 | tr -d '\n')
GOOD_COMPRESSED_METADATA=$(echo '{"source_ip":"192.168.1.100","ok":true}' | gzip | base64 | tr -d '\n')

curl -sS -X POST "http://localhost:2021/events" \
  -H "Content-Type: application/json" \
  -d "[
    {
      \"compressed_log\": \"${BAD_COMPRESSED_LOG}\",
      \"compressed_metadata\": \"${GOOD_COMPRESSED_METADATA}\",
      \"host\": \"web-server-02\",
      \"timestamp\": \"2023-10-13T14:31:45Z\"
    }
  ]"

The documents stored in OpenSearch contain the following information:

{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "decompressed-logs-2025.11.04",
        "_id": "dOv1TpoB5Jwkiy-iuKKs",
        "_score": 1,
        "_source": {
          "compressed_log": """{"message":"Application OK","level":"INFO","service":"api"}
""",
          "compressed_metadata": """{"source_ip":"192.168.1.100","ok":true}
""",
          "host": "web-server-01",
          "timestamp": "2023-10-13T14:30:45Z",
          "parsed_log": {
            "level": "INFO",
            "service": "api",
            "message": "Application OK"
          },
          "tags": []
        }
      },
      {
        "_index": "decompressed-logs-2025.11.04",
        "_id": "dev1TpoB5Jwkiy-iuKKs",
        "_score": 1,
        "_source": {
          "compressed_log": "dGhpcyBpcyBub3QgZ3ppcHBlZA==",
          "compressed_metadata": """{"source_ip":"192.168.1.100","ok":true}
""",
          "host": "web-server-02",
          "timestamp": "2023-10-13T14:31:45Z",
          "tags": [
            "decompression_failed",
            "gzip_error"
          ]
        }
      }
    ]
  }
}

Metrics

By default, Data Prepper serves metrics from the /metrics/prometheus endpoint on port 4900. You can access all the metrics by running the following command:

curl http://localhost:4900/metrics/prometheus
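
Data Prepper metric names typically include the pipeline and plugin names, so you can narrow the output to this processor by filtering on the plugin name, for example:

curl -s http://localhost:4900/metrics/prometheus | grep -i decompress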

The following table describes common abstract processor metrics.

| Metric name | Type | Description |
| :--- | :--- | :--- |
| recordsIn | Counter | The ingress of records to a pipeline component. |
| recordsOut | Counter | The egress of records from a pipeline component. |
| timeElapsed | Timer | The time elapsed during execution of a pipeline component. |

Counter

The decompress processor includes the following custom metric:

  • processingErrors: The number of processing errors that have occurred in the decompress processor.