
Grok processor

The Grok processor uses pattern matching to structure and extract important keys from unstructured data.
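
Grok patterns are written as a sequence of %{PATTERN_NAME:capture_key} expressions. As a minimal illustration (the capture keys here are arbitrary), the following pattern extracts three keys, clientip, method, and status, from a line such as 10.0.0.1 GET 200:

%{IP:clientip} %{WORD:method} %{NUMBER:status}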

Configuration

The following table describes options you can use with the Grok processor to structure your data and make it easier to query.

Option Required Type Description
break_on_match No Boolean Specifies whether to stop once the first successful match is found (true) or to attempt every pattern in match (false). Default is true.
grok_when No String Specifies under what condition the grok processor should perform matching. For information about this expression, see Expression syntax. Default is no condition.
keep_empty_captures No Boolean Specifies whether to preserve null captures in the processed output. Default is false.
keys_to_overwrite No List Specifies which existing keys are overwritten if a capture has the same key name. Default is [].
match No Map Specifies which keys should match specific patterns. Default is an empty map.
named_captures_only No Boolean Specifies whether to keep only named captures. Default is true.
pattern_definitions No Map Allows custom patterns to be defined inline and used in match expressions. Default is an empty map.
patterns_directories No List Specifies which directory paths contain the custom pattern files. Default is an empty list.
pattern_files_glob No String Specifies which pattern files to use from the directories specified in patterns_directories. Default is *.
target_key No String Specifies a parent-level key used to store all captures. Default value is null.
timeout_millis No Integer The maximum amount of time, in milliseconds, during which matching occurs for a single event. Setting this to 0 disables the timeout. Default is 30,000.
performance_metadata No Boolean Specifies whether to add performance metadata to events. Default is false. For more information, see Grok performance metadata.
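
As a minimal sketch of how several of these options fit together, the following processor entry runs only on events whose log_type key equals "apache", stops at the first successful pattern match, and gives up on any single event after 5 seconds. The key names and condition are illustrative:

processor:
  - grok:
      # Run only when this condition evaluates to true (see Expression syntax).
      grok_when: '/log_type == "apache"'
      match:
        message: ['%{COMMONAPACHELOG}']
      # Stop at the first pattern that matches.
      break_on_match: true
      # Allow captures to replace an existing timestamp key on the event.
      keys_to_overwrite: ["timestamp"]
      # Abandon matching for a single event after 5 seconds.
      timeout_millis: 5000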

Grok performance metadata

When the performance_metadata option is set to true, the grok processor adds the following metadata keys to each event:

  • _total_grok_processing_time: The total amount of time, in milliseconds, that the grok processor takes to match the event. This is the sum of the processing times of all grok processors that ran on the event and have the performance_metadata option enabled.
  • _total_grok_patterns_attempted: The total number of grok pattern match attempts across all grok processors that ran on the event.

To include Grok performance metadata when the event is sent to the sink inside the pipeline, use the add_entries processor to copy the metadata values you want to include into the event, as shown in the following example:

  processor:
    - grok:
        performance_metadata: true
        match:
          log: ["%{COMMONAPACHELOG}"]
        break_on_match: true
        named_captures_only: true
        target_key: "parsed"

    - add_entries:
        entries:
          - add_when: 'getMetadata("_total_grok_patterns_attempted") != null'
            key: "grok_patterns_attempted"
            value_expression: 'getMetadata("_total_grok_patterns_attempted")'
          - add_when: 'getMetadata("_total_grok_processing_time") != null'
            key: "grok_time_spent"
            value_expression: 'getMetadata("_total_grok_processing_time")'
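
With this configuration, each event sent to the sink carries the two copied values alongside the parsed fields. The stored document looks similar to the following; the numeric values are illustrative:

{
  "log": "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache.gif HTTP/1.0\" 200 2326",
  "parsed": {
    "clientip": "127.0.0.1",
    "verb": "GET",
    "response": "200",
    ...
  },
  "grok_patterns_attempted": 1,
  "grok_time_spent": 2
}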

Example

The following examples demonstrate different ways in which the grok processor can be configured.

The examples don’t use security and are for demonstration purposes only. We strongly recommend configuring SSL before using these examples in production.

Parse Apache access logs

This example demonstrates parsing standard Apache HTTP access logs to extract client IP, timestamp, HTTP method, URL, status code, and response size:

apache-access-logs-pipeline:
  source:
    http:
      path: /logs
      ssl: false

  processor:
    - grok:
        match:
          message: ['%{COMBINEDAPACHELOG}']
        break_on_match: true
        named_captures_only: true
        keep_empty_captures: false
        target_key: "parsed"

    - date:
        match:
          - key: "/parsed/timestamp"     # JSON Pointer
            patterns: ["dd/MMM/yyyy:HH:mm:ss Z"]
        destination: "@timestamp"
        source_timezone: "UTC"

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: "admin_pass"
        index_type: custom
        index: "apache-logs-%{yyyy.MM.dd}"

You can test this pipeline using the following command:

curl -sS -X POST "http://localhost:2021/logs" \
  -H "Content-Type: application/json" \
  -d '[
    {"message":"127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache.gif HTTP/1.0\" 200 2326 \"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\""},
    {"message":"192.168.1.5 - - [13/Oct/2025:17:42:10 +0000] \"POST /login HTTP/1.1\" 302 512 \"-\" \"curl/8.5.0\""}
  ]'

The documents stored in OpenSearch contain the following information:

{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "apache-logs-2025.10.13",
        "_id": "gLO73pkBpMIC6s6zUMMX",
        "_score": 1,
        "_source": {
          "message": "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache.gif HTTP/1.0\" 200 2326 \"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"",
          "parsed": {
            "request": "/apache.gif",
            "referrer": "http://www.example.com/start.html",
            "agent": "Mozilla/4.08 [en] (Win98; I ;Nav)",
            "auth": "frank",
            "ident": "-",
            "response": "200",
            "bytes": "2326",
            "clientip": "127.0.0.1",
            "verb": "GET",
            "httpversion": "1.0",
            "timestamp": "10/Oct/2000:13:55:36 -0700"
          },
          "@timestamp": "2000-10-10T20:55:36.000Z"
        }
      },
      {
        "_index": "apache-logs-2025.10.13",
        "_id": "gbO73pkBpMIC6s6zUMMX",
        "_score": 1,
        "_source": {
          "message": "192.168.1.5 - - [13/Oct/2025:17:42:10 +0000] \"POST /login HTTP/1.1\" 302 512 \"-\" \"curl/8.5.0\"",
          "parsed": {
            "request": "/login",
            "referrer": "-",
            "agent": "curl/8.5.0",
            "auth": "-",
            "ident": "-",
            "response": "302",
            "bytes": "512",
            "clientip": "192.168.1.5",
            "verb": "POST",
            "httpversion": "1.1",
            "timestamp": "13/Oct/2025:17:42:10 +0000"
          },
          "@timestamp": "2025-10-13T17:42:10.000Z"
        }
      }
    ]
  }
}

Parse application logs with custom patterns

This example demonstrates parsing custom application logs with user-defined patterns for extracting structured data from proprietary log formats:

application-logs-pipeline:
  source:
    http:
      path: /logs
      ssl: false

  processor:
    - grok:
        match:
          message: ['%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{DATA:component} - %{GREEDYDATA:details}']
        pattern_definitions:
          LOGLEVEL: (?:INFO|WARN|ERROR|DEBUG|TRACE)
        break_on_match: true
        target_key: "parsed"
        keep_empty_captures: false

    - date:
        match:
          - key: "/parsed/timestamp"
            patterns:
              - "yyyy-MM-dd HH:mm:ss"
              - "yyyy-MM-dd HH:mm:ss.SSS"
        destination: "@timestamp"     # you could also use "/@timestamp"
        source_timezone: "UTC"

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: "admin_pass"
        index_type: custom
        index: "application-logs-%{yyyy.MM.dd}"

You can test this pipeline using the following command:

curl -sS -X POST "http://localhost:2021/logs" \
  -H "Content-Type: application/json" \
  -d '[
    {"message": "2025-10-13 14:30:45 [INFO] UserService - User login successful"},
    {"message": "2025-10-13 14:31:15 [ERROR] DatabaseConnection - Connection timeout"},
    {"message": "2025-10-13 14:32:30 [DEBUG] CacheManager - Cache hit"},
    {"message": "2025-10-13 14:33:05 [WARN] MetricsCollector - High memory usage detected"}
  ]'

The documents stored in OpenSearch contain the following information:

{
  ...
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "application-logs-2025.10.13",
        "_id": "i7O83pkBpMIC6s6zhsNK",
        "_score": 1,
        "_source": {
          "message": "2025-10-13 14:30:45 [INFO] UserService - User login successful",
          "parsed": {
            "component": "UserService",
            "level": "INFO",
            "details": "User login successful",
            "timestamp": "2025-10-13 14:30:45"
          },
          "@timestamp": "2025-10-13T14:30:45.000Z"
        }
      },
      {
        "_index": "application-logs-2025.10.13",
        "_id": "jLO83pkBpMIC6s6zhsNK",
        "_score": 1,
        "_source": {
          "message": "2025-10-13 14:31:15 [ERROR] DatabaseConnection - Connection timeout",
          "parsed": {
            "component": "DatabaseConnection",
            "level": "ERROR",
            "details": "Connection timeout",
            "timestamp": "2025-10-13 14:31:15"
          },
          "@timestamp": "2025-10-13T14:31:15.000Z"
        }
      },
      {
        "_index": "application-logs-2025.10.13",
        "_id": "jbO83pkBpMIC6s6zhsNK",
        "_score": 1,
        "_source": {
          "message": "2025-10-13 14:32:30 [DEBUG] CacheManager - Cache hit",
          "parsed": {
            "component": "CacheManager",
            "level": "DEBUG",
            "details": "Cache hit",
            "timestamp": "2025-10-13 14:32:30"
          },
          "@timestamp": "2025-10-13T14:32:30.000Z"
        }
      },
      {
        "_index": "application-logs-2025.10.13",
        "_id": "jrO83pkBpMIC6s6zhsNK",
        "_score": 1,
        "_source": {
          "message": "2025-10-13 14:33:05 [WARN] MetricsCollector - High memory usage detected",
          "parsed": {
            "component": "MetricsCollector",
            "level": "WARN",
            "details": "High memory usage detected",
            "timestamp": "2025-10-13 14:33:05"
          },
          "@timestamp": "2025-10-13T14:33:05.000Z"
        }
      }
    ]
  }
}

Parse network device logs with multiple patterns

This example demonstrates using multiple grok patterns to handle different log formats from network devices, followed by a second grok pass that extracts login details from the parsed message:

network-device-logs-pipeline:
  source:
    http:
      path: /logs
      ssl: false

  processor:
    - grok:
        match:
          message: [
            # syslog-like
            '%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}',
            # ISO8601 + IP
            '%{TIMESTAMP_ISO8601:timestamp} %{IP:host} %{DATA:program}: %{GREEDYDATA:message}',
            # Cisco style
            '%{CISCO_TIMESTAMP:timestamp}: %{DATA:facility}-%{INT:severity}-%{DATA:mnemonic}: %{GREEDYDATA:message}'
          ]
        break_on_match: true
        named_captures_only: true
        pattern_definitions:
          CISCO_TIMESTAMP: '%{MONTH} %{MONTHDAY} %{TIME}'
        target_key: "parsed"
        timeout_millis: 5000

    # Extract login info from the parsed message text
    - grok:
        match:
          /parsed/message: ['User %{USERNAME:user} logged in from %{IP:source_ip}']
        break_on_match: true
        target_key: "login_info"

    - date:
        match:
          - key: "/parsed/timestamp"
            patterns:
              # syslog-like (no year)
              - "MMM d HH:mm:ss"
              # ISO 8601, with or without milliseconds (with time zone offset)
              - "yyyy-MM-dd'T'HH:mm:ssXXX"
              - "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
        destination: "@timestamp"
        source_timezone: "UTC"

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: "admin_pass"
        index_type: custom
        index: "network-logs-%{yyyy.MM.dd}"

You can test this pipeline using the following command:

curl -sS -X POST "http://localhost:2021/logs" \
  -H "Content-Type: application/json" \
  -d '[
    {"message":"Oct 13 14:30:45 router1 sshd[1234]: User alice logged in from 10.0.0.5"},
    {"message":"2025-10-13T16:01:22Z 192.168.0.10 dhcpd: Lease granted to 192.168.0.55"},
    {"message":"Oct 13 16:30:45: LOCAL4-3-LINK_UPDOWN: Interface Gi0/1 changed state to up"}
  ]'

The documents stored in OpenSearch contain the following information:

{
  ...
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "network-logs-2025.10.13",
        "_id": "-kzC3pkBl88jNjkRQ1TJ",
        "_score": 1,
        "_source": {
          "message": "Oct 13 14:30:45 router1 sshd[1234]: User alice logged in from 10.0.0.5",
          "parsed": {
            "host": "router1",
            "pid": "1234",
            "program": "sshd",
            "message": "User alice logged in from 10.0.0.5",
            "timestamp": "Oct 13 14:30:45"
          },
          "login_info": {
            "user": "alice",
            "source_ip": "10.0.0.5"
          },
          "@timestamp": "2025-10-13T14:30:45.000Z"
        }
      },
      {
        "_index": "network-logs-2025.10.13",
        "_id": "-0zC3pkBl88jNjkRQ1TJ",
        "_score": 1,
        "_source": {
          "message": "2025-10-13T16:01:22Z 192.168.0.10 dhcpd: Lease granted to 192.168.0.55",
          "parsed": {
            "host": "192.168.0.10",
            "program": "dhcpd",
            "message": "Lease granted to 192.168.0.55",
            "timestamp": "2025-10-13T16:01:22Z"
          },
          "login_info": {},
          "@timestamp": "2025-10-13T16:01:22.000Z"
        }
      },
      {
        "_index": "network-logs-2025.10.13",
        "_id": "_EzC3pkBl88jNjkRQ1TJ",
        "_score": 1,
        "_source": {
          "message": "Oct 13 16:30:45: LOCAL4-3-LINK_UPDOWN: Interface Gi0/1 changed state to up",
          "parsed": {
            "severity": "3",
            "mnemonic": "LINK_UPDOWN",
            "message": "Interface Gi0/1 changed state to up",
            "facility": "LOCAL4",
            "timestamp": "Oct 13 16:30:45"
          },
          "login_info": {},
          "@timestamp": "2025-10-13T16:30:45.000Z"
        }
      }
    ]
  }
}

Metrics

The following table describes common Abstract processor metrics.

Metric name Type Description
recordsIn Counter Metric representing the ingress of records to a pipeline component.
recordsOut Counter Metric representing the egress of records from a pipeline component.
timeElapsed Timer Metric representing the time elapsed during execution of a pipeline component.

The Grok processor includes the following custom metrics.

Counter

  • grokProcessingMismatch: Records the number of records that did not match any of the patterns specified in the match field.
  • grokProcessingMatch: Records the number of records that matched at least one pattern from the match field.
  • grokProcessingErrors: Records the total number of record processing errors.
  • grokProcessingTimeouts: Records the total number of records that timed out while matching.

Timer

  • grokProcessingTime: The time taken by individual records to match against patterns in the match field. The avg metric is the most useful metric for this timer because it provides the average time taken to match records.
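
These metrics are exposed, along with all other Data Prepper metrics, in Prometheus format by the Data Prepper core server, which listens on port 4900 by default. Assuming that default, you can spot-check the grok metrics with a command such as the following:

curl -s http://localhost:4900/metrics/prometheus | grep -i grok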