Grok processor
The Grok processor uses pattern matching to structure and extract important keys from unstructured data.
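For example, the following minimal sketch applies the built-in COMMONAPACHELOG grok pattern to a key named message (the key name and pattern choice here are illustrative, not required):

    processor:
      - grok:
          match:
            # Apply the built-in COMMONAPACHELOG pattern to the "message" key
            message: ['%{COMMONAPACHELOG}']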
Configuration
The following table describes the options you can use with the Grok processor to structure your data and make it easier to query.
| Option | Required | Type | Description |
|---|---|---|---|
| break_on_match | No | Boolean | Specifies whether to stop once the first successful match is found (true) or to match all patterns (false). Default is true. |
| grok_when | No | String | Specifies the condition under which the grok processor performs matching. For information about this expression, see Expression syntax. Default is no condition. |
| keep_empty_captures | No | Boolean | Enables the preservation of null captures in the processed output. Default is false. |
| keys_to_overwrite | No | List | Specifies which existing keys to overwrite if there is a capture with the same key. Default is []. |
| match | No | Map | Specifies which keys should match specific patterns. Default is an empty map. |
| named_captures_only | No | Boolean | Specifies whether to keep only named captures. Default is true. |
| pattern_definitions | No | Map | Allows you to define custom patterns inline for use in the match patterns. Default is an empty map. |
| patterns_directories | No | List | Specifies which directory paths contain the custom pattern files. Default is an empty list. |
| patterns_files_glob | No | String | Specifies which pattern files to use from the directories specified in patterns_directories. Default is *. |
| target_key | No | String | Specifies a parent-level key used to store all captures. Default is null. |
| timeout_millis | No | Integer | The maximum amount of time, in milliseconds, during which matching occurs. Setting this value to 0 prevents any matching from occurring. Default is 30,000. |
| performance_metadata | No | Boolean | Specifies whether to add performance metadata to events. Default is false. For more information, see Grok performance metadata. |
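For example, the following sketch combines several of these options to run the processor conditionally, overwrite an existing key, and bound matching time. The type key, its value, and the pattern are illustrative assumptions, not requirements:

    processor:
      - grok:
          # Run this processor only on events whose "type" key equals "apache"
          grok_when: '/type == "apache"'
          match:
            message: ['%{IPORHOST:clientip} %{GREEDYDATA:message}']
          # Allow the "message" capture to replace the original "message" key
          keys_to_overwrite: ["message"]
          # Stop matching a single event after 3 seconds
          timeout_millis: 3000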
Grok performance metadata
When the performance_metadata option is set to true, the grok processor adds the following metadata keys to each event:
- _total_grok_processing_time: The total amount of time, in milliseconds, that the grok processor takes to match the event. This is the sum of the processing time across all grok processors that ran on the event and have the performance_metadata option enabled.
- _total_grok_patterns_attempted: The total number of grok pattern match attempts across all grok processors that ran on the event.
To include grok performance metadata in the event that is sent to the sink, use the add_entries processor to copy the metadata you want to include, as shown in the following example:
    processor:
      - grok:
          performance_metadata: true
          match:
            log: ["%{COMMONAPACHELOG}"]
          break_on_match: true
          named_captures_only: true
          target_key: "parsed"
      - add_entries:
          entries:
            - add_when: 'getMetadata("_total_grok_patterns_attempted") != null'
              key: "grok_patterns_attempted"
              value_expression: 'getMetadata("_total_grok_patterns_attempted")'
            - add_when: 'getMetadata("_total_grok_processing_time") != null'
              key: "grok_time_spent"
              value_expression: 'getMetadata("_total_grok_processing_time")'
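With this configuration, a matched event arrives at the sink with both metadata values copied into top-level keys. The following event is illustrative only, with invented values and most of the parsed captures omitted:

    {
      "log": "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache.gif HTTP/1.0\" 200 2326",
      "parsed": {
        "clientip": "127.0.0.1",
        "verb": "GET"
      },
      "grok_patterns_attempted": 1,
      "grok_time_spent": 2
    }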
Examples
The following examples demonstrate different ways in which the grok processor can be configured.
The examples don’t use security and are for demonstration purposes only. We strongly recommend configuring SSL before using these examples in production.
Parse Apache access logs
This example demonstrates parsing standard Apache HTTP access logs to extract the client IP address, timestamp, HTTP method, request path, status code, response size, referrer, and user agent:
    apache-access-logs-pipeline:
      source:
        http:
          path: /logs
          ssl: false
      processor:
        - grok:
            match:
              message: ['%{COMBINEDAPACHELOG}']
            break_on_match: true
            named_captures_only: true
            keep_empty_captures: false
            target_key: "parsed"
        - date:
            match:
              - key: "/parsed/timestamp" # JSON pointer to the captured timestamp
                patterns: ["dd/MMM/yyyy:HH:mm:ss Z"]
            destination: "@timestamp"
            source_timezone: "UTC"
      sink:
        - opensearch:
            hosts: ["https://opensearch:9200"]
            insecure: true
            username: admin
            password: "admin_pass"
            index_type: custom
            index: "apache-logs-%{yyyy.MM.dd}"
You can test this pipeline using the following command:
    curl -sS -X POST "http://localhost:2021/logs" \
      -H "Content-Type: application/json" \
      -d '[
        {"message":"127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache.gif HTTP/1.0\" 200 2326 \"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\""},
        {"message":"192.168.1.5 - - [13/Oct/2025:17:42:10 +0000] \"POST /login HTTP/1.1\" 302 512 \"-\" \"curl/8.5.0\""}
      ]'
The documents stored in OpenSearch contain the following information:
    {
      ...
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "apache-logs-2025.10.13",
            "_id": "gLO73pkBpMIC6s6zUMMX",
            "_score": 1,
            "_source": {
              "message": "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache.gif HTTP/1.0\" 200 2326 \"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"",
              "parsed": {
                "request": "/apache.gif",
                "referrer": "http://www.example.com/start.html",
                "agent": "Mozilla/4.08 [en] (Win98; I ;Nav)",
                "auth": "frank",
                "ident": "-",
                "response": "200",
                "bytes": "2326",
                "clientip": "127.0.0.1",
                "verb": "GET",
                "httpversion": "1.0",
                "timestamp": "10/Oct/2000:13:55:36 -0700"
              },
              "@timestamp": "2000-10-10T20:55:36.000Z"
            }
          },
          {
            "_index": "apache-logs-2025.10.13",
            "_id": "gbO73pkBpMIC6s6zUMMX",
            "_score": 1,
            "_source": {
              "message": "192.168.1.5 - - [13/Oct/2025:17:42:10 +0000] \"POST /login HTTP/1.1\" 302 512 \"-\" \"curl/8.5.0\"",
              "parsed": {
                "request": "/login",
                "referrer": "-",
                "agent": "curl/8.5.0",
                "auth": "-",
                "ident": "-",
                "response": "302",
                "bytes": "512",
                "clientip": "192.168.1.5",
                "verb": "POST",
                "httpversion": "1.1",
                "timestamp": "13/Oct/2025:17:42:10 +0000"
              },
              "@timestamp": "2025-10-13T17:42:10.000Z"
            }
          }
        ]
      }
    }
Parse application logs with custom patterns
This example demonstrates parsing custom application logs with user-defined patterns for extracting structured data from proprietary log formats:
    application-logs-pipeline:
      source:
        http:
          path: /logs
          ssl: false
      processor:
        - grok:
            match:
              message: ['%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{DATA:component} - %{GREEDYDATA:details}']
            pattern_definitions:
              LOGLEVEL: (?:INFO|WARN|ERROR|DEBUG|TRACE)
            break_on_match: true
            target_key: "parsed"
            keep_empty_captures: false
        - date:
            match:
              - key: "/parsed/timestamp"
                patterns:
                  - "yyyy-MM-dd HH:mm:ss"
                  - "yyyy-MM-dd HH:mm:ss.SSS"
            destination: "@timestamp" # "/@timestamp" also works
            source_timezone: "UTC"
      sink:
        - opensearch:
            hosts: ["https://opensearch:9200"]
            insecure: true
            username: admin
            password: "admin_pass"
            index_type: custom
            index: "application-logs-%{yyyy.MM.dd}"
You can test this pipeline using the following command:
    curl -sS -X POST "http://localhost:2021/logs" \
      -H "Content-Type: application/json" \
      -d '[
        {"message": "2025-10-13 14:30:45 [INFO] UserService - User login successful"},
        {"message": "2025-10-13 14:31:15 [ERROR] DatabaseConnection - Connection timeout"},
        {"message": "2025-10-13 14:32:30 [DEBUG] CacheManager - Cache hit"},
        {"message": "2025-10-13 14:33:05 [WARN] MetricsCollector - High memory usage detected"}
      ]'
The documents stored in OpenSearch contain the following information:
    {
      ...
      "hits": {
        "total": {
          "value": 4,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "application-logs-2025.10.13",
            "_id": "i7O83pkBpMIC6s6zhsNK",
            "_score": 1,
            "_source": {
              "message": "2025-10-13 14:30:45 [INFO] UserService - User login successful",
              "parsed": {
                "component": "UserService",
                "level": "INFO",
                "details": "User login successful",
                "timestamp": "2025-10-13 14:30:45"
              },
              "@timestamp": "2025-10-13T14:30:45.000Z"
            }
          },
          {
            "_index": "application-logs-2025.10.13",
            "_id": "jLO83pkBpMIC6s6zhsNK",
            "_score": 1,
            "_source": {
              "message": "2025-10-13 14:31:15 [ERROR] DatabaseConnection - Connection timeout",
              "parsed": {
                "component": "DatabaseConnection",
                "level": "ERROR",
                "details": "Connection timeout",
                "timestamp": "2025-10-13 14:31:15"
              },
              "@timestamp": "2025-10-13T14:31:15.000Z"
            }
          },
          {
            "_index": "application-logs-2025.10.13",
            "_id": "jbO83pkBpMIC6s6zhsNK",
            "_score": 1,
            "_source": {
              "message": "2025-10-13 14:32:30 [DEBUG] CacheManager - Cache hit",
              "parsed": {
                "component": "CacheManager",
                "level": "DEBUG",
                "details": "Cache hit",
                "timestamp": "2025-10-13 14:32:30"
              },
              "@timestamp": "2025-10-13T14:32:30.000Z"
            }
          },
          {
            "_index": "application-logs-2025.10.13",
            "_id": "jrO83pkBpMIC6s6zhsNK",
            "_score": 1,
            "_source": {
              "message": "2025-10-13 14:33:05 [WARN] MetricsCollector - High memory usage detected",
              "parsed": {
                "component": "MetricsCollector",
                "level": "WARN",
                "details": "High memory usage detected",
                "timestamp": "2025-10-13 14:33:05"
              },
              "@timestamp": "2025-10-13T14:33:05.000Z"
            }
          }
        ]
      }
    }
Parse network device logs with multiple patterns
This example demonstrates using multiple grok patterns to handle different log formats from network devices, with a second grok pass that extracts login details from the parsed message:
    network-device-logs-pipeline:
      source:
        http:
          path: /logs
          ssl: false
      processor:
        - grok:
            match:
              message: [
                # Syslog-style
                '%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}',
                # ISO 8601 timestamp + IP address
                '%{TIMESTAMP_ISO8601:timestamp} %{IP:host} %{DATA:program}: %{GREEDYDATA:message}',
                # Cisco-style
                '%{CISCO_TIMESTAMP:timestamp}: %{DATA:facility}-%{INT:severity}-%{DATA:mnemonic}: %{GREEDYDATA:message}'
              ]
            break_on_match: true
            named_captures_only: true
            pattern_definitions:
              CISCO_TIMESTAMP: '%{MONTH} %{MONTHDAY} %{TIME}'
            target_key: "parsed"
            timeout_millis: 5000
        # Extract login details from the parsed message text
        - grok:
            match:
              /parsed/message: ['User %{USERNAME:user} logged in from %{IP:source_ip}']
            break_on_match: true
            target_key: "login_info"
        - date:
            match:
              - key: "/parsed/timestamp"
                patterns:
                  # Syslog-style (no year)
                  - "MMM d HH:mm:ss"
                  # ISO 8601 with offset, with or without milliseconds
                  - "yyyy-MM-dd'T'HH:mm:ssXXX"
                  - "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
            destination: "@timestamp"
            source_timezone: "UTC"
      sink:
        - opensearch:
            hosts: ["https://opensearch:9200"]
            insecure: true
            username: admin
            password: "admin_pass"
            index_type: custom
            index: "network-logs-%{yyyy.MM.dd}"
You can test this pipeline using the following command:
    curl -sS -X POST "http://localhost:2021/logs" \
      -H "Content-Type: application/json" \
      -d '[
        {"message":"Oct 13 14:30:45 router1 sshd[1234]: User alice logged in from 10.0.0.5"},
        {"message":"2025-10-13T16:01:22Z 192.168.0.10 dhcpd: Lease granted to 192.168.0.55"},
        {"message":"Oct 13 16:30:45: LOCAL4-3-LINK_UPDOWN: Interface Gi0/1 changed state to up"}
      ]'
The documents stored in OpenSearch contain the following information:
    {
      ...
      "hits": {
        "total": {
          "value": 3,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "network-logs-2025.10.13",
            "_id": "-kzC3pkBl88jNjkRQ1TJ",
            "_score": 1,
            "_source": {
              "message": "Oct 13 14:30:45 router1 sshd[1234]: User alice logged in from 10.0.0.5",
              "parsed": {
                "host": "router1",
                "pid": "1234",
                "program": "sshd",
                "message": "User alice logged in from 10.0.0.5",
                "timestamp": "Oct 13 14:30:45"
              },
              "login_info": {
                "user": "alice",
                "source_ip": "10.0.0.5"
              },
              "@timestamp": "2025-10-13T14:30:45.000Z"
            }
          },
          {
            "_index": "network-logs-2025.10.13",
            "_id": "-0zC3pkBl88jNjkRQ1TJ",
            "_score": 1,
            "_source": {
              "message": "2025-10-13T16:01:22Z 192.168.0.10 dhcpd: Lease granted to 192.168.0.55",
              "parsed": {
                "host": "192.168.0.10",
                "program": "dhcpd",
                "message": "Lease granted to 192.168.0.55",
                "timestamp": "2025-10-13T16:01:22Z"
              },
              "login_info": {},
              "@timestamp": "2025-10-13T16:01:22.000Z"
            }
          },
          {
            "_index": "network-logs-2025.10.13",
            "_id": "_EzC3pkBl88jNjkRQ1TJ",
            "_score": 1,
            "_source": {
              "message": "Oct 13 16:30:45: LOCAL4-3-LINK_UPDOWN: Interface Gi0/1 changed state to up",
              "parsed": {
                "severity": "3",
                "mnemonic": "LINK_UPDOWN",
                "message": "Interface Gi0/1 changed state to up",
                "facility": "LOCAL4",
                "timestamp": "Oct 13 16:30:45"
              },
              "login_info": {},
              "@timestamp": "2025-10-13T16:30:45.000Z"
            }
          }
        ]
      }
    }
Metrics
The following table describes common Abstract processor metrics.
| Metric name | Type | Description |
|---|---|---|
| recordsIn | Counter | Metric representing the ingress of records to a pipeline component. |
| recordsOut | Counter | Metric representing the egress of records from a pipeline component. |
| timeElapsed | Timer | Metric representing the time elapsed during the execution of a pipeline component. |
The Grok processor includes the following custom metrics.
Counter
- grokProcessingMismatch: Records the number of records that did not match any of the patterns specified in the match field.
- grokProcessingMatch: Records the number of records that matched at least one pattern from the match field.
- grokProcessingErrors: Records the total number of record processing errors.
- grokProcessingTimeouts: Records the total number of records that timed out while matching.
Timer
- grokProcessingTime: The time taken by individual records to match against patterns from the match field. The avg metric is the most useful metric for this timer because it provides the average time taken to match records.