Regular expression syntax
A regular expression (regex) is a way to define search patterns using special symbols and operators. These patterns let you match sequences of characters in strings.
In OpenSearch, you can use regular expressions in the following query types:
OpenSearch uses the Apache Lucene regex engine, which has its own syntax and limitations. It does not use Perl Compatible Regular Expressions (PCRE), so some familiar regex features might behave differently or be unsupported.
Choosing between regexp and query_string queries
Both regexp
and query_string
queries support regular expressions, but they behave differently and serve different use cases.
Feature | regexp query | query_string query |
---|---|---|
Pattern matching | Regex pattern must match the entire field value | Regex pattern can match any part of the field |
flags support | flags enables optional regex operators | flags not supported |
Query type | Term-level query (not scored) | Full-text query (scored and parsed) |
Best use case | Strict pattern matching on keyword or exact fields | Search within analyzed fields using a flexible query string that supports regex patterns |
Complex query composition | Limited to regex patterns | Supports AND , OR , wildcards, fields, boosts, and other features. See Query string query. |
Reserved characters
Lucene’s regex engine supports all Unicode characters. However, the following characters are treated as special operators:
. ? + * | { } [ ] ( ) " \
Depending on the enabled flags
that specify optional operators, the following characters may also be reserved:
@ & ~ < >
To match these characters literally, either escape them with a backslash (\
) or wrap the entire string in double quotation marks:
\&
: Matches a literal&
\\
: Matches a literal backslash (\
)"hello@world"
: Matches the full stringhello@world
Standard regex operators
Lucene supports a core set of regex operators:
-
.
– Matches any single character. Example:f.n
matchesf
followed by any character and thenn
(for example,fan
orfin
). -
?
– Matches zero or one of the preceding characters. Example:colou?r
matchescolor
andcolour
. -
+
– Matches one or more of the preceding characters. Example:go+
matchesg
followed by one or moreo
s (go
,goo
,gooo
, and so on). -
*
– Matches zero or more of the preceding characters. Example:lo*se
matchesl
followed by zero or moreo
s and thense
(lse
,lose
,loose
,loooose
, and so on). -
{min,max}
– Matches a specific range of repetitions. Ifmax
is omitted, there is no upper limit on the number of characters matched. Example:x{3}
matches exactly 3x
s (xxx
);x{2,4}
matches from 2 to 4x
s (xx
,xxx
, orxxxx
);x{3,}
matches 3 or morex
s (xxx
,xxxx
,xxxxx
, and so on). -
|
– Acts as a logicalOR
. Example:apple|orange
matchesapple
ororange
. -
( )
– Groups characters into a subpattern. Example:ab(cd)?
matchesab
andabcd
. -
[ ]
– Matches one character from a set or range. Example:[aeiou]
matches any vowel.-
– When provided within the brackets, indicates a range unless escaped or is the first character within the brackets. Example:[a-z]
matches any lowercase letter;[-az]
matches-
,a
, orz
;[a\\-z]
matchesa
,-
, orz
.^
– When provided within the brackets, acts a logicalNOT
, negating a range of characters or any character in the set. Example:[^az]
matches any character excepta
orz
;[^a-z]
matches any character except lowercase letters;[^-az]
matches any character except-
,a
, andz
;[^a\\-z]
matches any character excepta
,-
, andz
.
Optional operators
You can enable additional regex operators using the flags
parameter. Separate multiple flags with |
.
The following are the available flags:
-
ALL
(default) – Enables all optional operators. -
COMPLEMENT
– Enables~
, which negates the shortest following expression. Example:d~ef
matchesdgf
,dxf
, but notdef
. -
INTERSECTION
– Enables&
as anAND
logical operator. Example:ab.+&.+cd
matches strings containingab
at the beginning andcd
at the end. -
INTERVAL
– Enables<min-max>
syntax to match numeric ranges. Example:id<10-12>
matchesid10
,id11
, andid12
. -
ANYSTRING
– Enables@
to match any string. You can combine this with~
and&
for exclusions. Example:@&.*error.*&.*[0-9]{3}.*
matches strings containing both the word “error” and a sequence of three digits.
Unsupported features
Lucene’s engine does not support the following commonly used regex anchors:
^
– Start of line$
– End of line
Instead, your pattern must match the entire string to produce a match.
Example
To try regular expressions, index the following documents into the logs
index:
PUT /logs/_doc/1
{
"message": "error404"
}
PUT /logs/_doc/2
{
"message": "error500"
}
PUT /logs/_doc/3
{
"message": "error1a"
}
Example: Basic query containing regular expressions
The following regexp
query returns documents in which the entire value of the message
field matches the pattern “error” followed by one or more digits. A value does not match if it only contains the pattern as a substring:
GET /logs/_search
{
"query": {
"regexp": {
"message": {
"value": "error[0-9]+"
}
}
}
}
This query matches error404
and error500
:
{
"took": 28,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "logs",
"_id": "1",
"_score": 1,
"_source": {
"message": "error404"
}
},
{
"_index": "logs",
"_id": "2",
"_score": 1,
"_source": {
"message": "error500"
}
}
]
}
}
Example: Using optional operators
The following query matches documents in which the message
field exactly matches a string that starts with “error” followed by a number from 400 to 500, inclusive. The INTERVAL
flag enables the use of <min-max>
syntax for numeric ranges:
GET /logs/_search
{
"query": {
"regexp": {
"message": {
"value": "error<400-500>",
"flags": "INTERVAL"
}
}
}
}
This query matches error404
and error500
:
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "logs",
"_id": "1",
"_score": 1,
"_source": {
"message": "error404"
}
},
{
"_index": "logs",
"_id": "2",
"_score": 1,
"_source": {
"message": "error500"
}
}
]
}
}
Example: Using ANYSTRING
When the ANYSTRING
flag is enabled, the @
operator matches an entire string. This is useful when combined with intersection (&
) because it allows you to construct queries that match full strings under specific conditions.
The following query matches messages that contain both the word “error” and a sequence of three digits. Use ANYSTRING
to assert that the entire field must match the intersection of both patterns:
GET /logs/_search
{
"query": {
"regexp": {
"message.keyword": {
"value": "@&.*error.*&.*[0-9]{3}.*",
"flags": "ANYSTRING|INTERSECTION"
}
}
}
}
This query matches error404
and error500
:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "logs",
"_id": "1",
"_score": 1,
"_source": {
"message": "error404"
}
},
{
"_index": "logs",
"_id": "2",
"_score": 1,
"_source": {
"message": "error500"
}
}
]
}
}
Note that this query will also match xerror500
, error500x
, and errorxx500
.