Star-tree index
A star-tree index is a specialized index structure designed to improve aggregation performance by precomputing and storing aggregated values at different levels of granularity. This indexing technique enables faster aggregation execution, especially for multi-field aggregations.
Once you enable star-tree indexes, OpenSearch automatically builds and uses star-tree indexes to optimize supported aggregations if the filter fields match the defined dimensions and the aggregation fields match the defined metrics in the star-tree mapping configuration. No changes to your query syntax or request parameters are required.
Use a star-tree index when you want to speed up aggregations:
- Star-tree indexes natively support multi-field aggregations.
- Star-tree indexes are created in real time as part of the indexing process, so the data in a star-tree is always current.
- A star-tree index aggregates data to improve paging efficiency and reduce disk I/O during search queries.
Star-tree index structure
A star-tree index organizes and aggregates data across combinations of dimension fields and precomputes metric values for all the dimension combinations every time a segment is flushed or refreshed during ingestion. This structure enables OpenSearch to process aggregation queries quickly without scanning every document.
The following is an example star-tree configuration:
"ordered_dimensions": [
{
"name": "status"
},
{
"name": "port"
}
],
"metrics": [
{
"name": "size",
"stats": [
"sum"
]
},
{
"name": "latency",
"stats": [
"avg"
]
}
]
This configuration defines the following:
- Two dimension fields:
status
andport
. Theordered_dimension
field specifies how data is sorted (first bystatus
, then byport
). - Two metric fields:
size
andlatency
with their corresponding aggregations (sum
andavg
). For each unique dimension combination, metric values (Sum(size)
andAvg(latency)
) are pre-aggregated and stored in the star-tree structure.
OpenSearch creates a star-tree index structure based on this configuration. Each node in the tree corresponds to a value (or wildcard *
) for a dimension. At query time, OpenSearch traverses the tree based on the dimension values provided in the query.
Leaf nodes
Leaf nodes contain the precomputed metric aggregations for specific combinations of dimensions. These are stored as doc values and referenced by star-tree nodes.
The max_leaf_docs
setting controls how many documents each leaf node can reference, which helps keep query latency predictable by limiting how many documents are scanned for any given node.
Star nodes
A star node (marked as *
in the following diagram) aggregates all values for a particular dimension. If a query doesn’t specify a filter for that dimension, OpenSearch retrieves the precomputed aggregation from the star node instead of iterating over multiple leaf nodes. For example, if a query filters on port
but not status
, OpenSearch can use a star node that aggregates data for all status values.
How queries use the star-tree
The following diagram shows a star-tree index created for this example and three example query paths. In the diagram, notice that each branch corresponds to a dimension (status
and port
). Some nodes contain precomputed aggregation values (for example, Sum(size)
), allowing OpenSearch to skip unnecessary calculations at query time.
The colored arrows show three query examples:
-
Blue arrow: Multi-term query with metric aggregation The query filters on both
status = 200
andport = 5600
and calculates the sum of request sizes.- OpenSearch follows this path:
Root → 200 → 5600
- It retrieves the metric from Doc ID 1, where
Sum(size) = 988
- OpenSearch follows this path:
-
Green arrow: Single-term query with metric aggregation The query filters on
status = 200
only and computes the average request latency.- OpenSearch follows this path:
Root → 200 → *
- It retrieves the metric from Doc ID 5, where
Avg(latency) = 70
- OpenSearch follows this path:
-
Red arrow: Single-term query with metric aggregation The query filters on
port = 8443
only and calculates the sum of request sizes.- OpenSearch follows this path:
Root → * → 8443
- It retrieves the metric from Doc ID 7, where
Sum(size) = 1111
- OpenSearch follows this path:
These examples show how OpenSearch selects the shortest path in the star-tree and uses pre-aggregated values to process queries efficiently.
Limitations
Note the following limiations of star-tree indexes:
- Star-tree indexes do not support updates or deletions. To use a star-tree index, data should be append-only. See Enabling a star-tree index.
- A star-tree index only works for aggregation queries that filter on dimension fields and aggregate metric fields defined in the index’s star-tree configuration.
- Any changes to a star-tree configuration require reindexing.
- Array values are not supported.
- Only specific queries and aggregations are supported.
- Avoid using high-cardinality fields like
_id
as dimensions because they can significantly increase storage use and query latency.
Enabling a star-tree index
Star-tree indexing behavior is controlled by the following cluster-level and index-level settings. Index-level settings take precedence over cluster settings.
Setting | Scope | Default | Purpose |
---|---|---|---|
indices.composite_index.star_tree.enabled | Cluster | true | Enables or disables star-tree search optimization across the cluster. |
index.composite_index | Index | None | Enables star-tree indexing for a specific index. Must be set when creating the index. |
index.append_only.enabled | Index | None | Required for star-tree indexes. Prevents updates and deletions. Must be true . |
index.search.star_tree_index.enabled | Index | true | Enables or disables use of the star-tree index for search queries on the index. |
Setting indices.composite_index.star_tree.enabled
to false
prevents OpenSearch from using star-tree optimization during searches, but the star-tree index structures are still created. To completely remove star-tree structures, you must reindex your data without the star-tree mapping.
To create an index that uses a star-tree index, send the following request:
PUT /logs
{
"settings": {
"index.composite_index": true,
"index.append_only.enabled": true
}
}
Ensure that the doc_values
parameter is enabled for the dimension and metric fields used in your star-tree mapping. This is enabled by default for most field types. For more information, see Doc values.
Disabling star-tree usage
By default, both the indices.composite_index.star_tree.enabled
cluster setting and the index.search.star_tree_index.enabled
index setting are set to true
. To disable search using star-tree indexes, set both of these settings to false
. Note that index settings take precedence over cluster settings.
Example mapping
The following example shows how to create a star-tree index that precomputes aggregations in the logs
index. The sum
and average
aggregations are calculated on the size
and latency
fields , respectively, for all combinations of values in the dimension fields. The dimensions are ordered by status
, then port
, and finally method
, which determines how the data is organized in the tree structure:
PUT /logs
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0,
"index.composite_index": true,
"index.append_only.enabled": true
},
"mappings": {
"composite": {
"request_aggs": {
"type": "star_tree",
"config": {
"date_dimension" : {
"name": "@timestamp",
"calendar_intervals": [
"month",
"day"
]
},
"ordered_dimensions": [
{
"name": "status"
},
{
"name": "port"
},
{
"name": "method"
}
],
"metrics": [
{
"name": "size",
"stats": [
"sum"
]
},
{
"name": "latency",
"stats": [
"avg"
]
}
]
}
}
},
"properties": {
"status": {
"type": "integer"
},
"port": {
"type": "integer"
},
"size": {
"type": "integer"
},
"method" : {
"type": "keyword"
},
"latency": {
"type": "scaled_float",
"scaling_factor": 10
}
}
}
}
For more information about star-tree index mappings and parameters, see Star-tree field type.
Supported queries and aggregations
Star-tree indexes optimize aggregations. Every query must include at least one supported aggregation in order to use the star-tree optimization.
Supported queries
Queries without aggregations cannot use star-tree optimization. The query’s fields must be present in the ordered_dimensions
section of the star-tree configuration. The following queries are supported:
Boolean query restrictions
Boolean queries in star-tree indexes follow specific rules for each clause type:
must
andfilter
clauses:- Are both supported and treated the same way because
filter
does not affect scoring. - Can operate across different dimensions.
- Allow only one condition per dimension across all
must
/filter
clauses, including nested ones. - Support term, terms, and range queries.
- Are both supported and treated the same way because
should
clauses:- Must operate on the same dimension and cannot operate across different dimensions
- Can only use term, terms, and range queries.
should
clauses insidemust
clauses:- Act as a required condition.
- When operating on the same dimension as outer
must
: The union ofshould
conditions is intersected with the outermust
conditions. - When operating on a different dimension: Processed normally as a required condition.
must_not
clauses are not supported.- Queries with the
minimum_should_match
parameter are not supported.
The following Boolean query is supported because it follows these restrictions:
{
"bool": {
"must": [
{"term": {"method": "GET"}}
],
"filter": [
{"range": {"status": {"gte": 200, "lt": 300}}}
],
"should": [
{"term": {"port": 443}},
{"term": {"port": 8443}}
]
}
}
The following Boolean queries are not supported because they violate these restrictions:
{
"bool": {
"should": [
{"term": {"status": 200}},
{"term": {"method": "GET"}} // SHOULD across different dimensions
]
}
}
{
"bool": {
"must": [
{"term": {"status": 200}}
],
"must_not": [ // MUST_NOT not supported
{"term": {"method": "DELETE"}}
]
}
}
Supported aggregations
The following aggregations are supported by star-tree indexes.
Metric aggregations
The following metric aggregations are supported:
To use searchable aggregations with a star-tree index, make sure you fulfill the following prerequisites:
- The fields must be present in the
metrics
section of the star-tree configuration. - The metric aggregation type must be part of the
stats
parameter.
The following example gets the sum of all the values in the size
field for all error logs with status=500
, using the example mapping:
POST /logs/_search
{
"query": {
"term": {
"status": "500"
}
},
"aggs": {
"sum_size": {
"sum": {
"field": "size"
}
}
}
}
Using a star-tree index, the result will be retrieved from a single aggregated document as it traverses the status=500
node, as opposed to scanning through all of the matching documents. This results in lower query latency.
Date histograms with metric aggregations
You can use date histograms on calendar intervals with metric sub-aggregations.
To use date histogram aggregations and make them searchable in a star-tree index, remember the following requirements:
- The calendar intervals in a star-tree mapping configuration can use either the request’s calendar field or a field of lower granularity than the request field. For example, if an aggregation uses the
month
field, the star-tree search can still use lower-granularity fields such asday
. - A metric sub-aggregation must be part of the aggregation request.
The following example filters logs to include only those with status codes between 200
and 400
and sets the size
of the response to 0
, so that only aggregated results are returned. It then aggregates the filtered logs by calendar month and calculates the total size
of the requests for each month:
POST /logs/_search
{
"size": 0,
"query": {
"range": {
"status": {
"gte": "200",
"lte": "400"
}
}
},
"aggs": {
"by_month": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "month"
},
"aggs": {
"sum_size": {
"sum": {
"field": "size"
}
}
}
}
}
}
Keyword and numeric terms aggregations
You can use terms aggregations on both keyword and numeric fields with star-tree index search.
For star-tree search compatibility with terms aggregations, remember the following behaviors:
- The fields used in the terms aggregation should be part of the dimensions defined in the star-tree index.
- Metric sub-aggregations are optional as long as the relevant metrics are part of the star-tree configuration.
The following example aggregates logs by the user_id
field and returns the counts for each unique user:
POST /logs/_search
{
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "user_id"
}
}
}
}
The following example aggregates orders by the order_quantity
and calculates the average total_price
for each quantity:
POST /orders/_search
{
"size": 0,
"aggs": {
"quantities": {
"terms": {
"field": "order_quantity"
},
"aggs": {
"avg_total_price": {
"avg": {
"field": "total_price"
}
}
}
}
}
}
Range aggregations
You can use range aggregations on numeric fields with star-tree index search.
For range aggregations to work effectively with a star-tree index, remember the following behaviors:
- The field used in the range aggregation should be part of the dimensions defined in the star-tree index.
- You can include metric sub-aggregations to compute metrics within each defined range, as long as the relevant metrics are part of the star-tree configuration.
The following example aggregates documents based on predefined ranges of the temperature
field:
POST /sensors/_search
{
"size": 0,
"aggs": {
"temperature_ranges": {
"range": {
"field": "temperature",
"ranges": [
{ "to": 20 },
{ "from": 20, "to": 30 },
{ "from": 30 }
]
}
}
}
}
The following example aggregates sales data by price ranges and calculates the total quantity
sold within each range:
POST /sales/_search
{
"size": 0,
"aggs": {
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{ "to": 100 },
{ "from": 100, "to": 500 },
{ "from": 500 }
]
},
"aggs": {
"total_quantity": {
"sum": {
"field": "quantity"
}
}
}
}
}
}
Nested aggregations
You can combine multiple supported bucket aggregations (such as terms
and range
) in a nested structure, and the star-tree index will optimize these nested aggregations. For more information about nested aggregations, see Nested aggregations.