You're viewing version 2.19 of the OpenSearch documentation. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Star-tree field type
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.
A star-tree index precomputes aggregations, accelerating the performance of aggregation queries. If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.
OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.
For more information, see Star-tree index.
Prerequisites
To use a star-tree index, follow the instructions in Enabling a star-tree index.
Examples
The following examples show how to use a star-tree index.
Star-tree index mappings
Define star-tree index mappings in the composite section in mappings.
The following example API request creates a star-tree index named request_aggs. To compute metric aggregations for the request_size and latency fields with queries on the port and status fields, configure the following mappings:
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true,
    "index.append_only.enabled": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "date_dimension": {
            "name": "@timestamp",
            "calendar_intervals": [
              "month",
              "day"
            ]
          },
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            },
            {
              "name": "method"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "method": {
        "type": "keyword"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
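As a minimal sketch of the kind of data this mapping expects, the following request indexes one document into logs. All field values are made up for illustration. The request uses an auto-generated document ID on the assumption that an append-only index does not accept updates to existing documents.

```json
POST logs/_doc
{
  "@timestamp": "2024-11-01T12:30:00Z",
  "status": 200,
  "port": 8080,
  "request_size": 1024,
  "method": "GET",
  "latency": 12.5
}
```

Every dimension (status, port, method, and @timestamp) and every metric field (request_size and latency) from the request_aggs configuration appears in the document, so it can contribute to the precomputed aggregations.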
Star-tree index configuration options
You can customize your star-tree implementation using the following config options in the mappings section. These options cannot be modified without reindexing.
| Parameter | Description |
|---|---|
| ordered_dimensions | A list of fields based on which metrics will be aggregated in a star-tree index. Required. |
| date_dimension | A date field that, if provided, is placed as the first dimension, with the ordered_dimensions appended after it. Optional. |
| metrics | A list of metric fields required in order to perform aggregations. Required. |
| max_leaf_docs | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, child nodes will be created based on the unique value of the next field in ordered_dimensions (if any). Default is 10000. A lower value will use more storage but result in faster query performance. Conversely, a higher value will use less storage but result in slower query performance. For more information, see Star-tree indexing structure. |
| skip_star_node_creation_for_dimensions | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation reduces storage size at the expense of query performance. By default, star node creation is not skipped. For more information about star nodes, see Star-tree indexing structure. |
Ordered dimensions
The ordered_dimensions parameter contains fields based on which metrics will be aggregated in a star-tree index. The star-tree index will be selected for querying only if all the fields in the query are part of the ordered_dimensions.
When using the ordered_dimensions parameter, follow these best practices:
- The order of dimensions matters. Define the dimensions in order from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- A minimum of 2 and a maximum of 10 dimensions are supported per star-tree index.
The ordered_dimensions parameter supports the following field types:
- All numeric field types, excluding unsigned_long and scaled_float
- keyword
- object
Support for other field types, such as ip, will be added in future versions. For more information, see GitHub issue #13875.
The ordered_dimensions parameter supports the following property.
| Parameter | Required/Optional | Description |
|---|---|---|
| name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping. Ensure that the doc_values setting is enabled for any associated fields. |
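The ordering guidance in this section can be sketched as the following config fragment. It assumes, purely for illustration, that port has the most distinct values, followed by status and then method. Actual cardinalities depend on your data, so treat this order as hypothetical rather than recommended.

```json
{
  "ordered_dimensions": [
    { "name": "port" },
    { "name": "status" },
    { "name": "method" }
  ]
}
```

The fragment lists three dimensions, which is within the supported range of 2 to 10, and each name must match a field defined in the properties section of the index mapping.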
Date dimension
The date_dimension supports one date field and is always the first dimension, placed above the ordered dimensions, because date fields generally have high cardinality.
The date_dimension can support up to three of the following calendar intervals:
- year (of era)
- quarter (of year)
- month (of year)
- week (of week-based year)
- day (of month)
- hour (of day)
- half-hour (of day)
- quarter-hour (of day)
- minute (of hour)
- second (of minute)
Any values in the date field are rounded based on the granularity associated with the calendar intervals provided. For example:
- The default calendar_intervals are minute and half-hour.
- During queries, the nearest granular intervals are automatically picked up. For example, if you have configured hour and minute as the calendar_intervals and your query is a monthly date histogram, the hour interval will be automatically selected so that the query computes the results in an optimized way.
- To support time-zone-based queries, :30 equals a half-hour interval and :15 equals a quarter-hour interval.
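To illustrate how a query relates to the configured calendar intervals, the following sketch runs a monthly date histogram with a sum subaggregation against the logs index from the mapping example on this page. The aggregation names by_month and total_request_size are arbitrary. Whether the star-tree index is actually used depends on your OpenSearch version and on the query being among the supported ones, so treat this as an assumption rather than a guarantee.

```json
POST logs/_search
{
  "size": 0,
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_request_size": {
          "sum": {
            "field": "request_size"
          }
        }
      }
    }
  }
}
```

Because the example mapping configures month and day as calendar_intervals on @timestamp, a monthly histogram lines up with the month interval directly.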
Metrics
Configure any metric fields on which you need to perform aggregations. Metrics are required as part of a star-tree index configuration.
When using metrics, follow these best practices:
- Currently, fields supported by metrics are all numeric field types, with the exception of unsigned_long. For more information, see GitHub issue #15231.
- Supported metric aggregations include Min, Max, Sum, Avg, and Value_count. Avg is a derived metric that is computed from Sum and Value_count when a query is run and is not indexed. The remaining base metrics are indexed.
- A maximum of 100 base metrics are supported per star-tree index.
If Min, Max, Sum, and Value_count are defined as metrics for each field, then up to 25 such fields can be configured, as shown in the following example:
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
Properties
The metrics parameter supports the following properties.
| Parameter | Required/Optional | Description |
|---|---|---|
| name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping. Ensure that the doc_values setting is enabled for any associated fields. |
| stats | Optional | A list of metric aggregations computed for each field. You can choose between Min, Max, Sum, Avg, and Value_count. Default is Sum and Value_count. Avg is a derived metric statistic that will automatically be supported in queries if Sum and Value_count are present as part of metric stats. |
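The following fragment is a small sketch of how the stats default plays out, reusing the request_size and latency field names from the mapping example on this page. It is illustrative only and is not a complete mapping.

```json
{
  "metrics": [
    {
      "name": "request_size"
    },
    {
      "name": "latency",
      "stats": [
        "min",
        "max"
      ]
    }
  ]
}
```

Based on the table above, request_size omits stats and therefore falls back to Sum and Value_count, which also makes Avg available for that field. The latency field indexes only Min and Max, so Avg would not be derivable for it.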
Supported queries and aggregations
For more information about supported queries and aggregations, see Supported queries and aggregations for a star-tree index.
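As a hedged illustration of a request that could benefit from a star-tree index, the following search filters on the status dimension and sums the request_size metric in the logs index from the mapping example on this page. The status value 500 and the aggregation name sum_request_size are arbitrary. The request is an ordinary search request with no star-tree-specific syntax, and whether the star-tree index is applied depends on the query and aggregation being supported in your version.

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 500
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

Setting size to 0 returns only the aggregation result, which is the typical shape for the aggregation-heavy workloads that a star-tree index targets.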