You're viewing version 3.0 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Star-tree field type
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.
A star-tree index precomputes aggregations, accelerating the performance of aggregation queries. If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.
OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.
For more information, see Star-tree index.
Prerequisites
To use a star-tree index, follow the instructions in Enabling a star-tree index.
Examples
The following examples show how to use a star-tree index.
Star-tree index mappings
Define star-tree index mappings in the composite section in mappings.
The following example API request creates a star-tree index named request_aggs. To compute metric aggregations for the request_size and latency fields with queries on the port and status fields, configure the following mappings:
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true,
    "index.append_only.enabled": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "date_dimension": {
            "name": "@timestamp",
            "calendar_intervals": [
              "month",
              "day"
            ]
          },
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            },
            {
              "name": "method"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "method": {
        "type": "keyword"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
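As a rough illustration of the kind of request these mappings are meant to accelerate, the following search filters on status (a dimension) and sums request_size (a metric). This is a sketch rather than part of the original example: the filter value 200 and the aggregation name sum_request_size are assumptions chosen for illustration, and whether the star-tree index is actually used depends on the queries and aggregations supported in your version.

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

The request uses standard search syntax. Nothing in it refers to the star-tree index by name.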
Star-tree index configuration options
You can customize your star-tree implementation using the following config options in the mappings section. These options cannot be modified without reindexing.
| Parameter | Description |
|---|---|
| ordered_dimensions | A list of fields based on which metrics will be aggregated in a star-tree index. Required. |
| date_dimension | A date field placed ahead of the ordered dimensions. If provided, the ordered_dimensions are appended to it, and metrics are aggregated based on the combined dimensions. Optional. |
| metrics | A list of metric fields required in order to perform aggregations. Required. |
| max_leaf_docs | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, child nodes will be created based on the unique value of the next field in ordered_dimensions (if any). Default is 10000. A lower value will use more storage but result in faster query performance. Conversely, a higher value will use less storage but result in slower query performance. For more information, see Star-tree indexing structure. |
| skip_star_node_creation_for_dimensions | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation for a dimension reduces storage size at the expense of query performance. By default, star node creation is not skipped for any dimension. For more information about star nodes, see Star-tree indexing structure. |
Ordered dimensions
The ordered_dimensions parameter contains fields based on which metrics will be aggregated in a star-tree index. The star-tree index will be selected for querying only if all the fields in the query are part of the ordered_dimensions.
When using the ordered_dimensions parameter, follow these best practices:
- The order of dimensions matters. Define the dimensions in order from highest cardinality to lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- A minimum of 2 and a maximum of 10 dimensions are supported per star-tree index.
The ordered_dimensions parameter supports the following field types:
- All numeric field types, excluding unsigned_long and scaled_float
- keyword
- object
Support for other field types, such as ip, will be added in future versions. For more information, see GitHub issue #13875.
The ordered_dimensions parameter supports the following property.
| Parameter | Required/Optional | Description |
|---|---|---|
| name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping. Ensure that the doc_values setting is enabled for any associated fields. |
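The doc_values requirement can be made explicit in the field mappings. The following fragment is a minimal sketch that reuses the status and method fields from the earlier example. The doc_values mapping parameter is enabled by default for numeric and keyword fields, so setting it explicitly only matters if it was previously disabled.

```json
{
  "properties": {
    "status": {
      "type": "integer",
      "doc_values": true
    },
    "method": {
      "type": "keyword",
      "doc_values": true
    }
  }
}
```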
Date dimension
The date_dimension supports one date field and is always the first dimension, placed above the ordered dimensions, because date fields generally have high cardinality.
The date_dimension can support up to three of the following calendar intervals:
- year (of era)
- quarter (of year)
- month (of year)
- week (of week-based year)
- day (of month)
- hour (of day)
- half-hour (of day)
- quarter-hour (of day)
- minute (of hour)
- second (of minute)
Any values in the date field are rounded based on the granularity associated with the calendar intervals provided. For example:
- The default calendar_intervals are minute and half-hour.
- During queries, the nearest granular intervals are automatically picked up. For example, if you have configured hour and minute as the calendar_intervals and your query is a monthly date histogram, the hour interval will be automatically selected so that the query computes the results in an optimized way.
- To support time-zone-based queries, :30 equals a half-hour interval and :15 equals a quarter-hour interval.
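To make the interval selection concrete, the following is a sketch of a monthly date histogram over the @timestamp field from the example logs mappings, with a metric sub-aggregation on request_size. The aggregation names by_month and sum_request_size are assumptions chosen for illustration. With the month and day intervals configured in that example, the expectation is that the month interval serves this request, though the actual choice is made internally.

```json
POST logs/_search
{
  "size": 0,
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_request_size": {
          "sum": {
            "field": "request_size"
          }
        }
      }
    }
  }
}
```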
Metrics
Configure any metric fields on which you need to perform aggregations. Metrics are required as part of a star-tree index configuration.
When using metrics, follow these best practices:
- Currently, fields supported by metrics are all numeric field types, with the exception of unsigned_long. For more information, see GitHub issue #15231.
- Supported metric aggregations include Min, Max, Sum, Avg, and Value_count. Avg is a derived metric based on Sum and Value_count; it is not indexed and is instead computed when a query is run. The remaining base metrics are indexed.
- A maximum of 100 base metrics are supported per star-tree index.
If Min, Max, Sum, and Value_count are defined as metrics for each field, then up to 25 such fields can be configured, as shown in the following example:
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
Properties
The metrics parameter supports the following properties.
| Parameter | Required/Optional | Description |
|---|---|---|
| name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping. Ensure that the doc_values setting is enabled for any associated fields. |
| stats | Optional | A list of metric aggregations computed for each field. You can choose between Min, Max, Sum, Avg, and Value_count. Default is Sum and Value_count. Avg is a derived metric statistic that will automatically be supported in queries if Sum and Value_count are present as part of metric stats. |
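Because Avg is derived from Sum and Value_count, an average can be requested even though avg is not listed in stats. The following sketch runs an avg aggregation on the latency field from the example logs mappings, filtered on the port dimension. The port value 443 and the aggregation name avg_latency are assumptions chosen for illustration.

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "port": 443
    }
  },
  "aggs": {
    "avg_latency": {
      "avg": {
        "field": "latency"
      }
    }
  }
}
```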
Supported queries and aggregations
For more information about supported queries and aggregations, see Supported queries and aggregations for a star-tree index.