You're viewing version 3.0 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Matrix stats aggregations
The matrix_stats aggregation is a multi-value metric aggregation that generates covariance statistics for two or more fields in matrix form.
The matrix_stats aggregation does not support scripting.
Parameters
The matrix_stats aggregation takes the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
fields | Required | Array of strings | An array of fields for which the matrix stats are computed. |
missing | Optional | Object | The value to use in place of missing values. By default, missing values are ignored. See Missing values. |
mode | Optional | String | The value to use as a sample from a multi-valued or array field. Allowed values are avg, min, max, sum, and median. Default is avg. |
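As a rough illustration of the mode parameter, the following Python sketch reduces a multi-valued field to one sample per document. The reduce_values helper and the sample array are hypothetical; the code only mimics the documented options on the client side and is not OpenSearch's implementation.

```python
import statistics

# Hypothetical client-side mirror of the documented `mode` options.
REDUCERS = {
    "avg": statistics.fmean,
    "min": min,
    "max": max,
    "sum": sum,
    "median": statistics.median,
}

def reduce_values(values, mode="avg"):
    """Collapse a multi-valued field into a single sample for one document."""
    if mode not in REDUCERS:
        raise ValueError(f"unsupported mode: {mode}")
    return REDUCERS[mode](values)

grades = [3.0, 3.9, 4.0]  # illustrative array value for one document
for mode in REDUCERS:
    print(mode, reduce_values(grades, mode))
```

Whichever mode is chosen, each document contributes exactly one number per field to the statistics.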
Example
The following example returns statistics for the taxful_total_price and products.base_price fields in the OpenSearch Dashboards e-commerce sample data:
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": ["taxful_total_price", "products.base_price"]
      }
    }
  }
}
The response contains the aggregated results:
{
  "took": 250,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 4675,
      "fields": [
        {
          "name": "products.base_price",
          "count": 4675,
          "mean": 34.99423943014724,
          "variance": 360.5035285833702,
          "skewness": 5.530161335032689,
          "kurtosis": 131.1630632404217,
          "covariance": {
            "products.base_price": 360.5035285833702,
            "taxful_total_price": 846.6489362233169
          },
          "correlation": {
            "products.base_price": 1,
            "taxful_total_price": 0.8444765264325269
          }
        },
        {
          "name": "taxful_total_price",
          "count": 4675,
          "mean": 75.05542864304839,
          "variance": 2788.1879749835425,
          "skewness": 15.812149139923994,
          "kurtosis": 619.1235507385886,
          "covariance": {
            "products.base_price": 846.6489362233169,
            "taxful_total_price": 2788.1879749835425
          },
          "correlation": {
            "products.base_price": 0.8444765264325269,
            "taxful_total_price": 1
          }
        }
      ]
    }
  }
}
The following table describes the response fields.
Statistic | Description |
---|---|
count | The number of documents sampled for the aggregation. |
mean | The average value of the field computed from the sample. |
variance | A measure of data spread, computed from the squared deviations of the values from the mean. |
skewness | A measure of the distribution’s asymmetry relative to the mean. See Skewness. |
kurtosis | A measure of the tail-heaviness of a distribution. As the tails become lighter, kurtosis decreases. Kurtosis and skewness are evaluated to determine whether a population is likely to be normally distributed. See Kurtosis. |
covariance | A measure of the joint variability between two fields. A positive value means their values move in the same direction. |
correlation | The normalized covariance, a measure of the strength of the linear relationship between two fields. Possible values are from -1 to 1, inclusive, indicating perfect negative to perfect positive linear correlation. A value of 0 indicates no linear relationship between the variables. |
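To make the relationship between covariance and correlation concrete, the following Python sketch recomputes the correlation from the variance and covariance values in the e-commerce response. The correlation helper is a hypothetical name, and the formula is the standard normalization of a covariance by the two standard deviations rather than a statement about OpenSearch internals.

```python
import math

# Values copied from the e-commerce example response.
variance = {
    "products.base_price": 360.5035285833702,
    "taxful_total_price": 2788.1879749835425,
}
covariance = 846.6489362233169  # between the two fields

def correlation(cov_xy, var_x, var_y):
    """Normalize a covariance by the product of the two standard deviations."""
    return cov_xy / math.sqrt(var_x * var_y)

r = correlation(
    covariance,
    variance["products.base_price"],
    variance["taxful_total_price"],
)
print(round(r, 4))  # 0.8445
```

The same response also shows that each field's covariance with itself equals its variance, which is why the diagonal correlation entries are 1.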
Missing values
To define how missing values are treated, use the missing parameter. By default, missing values are ignored.
For example, create an index in which document 1 is missing the gpa and class_grades fields:
POST _bulk
{ "create": { "_index": "students", "_id": "1" } }
{ "name": "John Doe" }
{ "create": { "_index": "students", "_id": "2" } }
{ "name": "Jonathan Powers", "gpa": 3.85, "class_grades": [3.0, 3.9, 4.0] }
{ "create": { "_index": "students", "_id": "3" } }
{ "name": "Jane Doe", "gpa": 3.52, "class_grades": [3.2, 2.1, 3.8] }
First, run a matrix_stats aggregation without providing a missing parameter:
GET students/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": [
          "gpa",
          "class_grades"
        ],
        "mode": "avg"
      }
    }
  }
}
OpenSearch ignores missing values when calculating the matrix statistics:
{
  "took": 5,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 2,
      "fields": [
        {
          "name": "gpa",
          "count": 2,
          "mean": 3.684999942779541,
          "variance": 0.05444997482300096,
          "skewness": 0,
          "kurtosis": 1,
          "covariance": {
            "gpa": 0.05444997482300096,
            "class_grades": 0.09899998760223136
          },
          "correlation": {
            "gpa": 1,
            "class_grades": 0.9999999999999991
          }
        },
        {
          "name": "class_grades",
          "count": 2,
          "mean": 3.333333333333333,
          "variance": 0.1800000381469746,
          "skewness": 0,
          "kurtosis": 1,
          "covariance": {
            "gpa": 0.09899998760223136,
            "class_grades": 0.1800000381469746
          },
          "correlation": {
            "gpa": 0.9999999999999991,
            "class_grades": 1
          }
        }
      ]
    }
  }
}
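The values in this response can be approximated by hand. The following Python sketch assumes the avg mode for class_grades and an n - 1 denominator for variance and covariance; that denominator is an assumption that happens to match the numbers shown, not a documented guarantee. The sample_cov helper is hypothetical, and the results differ from the response in the later decimal places because the sketch uses exact decimal inputs.

```python
from statistics import fmean

# Documents 2 and 3 from the bulk request; document 1 has neither field.
docs = [
    {"gpa": 3.85, "class_grades": [3.0, 3.9, 4.0]},
    {"gpa": 3.52, "class_grades": [3.2, 2.1, 3.8]},
]

# mode "avg": one sample per document for the array field.
gpa = [d["gpa"] for d in docs]
grades = [fmean(d["class_grades"]) for d in docs]

def sample_cov(xs, ys):
    """Covariance with an n - 1 denominator (assumed; it matches the response)."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

print(fmean(gpa), fmean(grades))   # means
print(sample_cov(gpa, gpa))        # variance of gpa
print(sample_cov(grades, grades))  # variance of class_grades
print(sample_cov(gpa, grades))     # covariance between the two fields
```

With only two documents, the two fields move in lockstep, which is why the reported correlation is effectively 1.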
To set the missing fields to 0, provide the missing parameter as a key-value map. Even though class_grades is an array field, the matrix_stats aggregation reduces each multi-valued numeric field to a single value per document (the average when mode is avg), so you must supply a single number as the missing value:
GET students/_search
{
  "size": 0,
  "aggs": {
    "matrix_stats_taxful_total_price": {
      "matrix_stats": {
        "fields": ["gpa", "class_grades"],
        "mode": "avg",
        "missing": {
          "gpa": 0,
          "class_grades": 0
        }
      }
    }
  }
}
OpenSearch substitutes 0 for any missing gpa or class_grades values when calculating the matrix statistics:
{
  "took": 23,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "matrix_stats_taxful_total_price": {
      "doc_count": 3,
      "fields": [
        {
          "name": "gpa",
          "count": 3,
          "mean": 2.456666628519694,
          "variance": 4.55363318017324,
          "skewness": -0.688130006360758,
          "kurtosis": 1.5,
          "covariance": {
            "gpa": 4.55363318017324,
            "class_grades": 4.143944374667273
          },
          "correlation": {
            "gpa": 1,
            "class_grades": 0.9970184390038257
          }
        },
        {
          "name": "class_grades",
          "count": 3,
          "mean": 2.2222222222222223,
          "variance": 3.793703722777191,
          "skewness": -0.6323693521730989,
          "kurtosis": 1.5000000000000002,
          "covariance": {
            "gpa": 4.143944374667273,
            "class_grades": 3.793703722777191
          },
          "correlation": {
            "gpa": 0.9970184390038257,
            "class_grades": 1
          }
        }
      ]
    }
  }
}
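The effect of the substitution on the gpa statistics can be sketched in Python. The fill_missing, central_moment, skewness, and kurtosis helpers are hypothetical names, and computing skewness and kurtosis as ratios of central moments divided by n is an assumption that is consistent with the values in this response, not a description of OpenSearch's code.

```python
from statistics import fmean

# gpa per document; None marks the missing value in document 1.
raw_gpa = [None, 3.85, 3.52]

def fill_missing(values, default):
    """Substitute `default` for missing entries, as the missing map does per field."""
    return [default if v is None else v for v in values]

def central_moment(xs, k):
    """k-th central moment with an n denominator (assumed)."""
    m = fmean(xs)
    return fmean([(x - m) ** k for x in xs])

def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

gpa = fill_missing(raw_gpa, 0)
print(fmean(gpa), skewness(gpa), kurtosis(gpa))
```

Because the substituted 0 lies far below the two real values, the mean drops and the distribution becomes left-skewed, which matches the negative skewness in the response.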