Weighted average aggregations
The weighted_avg aggregation calculates the weighted average of numeric values across documents. This is useful when you want to calculate an average but weight some data points more heavily than others.
The weighted average is calculated using the formula \(\frac{\sum_{i=1}^n \text{value}_i \cdot \text{weight}_i}{\sum_{i=1}^n \text{weight}_i}\).
Parameters
The weighted_avg aggregation takes the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
value | Required | Defines how to obtain the numeric values to average. Requires a field or script. |
weight | Required | Defines how to obtain the weight for each value. Requires a field or script. |
format | Optional | A DecimalFormat formatting string. Returns the formatted output in the aggregation’s value_as_string property. |
value_type | Optional | A type hint for the values when using scripts or unmapped fields. |
You can specify the following parameters within value or weight.
| Parameter | Required/Optional | Description |
|---|---|---|
field | Optional | The document field to use for the value or weight. |
missing | Optional | A default value or weight to use when the field is missing. See Missing values. |
Example
First, create an index and index some data. Notice that Product C is missing the rating and num_reviews fields:
POST _bulk
{ "index": { "_index": "products" } }
{ "name": "Product A", "rating": 4.5, "num_reviews": 100 }
{ "index": { "_index": "products" } }
{ "name": "Product B", "rating": 3.8, "num_reviews": 50 }
{ "index": { "_index": "products" } }
{ "name": "Product C"}
The following request uses the weighted_avg aggregation to calculate a weighted average product rating. In this context, each product’s rating is weighted by its num_reviews. This means that products with more reviews will have a greater influence on the final average than those with fewer reviews:
GET /products/_search
{
"size": 0,
"aggs": {
"weighted_rating": {
"weighted_avg": {
"value": {
"field": "rating"
},
"weight": {
"field": "num_reviews"
},
"format": "#.##"
}
}
}
}
Example response
The response contains the weighted_rating, calculated as weighted_avg = (4.5 * 100 + 3.8 * 50) / (100 + 50) = 4.27. Only documents 1 and 2, which contain values for both rating and num_reviews, are considered:
{
"took": 18,
"timed_out": false,
"terminated_early": true,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"weighted_rating": {
"value": 4.266666650772095,
"value_as_string": "4.27"
}
}
}
Missing values
The missing parameter allows you to specify default values for documents missing the value field or the weight field instead of excluding them from the calculation.
For example, you can assign products without ratings an “average” rating of 3.0 and set the num_reviews to 1 to give them a small non-zero weight:
GET /products/_search
{
"size": 0,
"aggs": {
"weighted_rating": {
"weighted_avg": {
"value": {
"field": "rating",
"missing": 3.0
},
"weight": {
"field": "num_reviews",
"missing": 1
},
"format": "#.##"
}
}
}
}
The new weighted average is calculated as weighted_avg = (4.5 * 100 + 3.8 * 50 + 3.0 * 1) / (100 + 50 + 1) = 4.26:
{
"took": 27,
"timed_out": false,
"terminated_early": true,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"weighted_rating": {
"value": 4.258278129906055,
"value_as_string": "4.26"
}
}
}