
Median absolute deviation aggregations

The median_absolute_deviation aggregation is a single-value metric aggregation. Median absolute deviation is a variability metric that measures dispersion from the median.

Standard deviation relies on squared error terms, so a few extreme values can dominate it. Median absolute deviation is less affected by outliers, which makes it useful for describing data that is not normally distributed.

Median absolute deviation is computed as follows:

median_absolute_deviation = median( | x_i - median(x_i) | )
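
To make the formula concrete, the following Python sketch computes the exact median absolute deviation of a small in-memory list. It only illustrates the definition and is not how OpenSearch produces its estimate. The function name and sample data are made up for this example.

```python
from statistics import median


def exact_median_absolute_deviation(values):
    """Exact MAD of a small list: the median of |x_i - median(x)|."""
    center = median(values)
    # Absolute distance of every value from the median of the data.
    return median(abs(x - center) for x in values)


# The median is 3, the absolute deviations are [2, 1, 0, 1, 97],
# and the median of those deviations is 1.
sample = [1, 2, 3, 4, 100]
print(exact_median_absolute_deviation(sample))  # 1
```

The outlier 100 barely affects the result because only the middle of the sorted deviations matters.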

OpenSearch estimates median_absolute_deviation, rather than calculating it directly, because of memory limitations. This estimation is computationally expensive. You can adjust the trade-off between estimation accuracy and performance. For more information, see Adjusting estimation accuracy.

Parameters

The median_absolute_deviation aggregation takes the following parameters.

| Parameter | Required/Optional | Data type | Description |
| --- | --- | --- | --- |
| field | Required | String | The name of the numeric field for which the median absolute deviation is computed. |
| missing | Optional | Numeric | The value to assign to missing instances of the field. If not provided, documents with missing values are omitted from the estimation. |
| compression | Optional | Numeric | A parameter that adjusts the balance between estimate accuracy and performance. The value of compression must be greater than 0. The default value is 1000. |

Example

The following example calculates the median absolute deviation of the DistanceMiles field in the opensearch_dashboards_sample_data_flights dataset:

GET opensearch_dashboards_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "median_absolute_deviation_DistanceMiles": {
      "median_absolute_deviation": {
        "field": "DistanceMiles"
      }
    }
  }
}

Example response

As shown in the following example response, the aggregation returns an estimate of the median absolute deviation in the median_absolute_deviation_DistanceMiles variable:

{
  "took": 490,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "median_absolute_deviation_DistanceMiles": {
      "value": 1830.917892238693
    }
  }
}

Missing values

By default, OpenSearch ignores missing and null values when computing median_absolute_deviation.

To include those documents, use the missing parameter to assign a value to missing instances of the aggregated field. See Missing aggregations for more information.
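
As a minimal sketch of how the missing parameter fits into a request, the following Python snippet builds a search body in which documents without a DistanceMiles value are treated as 0. The helper build_mad_request is hypothetical and exists only for this illustration. The value 0 is an arbitrary choice, and the body still has to be sent to the _search endpoint by a client of your choice.

```python
import json


def build_mad_request(field, missing=None, compression=None):
    """Hypothetical helper: build a median_absolute_deviation search body."""
    agg = {"field": field}
    # Optional parameters are added only when the caller supplies them.
    if missing is not None:
        agg["missing"] = missing
    if compression is not None:
        agg["compression"] = compression
    return {
        "size": 0,
        "aggs": {
            f"median_absolute_deviation_{field}": {
                "median_absolute_deviation": agg
            }
        },
    }


# Treat documents that lack DistanceMiles as if the value were 0.
body = build_mad_request("DistanceMiles", missing=0)
print(json.dumps(body, indent=2))
```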

Adjusting estimation accuracy

The median absolute deviation is calculated using the t-digest data structure, which takes a compression parameter to balance performance and estimation accuracy. Lower values of compression improve performance but may reduce estimation accuracy, as shown in the following request:

GET opensearch_dashboards_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "median_absolute_deviation_DistanceMiles": {
      "median_absolute_deviation": {
        "field": "DistanceMiles",
        "compression": 10
      }
    }
  }
}

The estimation error depends on the dataset but is usually below 5%, even for compression values as low as 100. (The low example value of 10 is used here to illustrate the trade-off effect and is not recommended.)

In the following response, note the decreased computation time (the took value) and the slightly less accurate estimate.

For reference, OpenSearch’s best estimate (with compression set arbitrarily high) for the median absolute deviation of DistanceMiles is 1831.076904296875:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "median_absolute_deviation_DistanceMiles": {
      "value": 1836.265614211182
    }
  }
}
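
To put the two responses in perspective, the following Python sketch computes the relative error of each estimate against the reference value quoted above. The three numbers are copied from this page. The variable and function names are made up for this illustration.

```python
best_estimate = 1831.076904296875      # compression set arbitrarily high
default_compression = 1830.917892238693  # first example, default compression
low_compression = 1836.265614211182      # second example, compression = 10


def relative_error_percent(estimate, reference):
    """Absolute difference from the reference, as a percentage of it."""
    return abs(estimate - reference) / reference * 100


print(f"{relative_error_percent(default_compression, best_estimate):.3f}%")  # 0.009%
print(f"{relative_error_percent(low_compression, best_estimate):.3f}%")      # 0.283%
```

For this dataset, even a compression of 10 stays well under the 5% figure mentioned earlier. Other datasets may behave differently.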