You're viewing version 3.4 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
kmeans (Deprecated)
The kmeans command is deprecated in favor of the ml command.
The kmeans command applies the k-means algorithm in the ML Commons plugin on the search results returned by a PPL command.
To use the kmeans command, plugins.calcite.enabled must be set to false.
Syntax
The kmeans command has the following syntax:
kmeans <centroids> <iterations> <distance_type>
Parameters
The kmeans command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
<centroids> | Optional | The number of clusters to group data points into. Default is 2. |
<iterations> | Optional | The number of iterations. Default is 10. |
<distance_type> | Optional | The distance type. Valid values are COSINE, L1, and EUCLIDEAN. Default is EUCLIDEAN. |
Example: Clustering of the Iris dataset
The following query classifies three Iris species (Iris setosa, Iris virginica, and Iris versicolor) based on the combination of four features measured from each sample (the lengths and widths of sepals and petals):
source=iris_data
| fields sepal_length_in_cm, sepal_width_in_cm, petal_length_in_cm, petal_width_in_cm
| kmeans centroids=3
The query returns the following results:
| sepal_length_in_cm | sepal_width_in_cm | petal_length_in_cm | petal_width_in_cm | ClusterID |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | 1 |
| 5.6 | 3.0 | 4.1 | 1.3 | 0 |
| 6.7 | 2.5 | 5.8 | 1.8 | 2 |