
Vector search settings

OpenSearch supports the following vector search settings. To learn more about static and dynamic settings, see Configuring OpenSearch.

Cluster settings

The following table lists all available cluster-level vector search settings. For more information about cluster settings, see Configuring OpenSearch and Updating cluster settings using the API.

| Setting | Static/Dynamic | Default | Description |
| --- | --- | --- | --- |
| knn.algo_param.index_thread_qty | Dynamic | 1 | The number of threads used for native library and Lucene library (for OpenSearch version 2.19 and later) index creation. Keeping this value low reduces the CPU impact of the k-NN plugin but also reduces indexing performance. |
| knn.cache.item.expiry.enabled | Dynamic | false | Whether to remove native library indexes from memory that have not been accessed in a specified period of time. |
| knn.cache.item.expiry.minutes | Dynamic | 3h | If enabled, the amount of idle time before a native library index is removed from memory. |
| knn.circuit_breaker.unset.percentage | Dynamic | 75 | The native memory usage threshold for the circuit breaker. Memory usage must be lower than this percentage of knn.memory.circuit_breaker.limit in order for knn.circuit_breaker.triggered to remain false. |
| knn.circuit_breaker.triggered | Dynamic | false | Indicates whether the circuit breaker has been triggered. Set to true when native memory usage exceeds the knn.memory.circuit_breaker.limit value. |
| knn.memory.circuit_breaker.limit | Dynamic | 50% | The native memory limit for native library indexes. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, then the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, then the plugin removes the native library indexes used least recently. To configure this limit at the node level, add node.attr.knn_cb_tier: "<tier-name>" in opensearch.yml and set knn.memory.circuit_breaker.limit.<tier-name> in the cluster settings. For example, define a node tier as node.attr.knn_cb_tier: "integ" and set knn.memory.circuit_breaker.limit.integ: "80%". Nodes use their tier’s circuit breaker limit if configured, defaulting to the cluster-wide setting if no node-specific value is set. |
| knn.memory.circuit_breaker.enabled | Dynamic | true | Whether to enable the k-NN memory circuit breaker. |
| knn.model.index.number_of_shards | Dynamic | 1 | The number of shards to use for the model system index, which is the OpenSearch index that stores the models used for approximate nearest neighbor (ANN) search. |
| knn.model.index.number_of_replicas | Dynamic | 1 | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this value should be at least 1 in order to increase stability. |
| knn.model.cache.size.limit | Dynamic | 10% | The maximum size of the model cache, expressed as a percentage of the JVM heap. The limit cannot exceed 25% of the JVM heap. |
| knn.faiss.avx2.disabled | Static | false | Whether to disable the SIMD-based libopensearchknn_faiss_avx2.so library and load the non-optimized libopensearchknn_faiss.so library for the Faiss engine on machines with x64 architecture. For more information, see Single Instruction Multiple Data (SIMD) optimization. |
| knn.faiss.avx512_spr.disabled | Static | false | Whether to disable the SIMD-based libopensearchknn_faiss_avx512_spr.so library and load one of the libopensearchknn_faiss_avx512.so, libopensearchknn_faiss_avx2.so, or non-optimized libopensearchknn_faiss.so libraries for the Faiss engine on machines with x64 architecture. For more information, see SIMD optimization for the Faiss engine. |
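The circuit breaker arithmetic and the node-tier fallback described in the table can be sketched in a few lines of Python. This is illustrative arithmetic only, not the plugin's implementation: the function names are hypothetical, and the numbers reuse the 100 GB / 32 GB example and the "integ" tier from the table.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# plugin's internal calculation may differ in detail.

def native_memory_limit_gb(total_gb, jvm_gb, limit_percent):
    """Native memory limit: a percentage of the memory not used by the JVM."""
    return (total_gb - jvm_gb) * limit_percent / 100.0


def unset_threshold_gb(limit_gb, unset_percent):
    """Usage must stay below this value for the breaker to remain untriggered."""
    return limit_gb * unset_percent / 100.0


def resolve_limit(cluster_limit, tier_limits, node_tier=None):
    """Use the tier-specific limit if one is configured, else the cluster-wide value."""
    if node_tier is not None and node_tier in tier_limits:
        return tier_limits[node_tier]
    return cluster_limit


limit_gb = native_memory_limit_gb(total_gb=100, jvm_gb=32, limit_percent=50)
print(limit_gb)                                                   # 34.0
print(unset_threshold_gb(limit_gb, 75))                           # 25.5
print(resolve_limit("50%", {"integ": "80%"}, node_tier="integ"))  # 80%
print(resolve_limit("50%", {"integ": "80%"}))                     # 50%
```

With the defaults, a node in this example would hold up to 34 GB of native library indexes, and usage would need to be below 25.5 GB for knn.circuit_breaker.triggered to remain false.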

Index settings

The following table lists all available index-level k-NN settings. For information about updating these settings, see Index-level index settings.

Several parameters defined in the index settings are being deprecated. Set those parameters in the mapping instead. Parameters set in the mapping override parameters set in the index settings, and setting them in the mapping allows an index to have multiple knn_vector fields with different parameters.

| Setting | Static/Dynamic | Default | Description |
| --- | --- | --- | --- |
| index.knn | Static | false | Whether the index should build native library indexes for the knn_vector fields. If set to false, the knn_vector fields will be stored in doc values, but approximate k-NN search functionality will be disabled. |
| index.knn.algo_param.ef_search | Dynamic | 100 | ef (or efSearch) represents the size of the dynamic list for the nearest neighbors used during a search. Higher ef values lead to a more accurate but slower search. ef cannot be set to a value lower than the number of queried nearest neighbors, k. ef can take any value between k and the size of the dataset. |
| index.knn.advanced.approximate_threshold | Dynamic | 15000 | The number of vectors that a segment must have before creating specialized data structures for ANN search. Set to -1 to disable building vector data structures and to 0 to always build them. |
| index.knn.advanced.filtered_exact_search_threshold | Dynamic | None | The filtered ID threshold value used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is lower than this setting’s value, then exact search will be performed on the filtered IDs. |
| index.knn.derived_source.enabled | Static | true | Prevents vectors from being stored in _source, reducing disk usage for vector indexes. |

An index created in OpenSearch version 2.11 or earlier will still use the previous ef_construction and ef_search values (512).
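As a rough illustration of how these settings fit together, the following Python sketch assembles a settings body that could be supplied when creating an index. The helper name knn_index_settings is hypothetical, and the check simply mirrors the rule above that ef cannot be lower than k. Because index.knn is static, the sketch assumes it is supplied at index creation.

```python
import json


def knn_index_settings(ef_search, k):
    """Build an index settings body, rejecting ef_search values lower than k.

    Hypothetical helper for illustration; only the setting names come from
    the table above.
    """
    if ef_search < k:
        raise ValueError(f"ef_search ({ef_search}) cannot be lower than k ({k})")
    return {
        "settings": {
            "index.knn": True,  # static: assumed to be set at index creation
            "index.knn.algo_param.ef_search": ef_search,  # dynamic
        }
    }


body = knn_index_settings(ef_search=100, k=10)
print(json.dumps(body, indent=2))
```

Raising ef_search later would only require updating the dynamic setting, whereas changing index.knn would not be possible on an open index.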

Remote index build settings

Introduced 3.0

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated GitHub issue.

The following settings control remote vector index building.

The poll_interval, timeout, and size_threshold settings are advanced settings. Their default values were chosen based on extensive benchmarking.

Cluster settings

The following remote index build settings apply at the cluster level.

| Setting | Static/Dynamic | Default | Description |
| --- | --- | --- | --- |
| knn.feature.remote_index_build.enabled | Dynamic | false | Enables remote vector index building for the cluster. |
| knn.remote_index_build.vector_repo | Dynamic | None | The repository to which the remote index builder should write. |
| knn.remote_index_build.client.endpoint | Dynamic | None | The endpoint URL of the remote build service. |
| knn.remote_index_build.client.poll_interval | Dynamic | 5s | How frequently the client should poll the remote build service for job status. |
| knn.remote_index_build.client.timeout | Dynamic | 60m | The maximum amount of time to wait for remote build completion before falling back to a CPU-based build. |
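Because these cluster-level settings are dynamic, they can be applied through the cluster settings API. The following Python sketch only assembles a request body. The helper name, the repository name vector-repo, and the endpoint URL are placeholders invented for illustration. The advanced poll_interval and timeout settings are omitted unless explicitly overridden, so their defaults stay in effect.

```python
import json


def remote_build_cluster_settings(repo, endpoint, poll_interval=None, timeout=None):
    """Build a cluster settings body that enables remote index building.

    Hypothetical helper; only the setting names come from the table above.
    """
    settings = {
        "knn.feature.remote_index_build.enabled": True,
        "knn.remote_index_build.vector_repo": repo,
        "knn.remote_index_build.client.endpoint": endpoint,
    }
    # Advanced settings: include them only when the caller overrides the default.
    if poll_interval is not None:
        settings["knn.remote_index_build.client.poll_interval"] = poll_interval
    if timeout is not None:
        settings["knn.remote_index_build.client.timeout"] = timeout
    return {"persistent": settings}


# Placeholder repository name and endpoint URL, not real values.
payload = remote_build_cluster_settings("vector-repo", "http://remote-builder.example:8080")
print(json.dumps(payload, indent=2))
```

The resulting body is the kind of document that would be sent with PUT _cluster/settings.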

Index settings

The following remote index build settings apply at the index level.

| Setting | Static/Dynamic | Default | Description |
| --- | --- | --- | --- |
| index.knn.remote_index_build.enabled | Dynamic | false | Enables remote index building for the index. Currently, the remote index build service supports Faiss indexes with the hnsw method and the default 32-bit floating-point (FP32) vectors. |
| index.knn.remote_index_build.size_threshold | Dynamic | 50mb | The minimum size required to enable remote vector builds. |

Remote build authentication

The remote build service username and password are secure settings that must be set in the OpenSearch keystore as follows:

./bin/opensearch-keystore add knn.remote_index_build.client.username
./bin/opensearch-keystore add knn.remote_index_build.client.password

You can reload the secure settings without restarting the node by using the Nodes Reload Secure API.
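As a minimal sketch, the reload call can be prepared with Python's standard library. The base URL below is a placeholder, the helper name is hypothetical, and authentication and TLS configuration are deliberately left out. The request is constructed but not sent.

```python
from urllib.request import Request


def reload_secure_settings_request(base_url):
    """Prepare (but do not send) a request that reloads secure settings on all nodes."""
    url = base_url.rstrip("/") + "/_nodes/reload_secure_settings"
    return Request(url, method="POST")


# Placeholder cluster address; replace it with your own.
req = reload_secure_settings_request("https://localhost:9200")
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it. A real cluster would typically also require credentials and a trusted certificate.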
