Exact search using scalar quantization

Introduced 3.6

OpenSearch supports the flat quantization method, which performs scalar quantization on 32-bit floating-point vectors. Unlike HNSW scalar quantization for the Faiss and Lucene engines, which builds a navigable graph for approximate nearest neighbor search, the flat method performs an exact (brute-force) k-NN search on the quantized vectors, comparing the query against every vector in the index. This provides perfect recall at the cost of search latency that grows with the size of the dataset.

The flat method quantizes vectors to 1 bit per dimension and does not support any encoder or method parameters.
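
As a rough illustration of what 1 bit per dimension means for storage, the following Python sketch quantizes a float vector by thresholding each component and packing the resulting bits into bytes. The zero threshold and the function name quantize_1bit are assumptions made for illustration only; this page does not describe how OpenSearch chooses its quantization thresholds internally.

```python
def quantize_1bit(vector, threshold=0.0):
    """Pack one bit per dimension: 1 if the component exceeds the threshold, else 0.

    The fixed threshold is an illustrative assumption, not OpenSearch's quantizer.
    """
    packed = bytearray((len(vector) + 7) // 8)  # round up to whole bytes
    for i, value in enumerate(vector):
        if value > threshold:
            # Most significant bit of each byte holds the lowest dimension index.
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


vector = [0.5, -1.25, 3.0, -0.75, 2.0, 0.1, -4.0, 1.5]
code = quantize_1bit(vector)

float_bytes = 4 * len(vector)  # 32-bit floats use 4 bytes per dimension
print(f"full precision: {float_bytes} bytes, 1-bit code: {len(code)} byte(s)")
```

For this 8-dimensional vector, the full-precision representation takes 32 bytes and the 1-bit code takes 1 byte, a 32x reduction in raw vector storage.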

The flat method is best suited for smaller datasets or for use cases with restrictive filters where exact search results are required. For larger datasets where approximate results are acceptable, consider using HNSW scalar quantization with the Faiss or Lucene engine.

Running an exact search using scalar quantization

To perform an exact search using scalar quantization, set the k-NN vector field’s method.name to flat when creating a vector index:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 4,
        "space_type": "l2",
        "method": {
          "name": "flat"
        }
      }
    }
  }
}

Scalar quantization applies only to float vectors. If you set the data_type parameter to byte or any other value besides the default float when mapping a k-NN vector, the request is rejected.

Search

Because the flat method uses 1-bit quantized vectors, rescoring is enabled by default to preserve search recall. The search runs in two phases: the quantized index is searched first, and then the results are rescored using full-precision vectors. The default oversample_factor is 2.0.
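
The two-phase flow can be sketched in plain Python. This is a conceptual model under stated assumptions, not OpenSearch's implementation: the zero-threshold 1-bit codes, the use of Hamming distance in the first phase, and the first-phase candidate count of ceil(k * oversample_factor) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import heapq
import math


def to_bits(vector, threshold=0.0):
    """1-bit code as an integer: bit i is set when component i exceeds the threshold."""
    bits = 0
    for i, value in enumerate(vector):
        if value > threshold:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def l2_squared(a, b):
    """Squared Euclidean distance on the full-precision vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_phase_search(query, vectors, k, oversample_factor=2.0):
    # Phase 1: brute-force scan of every quantized vector (assumed Hamming distance).
    codes = [to_bits(v) for v in vectors]  # a real index would store these
    query_code = to_bits(query)
    num_candidates = min(len(vectors), math.ceil(k * oversample_factor))
    candidates = heapq.nsmallest(
        num_candidates,
        range(len(vectors)),
        key=lambda i: hamming(codes[i], query_code),
    )
    # Phase 2: rescore only the candidates using full-precision vectors.
    rescored = sorted(candidates, key=lambda i: l2_squared(vectors[i], query))
    return rescored[:k]


vectors = [
    [0.9, 1.1, -1.0, 1.2],
    [5.0, 5.0, -5.0, 5.0],
    [1.0, 1.0, 0.1, 1.0],
    [-1.0, -1.0, 1.0, -1.0],
    [2.0, 2.0, -2.0, 2.0],
]
query = [1.0, 1.0, -1.0, 1.0]

print(two_phase_search(query, vectors, k=2, oversample_factor=1.0))
print(two_phase_search(query, vectors, k=2, oversample_factor=2.0))
```

In this toy dataset, vectors 0, 1, and 4 share the query's 1-bit code, so without oversampling the first phase cannot tell them apart and vector 2, the true second-nearest neighbor, is never rescored. With an oversample factor of 2.0, the first phase keeps four candidates and the rescoring phase recovers the true top two, indices 0 and 2.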

To search a flat-quantized index, send the following request:

GET /test-index/_search
{
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [1.5, 2.5, 3.5, 4.5],
        "k": 5
      }
    }
  }
}

To customize the oversample_factor, provide the rescore parameter in the query. The oversample_factor is a floating-point number between 1.0 and 100.0, inclusive. A higher value retrieves more candidates in the first phase, which can improve recall at the cost of higher search latency:

GET /test-index/_search
{
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [1.5, 2.5, 3.5, 4.5],
        "k": 5,
        "rescore": {
          "oversample_factor": 5.0
        }
      }
    }
  }
}
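
If you build requests programmatically, a small helper can check the oversample_factor range before the request is sent. The following Python sketch only constructs the request body shown above; the helper name build_knn_query is hypothetical, and sending the body to the cluster is left to whichever client you use.

```python
import json


def build_knn_query(field, vector, k, oversample_factor=None):
    """Build a k-NN search body, optionally with a rescore oversample factor.

    The 1.0 to 100.0 inclusive range check mirrors the limits stated on this page.
    """
    knn_params = {"vector": list(vector), "k": k}
    if oversample_factor is not None:
        if not 1.0 <= oversample_factor <= 100.0:
            raise ValueError(
                f"oversample_factor must be between 1.0 and 100.0, got {oversample_factor}"
            )
        knn_params["rescore"] = {"oversample_factor": float(oversample_factor)}
    return {"query": {"knn": {field: knn_params}}}


body = build_knn_query("my_vector1", [1.5, 2.5, 3.5, 4.5], k=5, oversample_factor=5.0)
print(json.dumps(body, indent=2))
```

Omitting oversample_factor leaves the rescore parameter out of the body, so the index default applies.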

For more information about rescoring, see Rescoring quantized results to full precision.

Next steps
