
Lucene scalar quantization

OpenSearch supports built-in scalar quantization for the Lucene engine. Unlike byte vectors, which you must quantize before ingesting documents, the Lucene scalar quantizer quantizes input vectors in OpenSearch during ingestion. The quantizer converts 32-bit floating-point input vectors into lower-bit representations in each segment. OpenSearch supports 7-bit and 1-bit quantization. At search time, the query vector is quantized within each segment to compute the distance between the query vector and the segment's quantized input vectors. Quantization decreases the memory footprint in exchange for some loss in recall. It also slightly increases disk usage because both the raw input vectors and the quantized vectors are stored.

The bits parameter is required when configuring the sq encoder.

Using Lucene scalar quantization

To use the Lucene scalar quantizer, set the k-NN vector field’s method.parameters.encoder.name to sq when creating a vector index. You must specify the bits parameter in the method.parameters.encoder.parameters object:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "bits": 1
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}

Lucene scalar quantization is applied only to float vectors. If you change the default value of the data_type parameter from float to byte or any other type when mapping a k-NN vector, then the request is rejected.
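After the index is created, ingestion and search work the same as for any other vector field; quantization happens transparently in each segment. A minimal follow-up example against the index above (the document ID and vector values are illustrative):

```json
PUT /test-index/_doc/1
{
  "my_vector1": [1.5, 2.5]
}

GET /test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [2.0, 3.0],
        "k": 2
      }
    }
  }
}
```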

SQ parameters

The Lucene sq encoder supports the following parameters.

| Parameter name | Required | Default | Description |
| :--- | :--- | :--- | :--- |
| bits | Yes | 1 | The number of bits used to quantize each vector dimension. Valid values are 1 and 7. |
| confidence_interval | No | Computed based on the vector dimension | The quantile interval used to compute the minimum and maximum values for quantization. Supported for 7-bit quantization only. For more information, see Confidence interval. |

The confidence_interval parameter is only supported for 7-bit quantization. If you set bits to any other value and specify a confidence_interval, the request is rejected.

1-bit quantization

Introduced 3.6

You can use 1-bit scalar quantization to further reduce the memory footprint. With 1-bit quantization, each vector dimension is represented using a single bit, resulting in a significantly smaller index size compared to 7-bit quantization.

The 1-bit quantizer does not support the confidence_interval parameter. Do not specify confidence_interval when using 1-bit quantization.

The following example creates an index with 1-bit Lucene scalar quantization:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "bits": 1
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}
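Conceptually, 1-bit quantization reduces each dimension to a single bit and packs 8 dimensions into each byte. The following Python sketch illustrates the general idea by thresholding each dimension; it is not Lucene's exact internal scheme, and the threshold values here are assumptions for illustration:

```python
def one_bit_quantize(vector, thresholds):
    """Reduce each dimension to a single bit by comparing it to a
    per-dimension threshold (illustrative only; Lucene's internal
    1-bit quantizer may differ), then pack 8 bits per byte."""
    bits = [1 if v > t else 0 for v, t in zip(vector, thresholds)]
    packed = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)  # pad the final partial byte with zeros
        packed.append(byte)
    return bits, bytes(packed)

bits, packed = one_bit_quantize([0.1, 2.0, -1.0, 0.5], [0.8, 0.75, -0.5, 1.5])
print(bits)          # [0, 1, 0, 0]
print(list(packed))  # [64] -- 4 dimensions fit in a single byte
```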

7-bit quantization

With 7-bit quantization, the Lucene scalar quantizer converts each 32-bit floating-point vector dimension into a 7-bit integer value using the minimum and maximum quantiles computed based on the confidence_interval parameter. When searching, the query vector is quantized in each segment using the segment’s minimum and maximum quantiles.
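The general scheme can be sketched in a few lines of Python: values are clipped to the [lower, upper] quantile range and mapped linearly onto the 7-bit integer range [0, 127]. This is an illustrative sketch, not Lucene's exact implementation:

```python
def seven_bit_quantize(values, lower, upper):
    """Map floats in [lower, upper] onto the 7-bit range [0, 127].
    lower/upper stand in for the quantiles computed from the
    confidence interval (illustrative sketch only)."""
    scale = 127.0 / (upper - lower)
    quantized = []
    for v in values:
        v = min(max(v, lower), upper)  # clip outliers to the quantile range
        quantized.append(round((v - lower) * scale))
    return quantized

print(seven_bit_quantize([-1.0, 0.0, 0.5, 1.0], lower=-1.0, upper=1.0))
# [0, 64, 95, 127]
```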

Confidence interval

Optionally, you can specify the confidence_interval parameter in the method.parameters.encoder.parameters object. The confidence_interval is used to compute the minimum and maximum quantiles for quantizing the vectors:

  • If you set the confidence_interval to a value in the 0.9 to 1.0 range, inclusive, then the quantiles are calculated statically. For example, setting the confidence_interval to 0.9 specifies that OpenSearch will compute the minimum and maximum quantiles based on the middle 90% of the vector values, excluding the minimum 5% and maximum 5% of the values.
  • Setting confidence_interval to 0 specifies that OpenSearch will compute the quantiles dynamically, which involves oversampling and additional computations performed on the input data.
  • When confidence_interval is not set, it is computed based on the vector dimension $d$ using the formula $\max(0.9, 1 - \frac{1}{1 + d})$.
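For reference, the default value from the formula above can be computed directly:

```python
def default_confidence_interval(dimension):
    """Default confidence_interval when none is specified:
    max(0.9, 1 - 1 / (1 + d))."""
    return max(0.9, 1 - 1 / (1 + dimension))

print(default_confidence_interval(2))    # 0.9 (1 - 1/3 is below the 0.9 floor)
print(default_confidence_interval(256))  # 0.99610894... (1 - 1/257)
```

For low-dimensional vectors the 0.9 floor applies; as the dimension grows, the default approaches 1.0.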

The following example method definition specifies the Lucene sq encoder with 7-bit quantization and a confidence_interval of 1.0. This setting uses all of the input vector values when computing the minimum and maximum quantiles:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "bits": 7,
                "confidence_interval": 1.0
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}

Memory estimation

In the ideal scenario, 7-bit vectors created by the Lucene scalar quantizer use only 25% of the memory required by 32-bit vectors. For 1-bit vectors, the memory footprint is approximately 3.125% of the original 32-bit vectors (a reduction factor of 32).

HNSW memory estimation

The memory required for the Hierarchical Navigable Small World (HNSW) graph can be estimated as 1.1 * (dimension * bits_per_dimension / 8 + 8 * m) bytes per vector, where m is the maximum number of bidirectional links created for each element during the construction of the graph.

As an example, assume that you have 1 million vectors with a dimension of 256 and m of 16.

For 7-bit quantization, the memory requirement can be estimated as follows:

1.1 * (256 * 7 / 8 + 8 * 16) * 1,000,000 ~= 0.387 GB

For 1-bit quantization, the memory requirement can be estimated as follows:

1.1 * (256 / 8 + 8 * 16) * 1,000,000 ~= 0.176 GB
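The estimates above can be reproduced with a small helper (GB here means decimal gigabytes, i.e., 10^9 bytes):

```python
def hnsw_memory_gb(dimension, bits_per_dimension, m, num_vectors):
    """Estimated HNSW graph memory in GB:
    1.1 * (dimension * bits_per_dimension / 8 + 8 * m) bytes per vector."""
    bytes_per_vector = 1.1 * (dimension * bits_per_dimension / 8 + 8 * m)
    return bytes_per_vector * num_vectors / 1e9

print(round(hnsw_memory_gb(256, 7, 16, 1_000_000), 3))  # 0.387
print(round(hnsw_memory_gb(256, 1, 16, 1_000_000), 3))  # 0.176
```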
