Faiss scalar quantization

OpenSearch supports built-in scalar quantization for the Faiss engine. The Faiss scalar quantizer converts 32-bit floating-point input vectors into lower-bit representations during ingestion and stores the quantized vectors in a vector index. OpenSearch supports two types of scalar quantization for the Faiss engine: 16-bit quantization and 1-bit quantization. Quantization can decrease the memory footprint in exchange for some loss in recall. When used with SIMD optimization, Faiss scalar quantization can also significantly reduce search latencies and improve indexing throughput.

The bits parameter is required when configuring the sq encoder.

SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies.

Using Faiss scalar quantization

To use Faiss scalar quantization, set the k-NN vector field’s method.parameters.encoder.name to sq when creating a vector index. You must specify the bits parameter in the method.parameters.encoder.parameters object:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "bits": 16
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}

The Faiss sq encoder supports the following parameters.

Parameter name	Required	Default	Description
`bits`	Yes	1	The number of bits used to quantize each vector dimension. Valid values are `1` and `16`.
`type`	No	`fp16`	The type of scalar quantization to be used. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range. Supported for 16-bit quantization only.
`clip`	No	`false`	If `true`, any vector values outside of the supported range are rounded so that they are within the range. If `false`, the request is rejected if any vector values are outside of the supported range. Setting `clip` to `true` may decrease recall. Supported for 16-bit quantization only.

The type and clip parameters are supported only for 16-bit quantization. If you set bits to any other value and specify type or clip, the request is rejected.
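The parameter rules above can be sketched as a small client-side helper that builds the encoder object and applies the same checks before a request is sent. This is only an illustration: the function name `build_sq_encoder` and its `sq_type` argument are hypothetical, and OpenSearch performs the authoritative validation when the index is created.

```python
import json

VALID_BITS = (1, 16)


def build_sq_encoder(bits, sq_type=None, clip=None):
    """Build the `encoder` object for a Faiss sq mapping (hypothetical helper).

    Mirrors the documented rules: `bits` must be 1 or 16, and `type`/`clip`
    may be specified only together with 16-bit quantization.
    """
    if bits not in VALID_BITS:
        raise ValueError(f"bits must be one of {VALID_BITS}, got {bits!r}")
    if bits != 16 and (sq_type is not None or clip is not None):
        raise ValueError("type and clip are supported only when bits is 16")

    parameters = {"bits": bits}
    if sq_type is not None:
        parameters["type"] = sq_type
    if clip is not None:
        parameters["clip"] = clip
    return {"name": "sq", "parameters": parameters}


if __name__ == "__main__":
    # Print the encoder object that would go under method.parameters.encoder.
    print(json.dumps(build_sq_encoder(16, sq_type="fp16", clip=True), indent=2))
```

The returned dictionary corresponds to the `method.parameters.encoder` object in the mapping request shown earlier.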

1-bit quantization

Introduced 3.6

You can use 1-bit scalar quantization to significantly reduce the memory footprint. 1-bit quantization uses memory-optimized search, and each vector dimension is represented using a single bit, resulting in a much smaller index size compared to 16-bit quantization.

The following example creates an index with 1-bit Faiss scalar quantization:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "bits": 1
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}

16-bit quantization

With 16-bit quantization, the Faiss scalar quantizer (SQfp16) converts 32-bit floating-point vectors into 16-bit floating-point vectors. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. SQfp16 quantization can decrease the memory footprint by a factor of 2 with minimal loss in recall when differences between vector values are large compared to the rounding error introduced by the lower-precision 16-bit representation.
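To get a feel for the precision lost in the encode/decode round trip, the following sketch uses Python's standard `struct` module, whose `e` format is IEEE 754 half precision. It is a stand-in for illustration only: the assumption is that this approximates what SQfp16 stores, and Faiss's internal conversion may differ in detail. The name `fp16_roundtrip` is hypothetical.

```python
import struct


def fp16_roundtrip(value):
    """Encode a float as IEEE 754 half precision, then decode it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    # Compare each original value with its half-precision approximation.
    for original in (55.82, 0.001234, 65503.845):
        decoded = fp16_roundtrip(original)
        error = abs(original - decoded)
        print(f"{original!r} -> {decoded!r} (absolute error {error:.6g})")
```

The absolute error grows with the magnitude of the value, which is why recall is affected least when the differences between vector values are large relative to that error.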

Type and clip parameters

The fp16 encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, you can specify the clip parameter. By default, this parameter is false, and any vectors containing out-of-range values are rejected.

When clip is set to true, out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is [65510.82, -65504.1], the vector will be indexed as a 16-bit vector [65504.0, -65504.0].

We recommend setting clip to true only if very few vector dimensions lie outside of the supported range. Rounding the values may cause a drop in recall.
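The two behaviors of the clip parameter can be mimicked on the client side, for example to check data before ingestion. The following is a minimal sketch under that assumption; `prepare_vector` is a hypothetical helper, not part of any OpenSearch client, and the actual rejection or clipping happens in OpenSearch.

```python
FP16_MIN = -65504.0
FP16_MAX = 65504.0


def prepare_vector(vector, clip=False):
    """Mimic the documented handling of out-of-range values for the fp16 encoder.

    With clip=True, values are clamped to [FP16_MIN, FP16_MAX].
    With clip=False, a vector containing any out-of-range value is rejected.
    """
    if clip:
        return [min(max(v, FP16_MIN), FP16_MAX) for v in vector]

    out_of_range = [v for v in vector if not FP16_MIN <= v <= FP16_MAX]
    if out_of_range:
        raise ValueError(
            f"values outside [{FP16_MIN}, {FP16_MAX}]: {out_of_range}"
        )
    return list(vector)


if __name__ == "__main__":
    # Clamps both out-of-range values to the nearest supported bound.
    print(prepare_vector([65510.82, -65504.1], clip=True))
```

Calling `prepare_vector([65510.82, -65504.1])` without `clip=True` raises a `ValueError`, which corresponds to the default behavior of rejecting the request.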

The following example specifies the Faiss SQfp16 encoder with 16-bit quantization, which rejects any indexing request that contains out-of-range vector values (because the clip parameter is false by default):

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "bits": 16
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}

When indexing vectors, ensure that each vector dimension is in the supported range ([-65504.0, 65504.0]).

PUT test-index/_doc/1
{
  "my_vector1": [-65504.0, 65503.845, 55.82]
}

When querying vectors, the query vector has no range limitation:

GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [265436.876, -120906.256, 99.84],
        "k": 2
      }
    }
  }
}

Memory estimation

In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require.
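As a rough illustration of the raw vector storage alone (excluding any graph or index overhead), the following sketch computes the bytes needed per vector at different bit widths. The helper name `vector_bytes` is hypothetical, and rounding up to a whole byte is an assumption made for this example.

```python
def vector_bytes(dimension, bits_per_dimension):
    """Bytes needed to store one vector, rounded up to a whole byte."""
    return (dimension * bits_per_dimension + 7) // 8


if __name__ == "__main__":
    dimension = 256
    full_precision = vector_bytes(dimension, 32)
    for bits in (32, 16, 1):
        size = vector_bytes(dimension, bits)
        share = 100 * size / full_precision
        print(f"{bits:>2}-bit: {size} bytes per vector ({share:.1f}% of 32-bit)")
```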

HNSW memory estimation

The memory required for a Hierarchical Navigable Small World (HNSW) index is estimated to be 1.1 * (dimension * bits_per_dimension / 8 + 8 * m) bytes per vector, where bits_per_dimension is the number of bits used to store each vector dimension (16 or 1) and m is the maximum number of bidirectional links created for each element during the construction of the graph.

As an example, assume that you have 1 million vectors with a dimension of 256 and an m of 16.

For 16-bit quantization, the memory requirement can be estimated as follows:

1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB

For 1-bit quantization, the memory requirement can be estimated as follows (converting bytes to GB using 1 GB = 2^30 bytes, as in the 16-bit estimate):

1.1 * (256 / 8 + 8 * 16) * 1,000,000 ~= 0.164 GB
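The HNSW estimate can also be computed programmatically. The following sketch implements the formula given above and returns the result in bytes, so you can convert it to whichever unit you prefer; the function name `hnsw_memory_bytes` is hypothetical.

```python
def hnsw_memory_bytes(num_vectors, dimension, bits_per_dimension, m):
    """Estimated HNSW memory in bytes:
    1.1 * (dimension * bits_per_dimension / 8 + 8 * m) per vector.
    """
    per_vector = 1.1 * (dimension * bits_per_dimension / 8 + 8 * m)
    return per_vector * num_vectors


if __name__ == "__main__":
    # Same inputs as the example: 1 million vectors, dimension 256, m = 16.
    for bits in (16, 1):
        estimate = hnsw_memory_bytes(1_000_000, 256, bits, 16)
        print(f"{bits:>2}-bit: {estimate:,.0f} bytes")
```

Note that the 8 * m term is the same for both bit widths, so the graph links account for most of the estimate at 1 bit per dimension.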

IVF memory estimation

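The memory required for IVF is estimated to be 1.1 * (((bytes_per_dimension * dimension) * num_vectors) + (4 * nlist * dimension)) bytes, where bytes_per_dimension is the number of bytes used to store each vector dimension (2 for 16-bit quantization and 1/8 for 1-bit quantization), num_vectors is the number of vectors, and nlist is the number of buckets to partition vectors into.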

As an example, assume that you have 1 million vectors with a dimension of 256 and an nlist of 128.

For 16-bit quantization, the memory requirement can be estimated as follows:

1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256))  ~= 0.525 GB

For 1-bit quantization, the memory requirement can be estimated as follows (converting bytes to GB using 1 GB = 2^30 bytes, as in the 16-bit estimate):

1.1 * (((256 / 8) * 1,000,000) + (4 * 128 * 256)) ~= 0.033 GB
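The IVF estimate can be computed in the same way. The following sketch implements the formula given above and returns the result in bytes; the function name `ivf_memory_bytes` is hypothetical, and it takes bits per dimension as input and converts to bytes internally.

```python
def ivf_memory_bytes(num_vectors, dimension, bits_per_dimension, nlist):
    """Estimated IVF memory in bytes:
    1.1 * (bytes_per_dimension * dimension * num_vectors + 4 * nlist * dimension).
    """
    bytes_per_dimension = bits_per_dimension / 8
    vector_storage = bytes_per_dimension * dimension * num_vectors
    centroid_storage = 4 * nlist * dimension
    return 1.1 * (vector_storage + centroid_storage)


if __name__ == "__main__":
    # Same inputs as the example: 1 million vectors, dimension 256, nlist = 128.
    for bits in (16, 1):
        estimate = ivf_memory_bytes(1_000_000, 256, bits, 128)
        print(f"{bits:>2}-bit: {estimate:,.0f} bytes")
```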

Next steps
