
You're viewing version 2.19 of the OpenSearch documentation. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Spaces

In vector search, a space defines how the distance (or similarity) between two vectors is calculated. The choice of space affects how nearest neighbors are determined during search operations.

Distance calculation

A space defines the function used to measure the distance between two points in order to determine the k-nearest neighbors. In k-NN search, a lower distance value equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result, so each distance is converted into an OpenSearch score using the formula in the last column of the following table. OpenSearch supports the following spaces.

Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine in the method documentation.

| Space type | Search type | Distance function (\(d\)) | OpenSearch score |
| :--- | :--- | :--- | :--- |
| l1 | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert\) | \(score = {1 \over 1 + d}\) |
| l2 | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2\) | \(score = {1 \over 1 + d}\) |
| linf | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \max_i(\lvert x_i - y_i \rvert)\) | \(score = {1 \over 1 + d}\) |
| cosinesimil | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = 1 - \cos{\theta} = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert} = 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}\), where \(\lVert \mathbf{x}\rVert\) and \(\lVert \mathbf{y}\rVert\) represent the norms of vectors \(\mathbf{x}\) and \(\mathbf{y}\), respectively. | \(score = {2 - d \over 2}\) |
| innerproduct (supported for Lucene in OpenSearch version 2.13 and later) | Approximate | NMSLIB and Faiss: \(d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i\); Lucene: \(d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} \cdot \mathbf{y}} = \sum_{i=1}^n x_i y_i\) | NMSLIB and Faiss: \(\text{If } d \ge 0, score = {1 \over 1 + d}\); \(\text{If } d < 0, score = -d + 1\). Lucene: \(\text{If } d > 0, score = d + 1\); \(\text{If } d \le 0, score = {1 \over 1 + (-1 \cdot d)}\) |
| innerproduct (supported for Lucene in OpenSearch version 2.13 and later) | Exact | \(d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i\) | \(\text{If } d \ge 0, score = {1 \over 1 + d}\); \(\text{If } d < 0, score = -d + 1\) |
| hamming (supported for binary vectors in OpenSearch version 2.16 and later) | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\) | \(score = {1 \over 1 + d}\) |
| hammingbit (supported for binary and long vectors) | Exact | \(d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\) | \(score = {1 \over 1 + d}\) |
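To make the table concrete, the following is a minimal Python sketch of the l1, l2, linf, and innerproduct rows. It only mirrors the formulas as written above and is not how OpenSearch or the underlying libraries implement them; all function names are hypothetical and chosen for illustration.

```python
from typing import Sequence


def _check_same_length(x: Sequence[float], y: Sequence[float]) -> None:
    # Hypothetical helper: both vectors must have the same dimension.
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """l1 row: sum of absolute coordinate differences."""
    _check_same_length(x, y)
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """l2 row: sum of squared differences (the table's d has no square root)."""
    _check_same_length(x, y)
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linf_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """linf row: largest absolute coordinate difference."""
    _check_same_length(x, y)
    return max(abs(a - b) for a, b in zip(x, y))


def distance_to_score(d: float) -> float:
    """Score used by the l1, l2, and linf rows: 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


def innerproduct_score_nmslib_faiss(x: Sequence[float], y: Sequence[float]) -> float:
    """NMSLIB and Faiss row: d is the negated dot product."""
    _check_same_length(x, y)
    d = -sum(a * b for a, b in zip(x, y))
    return 1.0 / (1.0 + d) if d >= 0 else -d + 1.0


def innerproduct_score_lucene(x: Sequence[float], y: Sequence[float]) -> float:
    """Lucene row: d is the dot product itself."""
    _check_same_length(x, y)
    d = sum(a * b for a, b in zip(x, y))
    return d + 1.0 if d > 0 else 1.0 / (1.0 + (-1.0 * d))


x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 3.0]
print(l1_distance(x, y), l2_distance(x, y), linf_distance(x, y))
print(distance_to_score(linf_distance(x, y)))
print(innerproduct_score_nmslib_faiss(x, y), innerproduct_score_lucene(x, y))
```

In this sketch, the two innerproduct conventions differ only in the sign of \(d\), so both functions return the same score for the same pair of vectors: a larger dot product always produces a higher score.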

The cosine similarity formula does not include the 1 - prefix. However, because similarity search libraries equate lower scores with closer results, they return 1 - cosineSimilarity for the cosine similarity space. This is why 1 - is included in the distance function.

With cosine similarity, it is not valid to pass a zero vector ([0, 0, ...]) as input. The magnitude of such a vector is 0, which causes a division by zero in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown.
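The cosinesimil distance, its score, and the zero-vector restriction can be sketched in a few lines of Python. This is an illustrative client-side sketch only: the function names are hypothetical, and the `ValueError` stands in for the exception that OpenSearch itself throws when it rejects such a request.

```python
import math
from typing import Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return d = 1 - cos(theta), the distance used by the cosinesimil space."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # A zero vector has magnitude 0, so the formula would divide by zero.
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("zero vector is not valid for cosine similarity")
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (norm_x * norm_y)


def cosine_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the OpenSearch score (2 - d) / 2 for the cosinesimil space."""
    return (2.0 - cosine_distance(x, y)) / 2.0


print(cosine_score([1.0, 0.0], [2.0, 0.0]))   # same direction
print(cosine_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because \(d\) ranges from 0 (same direction) to 2 (opposite direction), the score \((2 - d) / 2\) ranges from 1 down to 0, with orthogonal vectors scoring 0.5.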

The hamming space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see Binary k-NN vectors.
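As an illustration of \(\text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\), the following Python sketch computes the Hamming distance and the resulting score. It assumes that each binary vector is packed into a `bytes` value of equal length; that packing and the function names are assumptions made for this example, not a description of the OpenSearch implementation.

```python
def hamming_distance(x: bytes, y: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 in every position where the inputs differ;
    # counting the "1" characters of the binary form counts the set bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


def hamming_score(x: bytes, y: bytes) -> float:
    """Return the OpenSearch score 1 / (1 + d) for the hamming space."""
    return 1.0 / (1.0 + hamming_distance(x, y))


x = bytes([0b11110000, 0b00001111])
y = bytes([0b11110000, 0b00000000])
print(hamming_distance(x, y))  # the second byte differs in four bit positions
print(hamming_score(x, y))
```

Identical vectors have a distance of 0 and therefore the maximum score of 1.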

Specifying the space type

The space type is specified when creating an index.

You can specify the space type at the top level of the field mapping:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2"
      }
    }
  }
}

Alternatively, you can specify the space type within the method object if defining a method:

PUT /test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
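The two placements shown above differ only in where `space_type` sits in the field mapping. The following Python sketch builds either request body as a plain dictionary. The helper name `knn_index_body` and its parameters are hypothetical, it omits optional index settings such as `knn.algo_param.ef_search`, and it does not send anything to a cluster; you would pass the printed JSON to whichever client you use.

```python
import json
from typing import Any, Dict, Optional


def knn_index_body(
    dimension: int,
    space_type: str,
    method: Optional[Dict[str, Any]] = None,
    field_name: str = "my_vector1",
) -> Dict[str, Any]:
    """Build a create-index body with space_type at the top level or in the method."""
    field: Dict[str, Any] = {"type": "knn_vector", "dimension": dimension}
    if method is None:
        # No method defined: space_type goes at the top level of the field mapping.
        field["space_type"] = space_type
    else:
        # Method defined: space_type goes inside a copy of the method object.
        field["method"] = {**method, "space_type": space_type}
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {field_name: field}},
    }


top_level = knn_index_body(dimension=3, space_type="l2")
in_method = knn_index_body(
    dimension=1024,
    space_type="l2",
    method={
        "name": "hnsw",
        "engine": "nmslib",
        "parameters": {"ef_construction": 128, "m": 24},
    },
)
print(json.dumps(top_level, indent=2))
print(json.dumps(in_method, indent=2))
```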
