Using custom configurations for neural sparse search
Neural sparse search using automatically generated vector embeddings operates in two modes: doc-only and bi-encoder. For more information, see Generating sparse vector embeddings automatically.
At query time, you can use custom models in the following ways:
- Bi-encoder mode: Use your deployed sparse encoding model to generate embeddings from query text. This must be the same model you used at ingestion time.
- Doc-only mode with a custom tokenizer: Use your deployed tokenizer model to tokenize query text. The token weights are obtained from a precomputed lookup table.
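As a rough illustration of the doc-only mode described above, the following Python sketch tokenizes a query, looks up each token's weight in a table, and scores a document's sparse vector against the result. The whitespace tokenizer, the `TOKEN_WEIGHTS` values, and the inner-product scoring are all assumptions made for illustration; the deployed tokenizer and its lookup table behave differently in detail.

```python
from typing import Dict, List

# Hypothetical lookup table: token -> precomputed query-side weight.
# These values are invented for the example.
TOKEN_WEIGHTS: Dict[str, float] = {"hi": 1.2, "world": 2.0, "planet": 2.5}


def tokenize(text: str) -> List[str]:
    """Toy stand-in for the deployed tokenizer: lowercase whitespace split."""
    return text.lower().split()


def encode_query_doc_only(text: str, table: Dict[str, float]) -> Dict[str, float]:
    """Doc-only query encoding: tokenize, then look up each token's weight."""
    return {tok: table[tok] for tok in tokenize(text) if tok in table}


def score(query_vec: Dict[str, float], doc_vec: Dict[str, float]) -> float:
    """Assumed scoring: sum of weight products over tokens shared by both vectors."""
    return sum(w * doc_vec[tok] for tok, w in query_vec.items() if tok in doc_vec)
```

No model inference happens on the query side in this sketch, which is the point of doc-only mode: the expensive encoding is done once per document at ingestion time.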
The following is a complete example of using a custom model for neural sparse search.
Step 1: Configure a sparse encoding model/tokenizer
You must configure a sparse encoding model for ingestion in both bi-encoder mode and doc-only mode with a custom tokenizer. Bi-encoder mode uses the same model for search; doc-only mode uses a separate tokenizer for search.
Step 1(a): Choose the search mode
Choose the search mode and the appropriate model/tokenizer combination:
- Bi-encoder: Use the amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill model during both ingestion and search.
- Doc-only with a custom tokenizer: Use the amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v3-distill model during ingestion and the amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 tokenizer during search.
The following tables provide a search relevance comparison for all available combinations of the two search modes so that you can choose the best combination for your use case.
English language models
| Mode | Ingestion model | Search model | Avg. search relevance on BEIR | Model parameters |
|---|---|---|---|---|
| Doc-only | amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1 | amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 | 0.49 | 133M |
| Doc-only | amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-distill | amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 | 0.504 | 67M |
| Doc-only | amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v2-mini | amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 | 0.497 | 23M |
| Doc-only | amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v3-distill | amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 | 0.517 | 67M |
| Doc-only | amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v3-gte | amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 | 0.546 | 133M |
| Bi-encoder | amazon/neural-sparse/opensearch-neural-sparse-encoding-v1 | amazon/neural-sparse/opensearch-neural-sparse-encoding-v1 | 0.524 | 133M |
| Bi-encoder | amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill | amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill | 0.528 | 67M |
Multilingual models
| Mode | Ingestion model | Search model | Avg. search relevance on MIRACL | Model parameters |
|---|---|---|---|---|
| Doc-only | amazon/neural-sparse/opensearch-neural-sparse-encoding-multilingual-v1 | amazon/neural-sparse/opensearch-neural-sparse-tokenizer-multilingual-v1 | 0.629 | 168M |
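If you prefer to make the choice programmatically, the following Python sketch encodes the English language table above and picks the highest-relevance combination under an optional parameter budget. The `Combo` type and the `best_combo` helper are hypothetical names introduced for this example; the numbers are copied from the table. Relevance and parameter count are the only criteria considered here, so treat the result as a starting point.

```python
from typing import List, NamedTuple, Optional

PREFIX = "amazon/neural-sparse/opensearch-neural-sparse-"


class Combo(NamedTuple):
    mode: str             # "doc-only" or "bi-encoder"
    ingestion_model: str  # model used at ingestion time
    relevance: float      # Avg. search relevance on BEIR
    params_m: int         # model parameters, in millions


# Rows copied from the English language models table.
ENGLISH: List[Combo] = [
    Combo("doc-only", PREFIX + "encoding-doc-v1", 0.49, 133),
    Combo("doc-only", PREFIX + "encoding-doc-v2-distill", 0.504, 67),
    Combo("doc-only", PREFIX + "encoding-doc-v2-mini", 0.497, 23),
    Combo("doc-only", PREFIX + "encoding-doc-v3-distill", 0.517, 67),
    Combo("doc-only", PREFIX + "encoding-doc-v3-gte", 0.546, 133),
    Combo("bi-encoder", PREFIX + "encoding-v1", 0.524, 133),
    Combo("bi-encoder", PREFIX + "encoding-v2-distill", 0.528, 67),
]


def best_combo(
    combos: List[Combo],
    mode: Optional[str] = None,
    max_params_m: Optional[int] = None,
) -> Optional[Combo]:
    """Return the highest-relevance combination matching the filters, or None."""
    candidates = [
        c
        for c in combos
        if (mode is None or c.mode == mode)
        and (max_params_m is None or c.params_m <= max_params_m)
    ]
    return max(candidates, key=lambda c: c.relevance, default=None)
```

For example, `best_combo(ENGLISH, mode="doc-only", max_params_m=67)` selects the doc-v3-distill model used in the rest of this example.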
Step 1(b): Register the model/tokenizer
For both modes, register the sparse encoding model. For the doc-only mode with a custom tokenizer, register a custom tokenizer in addition to the sparse encoding model.
Bi-encoder mode
When using bi-encoder mode, you only need to register the amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill model.
Register the sparse encoding model:
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v2-distill",
"version": "1.0.0",
"model_format": "TORCH_SCRIPT"
}
Registering a model is an asynchronous task. OpenSearch returns a task ID for every model you register:
{
"task_id": "aFeif4oB5Vm0Tdw8yoN7",
"status": "CREATED"
}
You can check the status of the task by calling the Tasks API:
GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7
Once the task is complete, the task state will change to COMPLETED and the Tasks API response will contain the model ID of the registered model:
{
"model_id": "<bi-encoder model ID>",
"task_type": "REGISTER_MODEL",
"function_name": "SPARSE_ENCODING",
"state": "COMPLETED",
"worker_node": [
"4p6FVOmJRtu3wehDD74hzQ"
],
"create_time": 1694358489722,
"last_update_time": 1694358499139,
"is_async": true
}
Note the model_id of the model you’ve created; you’ll need it for the following steps.
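Because registration is asynchronous, a client usually polls the Tasks API until the task finishes. The following Python sketch shows one way to do that. The `get_task` callable is a hypothetical function that you supply; it is assumed to send `GET /_plugins/_ml/tasks/<task_id>` with your HTTP client of choice and return the parsed JSON response. The `FAILED` state and the `error` field are assumptions not shown in the responses above.

```python
import time
from typing import Callable, Dict


def wait_for_model_id(
    get_task: Callable[[str], Dict],
    task_id: str,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
) -> str:
    """Poll a registration task until it completes, then return its model_id."""
    for _ in range(max_polls):
        task = get_task(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            # The completed task response carries the registered model's ID.
            return task["model_id"]
        if state == "FAILED":  # assumed failure state
            raise RuntimeError(f"Task {task_id} failed: {task.get('error')}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Task {task_id} did not complete after {max_polls} polls")
```

Injecting `get_task` keeps the polling logic independent of any particular HTTP library or authentication setup.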
Doc-only mode with a custom tokenizer
When using the doc-only mode with a custom tokenizer, you need to register the amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v3-distill model, which you’ll use at ingestion time, and the amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1 tokenizer, which you’ll use at search time.
Register the sparse encoding model:
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v3-distill",
"version": "1.0.0",
"model_format": "TORCH_SCRIPT"
}
Register the tokenizer:
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT"
}
As in bi-encoder mode, use the Tasks API to check the status of each registration task. Once a task is complete, its state changes to COMPLETED. Note the model_id of the model and the tokenizer you’ve created; you’ll need them for the following steps.
Step 2: Ingest data
In both the bi-encoder and doc-only modes, you’ll use a sparse encoding model at ingestion time to generate sparse vector embeddings.
Step 2(a): Create an ingest pipeline
To generate sparse vector embeddings, you need to create an ingest pipeline that contains a sparse_encoding processor, which will convert the text in a document field to vector embeddings. The processor’s field_map determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings.
The following example request creates an ingest pipeline where the text from passage_text will be converted into sparse vector embeddings, which will be stored in passage_embedding. Provide the model ID of the registered model in the request:
PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
{
"description": "A sparse encoding ingest pipeline",
"processors": [
{
"sparse_encoding": {
"model_id": "<bi-encoder or doc-only model ID>",
"prune_type": "max_ratio",
"prune_ratio": 0.1,
"field_map": {
"passage_text": "passage_embedding"
}
}
}
]
}
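The pipeline above sets prune_type to max_ratio with a prune_ratio of 0.1. This page does not define the pruning rule, so the following Python sketch reflects an assumed reading: a token is kept only if its weight is at least prune_ratio times the largest weight in the vector. Whether the boundary is inclusive is also an assumption, and the `prune_max_ratio` name and the sample weights are invented for illustration.

```python
from typing import Dict


def prune_max_ratio(vec: Dict[str, float], ratio: float) -> Dict[str, float]:
    """Keep tokens whose weight is at least `ratio` times the maximum weight.

    Assumed interpretation of prune_type "max_ratio"; not taken from this page.
    """
    if not vec:
        return {}
    threshold = ratio * max(vec.values())
    return {tok: w for tok, w in vec.items() if w >= threshold}


# Hypothetical sparse vector; the threshold here is 0.1 * 6.0 = 0.6.
example = {"hello": 6.0, "world": 4.0, "so": 0.2}
pruned = prune_max_ratio(example, 0.1)
```

Under this reading, pruning trades a small amount of recall on low-weight tokens for a smaller index.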
To split long text into passages, use the text_chunking ingest processor before the sparse_encoding processor. For more information, see Text chunking.
Step 2(b): Create an index for ingestion
To use the sparse encoding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the field_map are mapped to the correct types. Continuing with the example, the passage_embedding field must be mapped as rank_features. Similarly, the passage_text field must be mapped as text.
The following example request creates a rank features index configured with a default ingest pipeline:
PUT /my-nlp-index
{
"settings": {
"default_pipeline": "nlp-ingest-pipeline-sparse"
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "rank_features"
},
"passage_text": {
"type": "text"
}
}
}
}
To save disk space, you can exclude the embedding vector from the source as follows:
PUT /my-nlp-index
{
"settings": {
"default_pipeline": "nlp-ingest-pipeline-sparse"
},
"mappings": {
"_source": {
"excludes": [
"passage_embedding"
]
},
"properties": {
"id": {
"type": "text"
},
"passage_embedding": {
"type": "rank_features"
},
"passage_text": {
"type": "text"
}
}
}
}
Once the <token, weight> pairs are excluded from the source, they cannot be recovered. Before applying this optimization, make sure you don’t need the <token, weight> pairs for your application.
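The two index definitions above differ only in the _source exclusion. The following Python sketch builds either request body from one function, which helps keep the field names aligned with the pipeline's field_map. The `build_index_body` helper and its parameter names are hypothetical; sending the body to `PUT /my-nlp-index` is left to your HTTP client.

```python
from typing import Dict


def build_index_body(
    pipeline: str,
    text_field: str,
    embedding_field: str,
    exclude_embedding_from_source: bool = False,
) -> Dict:
    """Build the create-index body for a rank features index."""
    mappings: Dict = {
        "properties": {
            "id": {"type": "text"},
            embedding_field: {"type": "rank_features"},
            text_field: {"type": "text"},
        }
    }
    if exclude_embedding_from_source:
        # Saves disk space, but the <token, weight> pairs cannot be recovered.
        mappings["_source"] = {"excludes": [embedding_field]}
    return {"settings": {"default_pipeline": pipeline}, "mappings": mappings}


body = build_index_body(
    "nlp-ingest-pipeline-sparse",
    "passage_text",
    "passage_embedding",
    exclude_embedding_from_source=True,
)
```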
Step 2(c): Ingest documents into the index
To ingest documents into the index created in the previous step, send the following requests:
PUT /my-nlp-index/_doc/1
{
"passage_text": "Hello world",
"id": "s1"
}
PUT /my-nlp-index/_doc/2
{
"passage_text": "Hi planet",
"id": "s2"
}
Before the document is ingested into the index, the ingest pipeline runs the sparse_encoding processor on the document, generating vector embeddings for the passage_text field. The indexed document includes the passage_text field, which contains the original text, and the passage_embedding field, which contains the vector embeddings.
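For more than a handful of documents, you would typically send them in one request instead of one PUT per document. The following Python sketch builds a newline-delimited body for the Bulk API (`POST /_bulk`), which is not covered on this page and is used here as an assumption about how you might batch the same two documents. The `build_bulk_body` helper is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Tuple


def build_bulk_body(index: str, docs: Iterable[Tuple[str, Dict]]) -> str:
    """Build an NDJSON bulk body: one action line, then one source line, per doc."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(
    "my-nlp-index",
    [
        ("1", {"passage_text": "Hello world", "id": "s1"}),
        ("2", {"passage_text": "Hi planet", "id": "s2"}),
    ],
)
```

Because the index defines a default pipeline, documents indexed this way should pass through the same sparse_encoding processor as the single-document requests above.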
Step 3: Search the data
To perform a neural sparse search on your index, use the neural_sparse query clause in Query DSL queries.
The following example request uses a neural_sparse query to search for relevant documents using a raw text query. Provide the model ID for bi-encoder mode or the tokenizer ID for doc-only mode with a custom tokenizer:
GET my-nlp-index/_search
{
"query": {
"neural_sparse": {
"passage_embedding": {
"query_text": "Hi world",
"model_id": "<bi-encoder or tokenizer ID>"
}
}
}
}
The response contains the matching documents:
{
"took" : 688,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 30.0029,
"hits" : [
{
"_index" : "my-nlp-index",
"_id" : "1",
"_score" : 30.0029,
"_source" : {
"passage_text" : "Hello world",
"passage_embedding" : {
"!" : 0.8708904,
"door" : 0.8587369,
"hi" : 2.3929274,
"worlds" : 2.7839446,
"yes" : 0.75845814,
"##world" : 2.5432441,
"born" : 0.2682308,
"nothing" : 0.8625516,
"goodbye" : 0.17146169,
"greeting" : 0.96817183,
"birth" : 1.2788506,
"come" : 0.1623208,
"global" : 0.4371151,
"it" : 0.42951578,
"life" : 1.5750692,
"thanks" : 0.26481047,
"world" : 4.7300377,
"tiny" : 0.5462298,
"earth" : 2.6555297,
"universe" : 2.0308156,
"worldwide" : 1.3903781,
"hello" : 6.696973,
"so" : 0.20279501,
"?" : 0.67785245
},
"id" : "s1"
}
},
{
"_index" : "my-nlp-index",
"_id" : "2",
"_score" : 16.480486,
"_source" : {
"passage_text" : "Hi planet",
"passage_embedding" : {
"hi" : 4.338913,
"planets" : 2.7755864,
"planet" : 5.0969057,
"mars" : 1.7405145,
"earth" : 2.6087382,
"hello" : 3.3210192
},
"id" : "s2"
}
}
]
}
}
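To inspect a response like the preceding one in application code, you can reduce each hit to its ID, score, and highest-weighted tokens. The following Python sketch assumes the response has already been parsed into a dictionary; `summarize_hits` is a hypothetical helper, and the sample below is a trimmed copy of the second hit. If you excluded passage_embedding from the source, the token list is empty.

```python
from typing import Dict, List


def summarize_hits(response: Dict, top_tokens: int = 3) -> List[Dict]:
    """Return id, score, and the highest-weighted tokens for each hit."""
    summary = []
    for hit in response["hits"]["hits"]:
        embedding = hit.get("_source", {}).get("passage_embedding", {})
        ranked = sorted(embedding.items(), key=lambda kv: kv[1], reverse=True)
        summary.append(
            {
                "id": hit["_id"],
                "score": hit["_score"],
                "top_tokens": [token for token, _ in ranked[:top_tokens]],
            }
        )
    return summary


# Trimmed copy of the second hit from the response above.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "2",
                "_score": 16.480486,
                "_source": {
                    "passage_text": "Hi planet",
                    "passage_embedding": {
                        "hi": 4.338913,
                        "planets": 2.7755864,
                        "planet": 5.0969057,
                        "mars": 1.7405145,
                        "earth": 2.6087382,
                        "hello": 3.3210192,
                    },
                },
            }
        ]
    }
}
result = summarize_hits(sample)
```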
Configuring a default model for search
When using custom models, you can configure a default model ID at the index level to simplify your queries. This eliminates the need to specify the model_id in every query.
First, create a search pipeline with a neural_query_enricher processor:
PUT /_search/pipeline/neural_search_pipeline
{
"request_processors": [
{
"neural_query_enricher" : {
"default_model_id": "<bi-encoder model/tokenizer ID>"
}
}
]
}
Then set this pipeline as the default for your index:
PUT /my-nlp-index/_settings
{
"index.search.default_pipeline" : "neural_search_pipeline"
}
After configuring the default model, you can omit the model_id when running queries.
For more information about setting a default model on an index, or to learn how to set a default model on a specific field, see Setting a default model on an index or field.
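The effect on client code is that model_id becomes optional when you build the query. The following Python sketch constructs the neural_sparse request body either way; `build_neural_sparse_query` is a hypothetical helper, and sending the body to `GET my-nlp-index/_search` is left to your HTTP client.

```python
from typing import Dict, Optional


def build_neural_sparse_query(
    field: str, query_text: str, model_id: Optional[str] = None
) -> Dict:
    """Build a neural_sparse search body; omit model_id to rely on the index default."""
    clause: Dict = {"query_text": query_text}
    if model_id is not None:
        clause["model_id"] = model_id
    return {"query": {"neural_sparse": {field: clause}}}


# With a default model configured through the search pipeline:
default_query = build_neural_sparse_query("passage_embedding", "Hi world")

# With an explicit model or tokenizer ID (placeholder value):
explicit_query = build_neural_sparse_query(
    "passage_embedding", "Hi world", model_id="<bi-encoder or tokenizer ID>"
)
```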
Next steps
- Explore our tutorials to learn how to build AI search applications.