DL model analyzers

Deep learning (DL) model analyzers are designed for neural sparse search. They implement the same tokenization rules as the machine learning (ML) models they correspond to. While traditional OpenSearch analyzers use standard rule-based tokenization (such as splitting on white space or word boundaries), DL model analyzers use tokenization rules that match specific ML models (such as BERT’s WordPiece tokenization scheme). Neural sparse search works correctly only when indexed documents and search queries are tokenized consistently in this way.

OpenSearch supports the following DL model analyzers:

bert-uncased: An analyzer based on the google-bert/bert-base-uncased model tokenizer.
mbert-uncased: A multilingual analyzer based on the google-bert/bert-base-multilingual-uncased model tokenizer.

Usage considerations

When using the DL model analyzers, keep the following considerations in mind:

These analyzers use lazy loading. The first call to an analyzer may take longer than later calls because its dependencies and related resources are loaded on first use.
The tokenizers follow the same rules as their corresponding model tokenizers.
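
The lazy-loading behavior described above can be illustrated with a minimal Python sketch. It is not the OpenSearch implementation: the LazyTokenizer class, the slow_loader function, and the three-word vocabulary are all hypothetical and exist only to show why the first call pays a one-time loading cost while later calls do not.

```python
import time


class LazyTokenizer:
    """Hypothetical tokenizer that defers loading its vocabulary until first use."""

    def __init__(self, loader):
        self._loader = loader   # callable that returns the vocabulary
        self._vocab = None      # nothing is loaded at construction time
        self.load_count = 0     # how many times the loader actually ran

    def _ensure_loaded(self):
        # Load resources only once, on the first call that needs them.
        if self._vocab is None:
            self._vocab = self._loader()
            self.load_count += 1

    def tokenize(self, text):
        self._ensure_loaded()
        # Toy rule: lowercase, split on white space, map unknown words to [UNK].
        return [t if t in self._vocab else "[UNK]" for t in text.lower().split()]


def slow_loader():
    # Stand-in for loading model files and dependencies (assumed to be slow).
    time.sleep(0.05)
    return {"fun", "to", "contribute"}


tokenizer = LazyTokenizer(slow_loader)
first = tokenizer.tokenize("Fun to contribute")   # triggers the one-time load
second = tokenizer.tokenize("Fun to read")        # reuses the loaded vocabulary
```

If first-call latency matters in practice, one option is to send a single analyze request to each analyzer you plan to use before serving traffic so that the loading cost is paid up front.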

The bert-uncased analyzer

The bert-uncased analyzer is based on the google-bert/bert-base-uncased model and tokenizes text according to BERT’s WordPiece tokenization scheme. This analyzer is particularly useful for English-language text.
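
To give a sense of how WordPiece tokenization splits words into subword pieces, the following Python sketch implements greedy longest-match-first splitting. It is an illustration under stated assumptions, not the analyzer's code: TOY_VOCAB is a hypothetical five-entry vocabulary (the real model vocabulary contains tens of thousands of entries and may split these words differently), and the sketch omits the punctuation splitting and text normalization that the real tokenizer performs.

```python
# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {"open", "##search", "fun", "contribute", "##s"}


def wordpiece_split(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one lowercased word into pieces using greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first and shrink until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # non-initial pieces carry the prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab=TOY_VOCAB):
    """Lowercase the text (the "uncased" part), split on white space, then split each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_split(word, vocab))
    return tokens


print(tokenize("OpenSearch contributes fun"))
```

With this toy vocabulary, a word such as "OpenSearch" is lowercased and split into a word-initial piece plus a continuation piece, and a word with no matching pieces becomes a single unknown token. Because the same splitting is applied at index time and at query time, the pieces produced for a document and for a query line up.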

To analyze text with the bert-uncased analyzer, specify it in the analyzer field:

POST /_analyze
{
  "analyzer": "bert-uncased",
  "text": "It's fun to contribute to OpenSearch!"
}

The mbert-uncased analyzer

The mbert-uncased analyzer is based on the google-bert/bert-base-multilingual-uncased model, which supports tokenization across multiple languages. This makes it suitable for applications dealing with multilingual content.

To analyze multilingual text, specify the mbert-uncased analyzer in the request:

POST /_analyze
{
  "analyzer": "mbert-uncased",
  "text": "It's fun to contribute to OpenSearch!"
}

Example

For a complete example of using DL model analyzers in neural sparse search queries, see Generating sparse vector embeddings automatically.

