
Workload parameters

Workload parameters let you customize workload behavior without editing the workload files directly. You can control settings such as bulk size, number of shards, index name, and search configuration by passing parameters at runtime.

OpenSearch Benchmark workloads use Jinja2 templates. When you pass parameters using the --workload-params flag, OpenSearch Benchmark injects them into the workload JSON files before execution.

For example, a workload’s index.json might contain the following settings:

{
  "settings": {
    "index.number_of_shards": {{ number_of_shards | default(1) }},
    "index.number_of_replicas": {{ number_of_replicas | default(0) }}
  }
}

When you run a benchmark with --workload-params='{"number_of_shards": 3}', OpenSearch Benchmark replaces {{ number_of_shards | default(1) }} with 3. Parameters you don’t override use their default values.
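As an illustration, the following minimal Python sketch emulates this substitution for the simple {{ var | default(value) }} pattern. This is a simplification of what the Jinja2 engine does internally; the regular expression handles only this one pattern:

```python
import re

TEMPLATE = '{ "index.number_of_shards": {{ number_of_shards | default(1) }} }'

def render(template, params):
    # Toy stand-in for Jinja2: replace {{ name | default(value) }} with the
    # supplied parameter, falling back to the declared default.
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return str(params.get(name, default))
    return re.sub(r"\{\{\s*(\w+)\s*\|\s*default\((\w+)\)\s*\}\}", substitute, template)

print(render(TEMPLATE, {"number_of_shards": 3}))  # value from --workload-params wins
print(render(TEMPLATE, {}))                       # falls back to default(1)
```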

Passing parameters

You can pass parameters in the following ways.

Parameter file

Create a JSON file containing your parameters:

{
  "number_of_shards": 3,
  "number_of_replicas": 1,
  "bulk_size": 5000,
  "target_index_name": "my_index"
}

Then reference it as follows:

opensearch-benchmark run --workload=geonames --workload-params=my-params.json

Inline JSON

Pass parameters directly on the command line:

opensearch-benchmark run --workload=geonames --workload-params='{"number_of_shards": 3, "bulk_size": 5000}'

Comma-separated key-value pairs

Use the following format to pass key-value pairs:

opensearch-benchmark run --workload=geonames --workload-params="number_of_shards:3,bulk_size:5000"

The comma-separated format only supports string values. Use a JSON file or inline JSON for numbers, Boolean values, or nested objects.
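The difference becomes visible when you compare how the two formats parse. The following Python sketch is illustrative only; the key-value splitter is a simplification of OpenSearch Benchmark's actual parsing:

```python
import json

def parse_kv(raw):
    # Simplified key-value parsing: every value remains a string.
    return dict(pair.split(":", 1) for pair in raw.split(","))

kv = parse_kv("number_of_shards:3,bulk_size:5000")
js = json.loads('{"number_of_shards": 3, "bulk_size": 5000}')

print(type(kv["number_of_shards"]))  # <class 'str'>
print(type(js["number_of_shards"]))  # <class 'int'>
```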

Parameter precedence

When the same parameter is defined in multiple sources, OpenSearch Benchmark applies them in the following order (highest priority first):

  1. --workload-params (CLI flag): Overrides all other values.
  2. Workload template defaults: Default values specified in {{ var | default(value) }} expressions in the workload JSON files (for example, {{ number_of_shards | default(1) }}).
  3. Undefined: If no default is specified and the parameter is not provided, OpenSearch Benchmark raises a template rendering error.
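The resolution order can be sketched in Python as follows. This is a hypothetical helper for illustration; resolve is not part of the OpenSearch Benchmark API, which performs this resolution inside the Jinja2 engine:

```python
def resolve(name, cli_params, template_default=None):
    # 1. A value passed via --workload-params always wins.
    if name in cli_params:
        return cli_params[name]
    # 2. Otherwise, fall back to the template's default(...) value.
    if template_default is not None:
        return template_default
    # 3. No value anywhere: template rendering fails.
    raise KeyError(f"parameter '{name}' is undefined and has no default")

print(resolve("number_of_shards", {"number_of_shards": 3}, template_default=1))  # 3
print(resolve("number_of_shards", {}, template_default=1))                        # 1
```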

Template syntax

This section describes the most common template patterns used in workload files.

Variable with a default value

Use the default filter to specify a fallback value when a parameter is not provided. For example, if bulk_size is not provided in --workload-params, the expression evaluates to 5000:

{{ bulk_size | default(5000) }}

Boolean values

Use the tojson filter for Boolean values to ensure correct JSON output. For example, the following expression evaluates to false (without quotation marks), not "false":

{{ query_cache_enabled | default(false) | tojson }}
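Without tojson, Jinja2 stringifies Python's False using str(), which produces a capitalized False that is not a valid JSON literal. The stdlib equivalent illustrates the difference:

```python
import json

print(str(False))         # False  (capitalized: invalid in JSON)
print(json.dumps(False))  # false  (what the tojson filter emits)
```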

String values

Wrap string variables in quotation marks:

"{{ conflicts | default('random') }}"

Conditional sections

Use {% if %} blocks to include or exclude sections based on whether a parameter is defined or on the parameter value.

Including a field only when defined

The following template conditionally includes the target-throughput field. If target_throughput is not provided using --workload-params, the entire field is omitted from the rendered output:

{% if target_throughput is defined %}
"target-throughput": {{ target_throughput }},
{% endif %}

If/else for alternative values

Use {% else %} to provide a fallback. For example, if use_zstd is set to true in --workload-params, the rendered output sets the source-file parameter to documents.json.zst. Otherwise, it sets the source-file to documents.json.bz2:

{% if use_zstd %}
"source-file": "documents.json.zst",
{% else %}
"source-file": "documents.json.bz2",
{% endif %}

Conditionally adding index fields

This pattern is commonly used to define optional fields in vectorsearch workload templates. The {%- endif %} (with the dash) trims trailing whitespace and newline characters, preventing empty lines from appearing and avoiding invalid JSON formatting (such as trailing commas or misaligned structure):

"properties": {
  {% if id_field_name is defined and id_field_name != "_id" %}
  "{{ id_field_name }}": {
    "type": "keyword"
  },
  {%- endif %}
  "embedding": {
    "type": "knn_vector",
    "dimension": {{ target_index_dimension }}
  }
}

Version-based conditionals

Some workloads adapt their behavior based on distribution_version, which OpenSearch Benchmark sets automatically according to the target cluster. This pattern allows a single workload to support multiple OpenSearch versions by conditionally including version-specific operations or settings:

{% if distribution_version is not defined %}
  {% set distribution_version = "2.11.0" %}
{% endif %}

{% if distribution_version.split('.') | map('int') | list >= "2.19.1".split('.') | map('int') | list %}
  {# Include features available in 2.19.1+ #}
{% endif %}
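The comparison above works because Jinja2 evaluates expressions with Python semantics: each version string is split into a list of integers, and lists compare element by element, so 2.19.1 correctly sorts after 2.2.0 (which a plain string comparison would get wrong). The equivalent in plain Python:

```python
def version_tuple(version):
    # "2.19.1" -> (2, 19, 1); tuples compare element by element.
    return tuple(int(part) for part in version.split("."))

print(version_tuple("2.19.1") >= version_tuple("2.11.0"))  # True
print(version_tuple("2.2.0") >= version_tuple("2.19.1"))   # False
```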

For loops

Use {% for %} loops to generate repeated structures. Guard the separator with loop.last so the final element does not emit a trailing comma, which would produce invalid JSON:

{% for i in range(1, 101) %}
{
  "name": "query-{{ i }}",
  "operation-type": "search",
  "body": { ... }
}{% if not loop.last %},{% endif %}
{% endfor %}

Integer conversion

Use the int filter when a parameter must be an integer:

{{ target_index_dimension | default(768) | int }}

Including external files

Workloads are typically organized into multiple files for readability. The benchmark.collect helper composes a single workload definition from multiple JSON files at render time.

Importing the helper

Every workload.json that uses benchmark.collect must import it at the top of the file:

{% import "benchmark.helpers" as benchmark with context %}

The with context clause ensures that all workload parameters are available in the included files.

Collecting operations and test procedures

A typical workload.json delegates its operations and test procedures to separate files:

{% import "benchmark.helpers" as benchmark with context %}
{
  "version": 2,
  "description": "My workload",
  "indices": [ ... ],
  "corpora": [ ... ],
  "operations": [
    {{ benchmark.collect(parts="operations/*.json") }}
  ],
  "test_procedures": [
    {{ benchmark.collect(parts="test_procedures/*.json") }}
  ]
}

The parts argument accepts glob patterns. The pattern operations/*.json matches all JSON files in the operations/ directory and includes their contents, separated by commas. This keeps the main workload.json concise while the operation and test procedure definitions are defined in separate files.
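Conceptually, the helper behaves like the following Python sketch. This is an illustrative approximation, not the actual implementation, which ships with OpenSearch Benchmark as the benchmark.helpers template:

```python
import glob
import pathlib

def collect(parts):
    # Approximation of benchmark.collect: read every file matching the glob
    # pattern and join the contents with commas so the result can be spliced
    # into a JSON array. Sorted here for deterministic output.
    files = sorted(glob.glob(parts))
    return ",\n".join(pathlib.Path(f).read_text().strip() for f in files)
```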

Composing schedules from shared parts

Test procedures can reuse common schedule fragments. For example, the vectorsearch workload has shared schedules under test_procedures/common/:

test_procedures/
  common/
    index-only-schedule.json
    search-only-schedule.json
    force-merge-schedule.json
    vespa-search-only-schedule.json
  default.json

A test procedure in default.json composes its schedule from these parts:

{
  "name": "no-train-test",
  "default": true,
  "schedule": [
    {{ benchmark.collect(parts="common/index-only-schedule.json") }},
    {{ benchmark.collect(parts="common/force-merge-schedule.json") }},
    {{ benchmark.collect(parts="common/search-only-schedule.json") }}
  ]
}

Each collected file contains one or more schedule entries. Parameters such as {{ target_index_name }} in those files are resolved from the same --workload-params passed on the command line because the with context import propagates all parameters to the included files.

Index body files

The body field in an index definition references a separate JSON file for mappings and settings:

"indices": [
  {
    "name": "geonames",
    "body": "index.json"
  }
]

The index.json file is a Jinja2 template like any other workload file, so it can use parameters:

{
  "settings": {
    "index.number_of_shards": {{ number_of_shards | default(1) }}
  },
  "mappings": { ... }
}

Discovering available parameters

To view the parameters supported by a workload, use the info command. This command lists the workload’s test procedures along with their configurable parameters and default values:

opensearch-benchmark info --workload=geonames

You can also inspect the workload source directly. Parameters appear as {{ variable_name | default(value) }} in workload JSON files. The main workload files are the following:

  • workload.json – The top-level workload definition.
  • index.json – The index settings and mappings.
  • test_procedures/default.json – The test procedure schedules.
  • operations/default.json – The operation definitions.

Common parameters

The following parameters are supported by most official workloads.

| Parameter | Description | Default |
| :--- | :--- | :--- |
| number_of_shards | The primary shard count for created indexes. | 1 |
| number_of_replicas | The replica count for created indexes. | 0 |
| bulk_size | The number of documents per bulk request. | 5000 or 10000 |
| bulk_indexing_clients | The number of concurrent bulk indexing clients. | 8 |
| ingest_percentage | The percentage of the document corpus to ingest. | 100 |
| target_throughput | The target number of operations per second per client. | Unthrottled |
| search_clients | The number of concurrent search clients. | 1 |
| cluster_health | The required cluster health status before proceeding. | green |
| source_enabled | Whether to store the _source field. | true |

Vector search workload parameters

The vectorsearch workload supports additional parameters for vector search benchmarking.

| Parameter | Description | Default |
| :--- | :--- | :--- |
| target_index_name | The vector index name. | target_index |
| target_field_name | The vector field name. | target_field |
| target_index_dimension | The number of vector dimensions. | 768 |
| target_index_space_type | The distance metric. Valid values are l2, innerproduct, and cosinesimil. | Varies |
| target_index_body | The path to the index settings file. | indices/faiss-index.json |
| target_index_bulk_size | The number of documents per bulk request. | 500 |
| target_index_bulk_index_data_set_format | The corpus format. Valid values are hdf5 and bigann. | hdf5 |
| target_index_bulk_index_data_set_corpus | The corpus name (for example, cohere-1m). | Varies |
| target_index_bulk_indexing_clients | The number of concurrent indexing clients. | 10 |
| target_index_max_num_segments | The number of segments after force merge. | 1 |
| hnsw_ef_construction | The HNSW graph build-time exploration factor. | 256 |
| hnsw_ef_search | The HNSW search-time exploration factor. | 256 |
| query_k | The number of nearest neighbors to retrieve. | 100 |
| query_count | The number of queries to run. Use -1 for all queries. | -1 |
| query_data_set_format | The query vector format. Valid values are hdf5 and bigann. | hdf5 |
| query_data_set_corpus | The query vector corpus name. | Varies |
| search_clients | The number of concurrent search clients. | 1 |
| neighbors_data_set_corpus | The ground-truth neighbors corpus used for recall evaluation. | Varies |
| neighbors_data_set_format | The neighbors dataset format. | hdf5 |

Example vector search parameter file

The following example shows a complete parameter file for a vectorsearch workload:

{
  "target_index_name": "vector_1m",
  "target_field_name": "embedding",
  "target_index_body": "indices/faiss-index.json",
  "target_index_primary_shards": 1,
  "target_index_replica_shards": 0,
  "target_index_dimension": 768,
  "target_index_space_type": "innerproduct",
  "target_index_bulk_size": 500,
  "target_index_bulk_index_data_set_format": "hdf5",
  "target_index_bulk_index_data_set_corpus": "cohere-1m",
  "target_index_bulk_indexing_clients": 10,
  "target_index_max_num_segments": 1,
  "hnsw_ef_construction": 200,
  "hnsw_ef_search": 256,
  "query_k": 100,
  "query_data_set_format": "hdf5",
  "query_data_set_corpus": "cohere-1m",
  "query_count": 10000,
  "search_clients": 1,
  "neighbors_data_set_corpus": "cohere-1m",
  "neighbors_data_set_format": "hdf5"
}

To use this parameter file, save it as params.json and run the benchmark with the --workload-params flag:

opensearch-benchmark run \
  --pipeline=benchmark-only \
  --workload-path=/path/to/vectorsearch \
  --workload-params=params.json \
  --target-hosts=localhost:9200