Comparing query sets
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated GitHub issue.
To compare the results of two different search configurations, you can run a pairwise experiment. To do this, you need two search configurations and a query set to run against both of them.
For more information about creating a query set, see Query Sets.
For more information about creating search configurations, see Search Configurations.
Creating a pairwise experiment
An experiment compares metrics between two different search configurations. For every query in the query set, the experiment retrieves the top N results from each search configuration. In the dashboard, you can view the documents returned for any query in the query set and determine which search configuration returns more relevant results. Additionally, you can measure the similarity between the two returned result lists using the provided similarity metrics.
Example
To create a pairwise comparison experiment for the specified query set and search configurations, send the following request:
PUT _plugins/_search_relevance/experiments
{
"querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
"searchConfigurationList": ["a5acc9f3-6ad7-43f4-9651-fe118c499bc6", "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"],
"size": 10,
"type": "PAIRWISE_COMPARISON"
}
Request body fields
The following table lists the available request body fields.
Field | Data type | Description |
---|---|---|
querySetId | String | The query set ID. |
searchConfigurationList | List | A list of search configuration IDs to use for comparison. |
size | Integer | The number of documents to return in the results. |
type | String | The type of experiment to run. Valid values are `PAIRWISE_COMPARISON`, `HYBRID_OPTIMIZER`, and `POINTWISE_EVALUATION`. Each experiment type requires different request body fields: `PAIRWISE_COMPARISON` compares two search configurations against a query set (the type used in this example), `HYBRID_OPTIMIZER` optimizes how results are combined in hybrid search, and `POINTWISE_EVALUATION` evaluates a search configuration against judgments. |
The response contains the experiment ID of the created experiment:
{
"experiment_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
"experiment_result": "CREATED"
}
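Experiments run asynchronously: the experiment is created immediately, and its status changes from `PROCESSING` to `COMPLETED` once all queries have been executed. The following is a minimal sketch of creating an experiment and waiting for its completion using Python and the `requests` library; the host, credentials, and IDs are placeholders for your environment, and the retrieval endpoint used for polling is described in the next section:

import time
import requests

HOST = "http://localhost:9200"  # placeholder: your OpenSearch endpoint
AUTH = ("admin", "admin")       # placeholder credentials

# Create the pairwise comparison experiment.
body = {
    "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
    "searchConfigurationList": [
        "a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
        "26c7255c-c36e-42fb-b5b2-633dbf8e53b6",
    ],
    "size": 10,
    "type": "PAIRWISE_COMPARISON",
}
response = requests.put(
    f"{HOST}/_plugins/_search_relevance/experiments", json=body, auth=AUTH
)
experiment_id = response.json()["experiment_id"]

# Poll until the experiment status changes from PROCESSING to COMPLETED.
while True:
    source = requests.get(
        f"{HOST}/_plugins/_search_relevance/experiments/{experiment_id}", auth=AUTH
    ).json()["hits"]["hits"][0]["_source"]
    if source["status"] == "COMPLETED":
        break
    time.sleep(2)

print(source["results"])  # per-query result lists and pairwise comparison metrics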
Interpreting the experiment results
To interpret the experiment results, use the following operations.
Retrieving the experiment results
Use the following API to retrieve the result of a specific experiment.
Endpoints
GET _plugins/_search_relevance/experiments
GET _plugins/_search_relevance/experiments/<experiment_id>
Path parameters
The following table lists the available path parameters.
Parameter | Data type | Description |
---|---|---|
experiment_id | String | The ID of the experiment to retrieve. If no ID is provided, all experiments are retrieved. |
Example request
GET _plugins/_search_relevance/experiments/cbd2c209-96d1-4012-aa73-e524b7a1b11a
Example response
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": ".plugins-search-relevance-experiment",
"_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
"_score": 1,
"_source": {
"id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
"timestamp": "2025-06-11T23:24:26.792Z",
"type": "PAIRWISE_COMPARISON",
"status": "PROCESSING",
"querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
"searchConfigurationList": [
"a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
"26c7255c-c36e-42fb-b5b2-633dbf8e53b6"
],
"judgmentList": [],
"size": 10,
"results": {}
}
}
]
}
}
Once the experiment finishes running, its status changes to `COMPLETED` and the results become available:
Response
{
"took": 34,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": ".plugins-search-relevance-experiment",
"_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
"_score": 1.0,
"_source": {
"id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
"timestamp": "2025-06-12T04:18:37.284Z",
"type": "PAIRWISE_COMPARISON",
"status": "COMPLETED",
"querySetId": "7889ffe9-835e-4f48-a9cd-53905bb967d3",
"searchConfigurationList": [
"a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
"26c7255c-c36e-42fb-b5b2-633dbf8e53b6"
],
"judgmentList": [],
"size": 10,
"results": {
"tv": {
"26c7255c-c36e-42fb-b5b2-633dbf8e53b6": [
"B07X3S9RTZ",
"B07WVZFKLQ",
"B00GXD4NWE",
"B07ZKCV5K5",
"B07ZKDVHFB",
"B086VKT9R8",
"B08XLM8YK1",
"B07FPP6TB5",
"B07N1TMNHB",
"B09CDHM8W7"
],
"pairwiseComparison": {
"jaccard": 0.11,
"rbo90": 0.16,
"frequencyWeighted": 0.2,
"rbo50": 0.07
},
"a5acc9f3-6ad7-43f4-9651-fe118c499bc6": [
"B07Q7VGW4Q",
"B00GXD4NWE",
"B07VML1CY1",
"B07THVCJK3",
"B07RKSV7SW",
"B010EAW8UK",
"B07FPP6TB5",
"B073G9ZD33",
"B07VXRXRJX",
"B07Q45SP9P"
]
},
"led tv": {
"26c7255c-c36e-42fb-b5b2-633dbf8e53b6": [
"B01M1D0KL1",
"B07YSMD3Z9",
"B07V4CY9GZ",
"B074KFP426",
"B07S8XNWWF",
"B07XBJR7GY",
"B075FDWSHT",
"B01N2Z17MS",
"B07F1T4JFB",
"B07S658ZLH"
],
"pairwiseComparison": {
"jaccard": 0.11,
"rbo90": 0.13,
"frequencyWeighted": 0.2,
"rbo50": 0.03
},
"a5acc9f3-6ad7-43f4-9651-fe118c499bc6": [
"B07Q45SP9P",
"B074KFP426",
"B07JKVKZX8",
"B07THVCJK3",
"B0874XJYW8",
"B08LVPWQQP",
"B07V4CY9GZ",
"B07X3BS3DF",
"B074PDYLCZ",
"B08CD9MKLZ"
]
}
}
}
}
]
}
}
Interpreting the results
As shown in the preceding response, each search configuration returns its top N documents for every query, where N is the `size` specified in the request (10 in this example). In addition to the result lists, the response includes the similarity metrics computed by the pairwise comparison.
Response body fields
Field | Description |
---|---|
jaccard | The Jaccard similarity coefficient: the number of documents in the intersection of the two result sets divided by the number of documents in their union. |
rbo50, rbo90 | The Rank-Biased Overlap (RBO) metric compares the two ranked result lists at each depth (the top 1 document, the top 2 documents, and so on), giving more weight to earlier positions in the lists. The numeric suffix indicates the persistence parameter p (0.5 or 0.9); a smaller p concentrates the weight on the top-ranked results. |
frequencyWeighted | Similar to the Jaccard metric, the frequency-weighted metric calculates the ratio of the weighted intersection to the weighted union of the two result lists, where each document is weighted by the number of times it occurs across both lists. Unlike standard Jaccard, this skews the score toward more frequently occurring documents. |
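To make these metrics concrete, consider the tv query in the example response: the two result lists share 2 of their 18 distinct documents, so `jaccard` = 2/18 ≈ 0.11, and the 2 shared documents account for 4 of the 20 total list entries, giving a frequency-weighted score of 4/20 = 0.2. The following sketch computes the metrics from their standard definitions. It is illustrative rather than a copy of the plugin's implementation; in particular, the plain truncated RBO used here may differ slightly from the reported `rbo90`/`rbo50` values depending on the truncation or extrapolation variant used:

from collections import Counter

def jaccard(a, b):
    """Intersection size divided by union size of the two result sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def rbo(a, b, p):
    """Truncated Rank-Biased Overlap: overlap at each depth d, weighted by p^(d-1)."""
    score = 0.0
    for d in range(1, min(len(a), len(b)) + 1):
        overlap = len(set(a[:d]) & set(b[:d]))
        score += (p ** (d - 1)) * (overlap / d)
    return (1 - p) * score

def frequency_weighted(a, b):
    """Occurrences of shared documents divided by total occurrences across both lists."""
    freq = Counter(a) + Counter(b)
    shared = set(a) & set(b)
    return sum(freq[doc] for doc in shared) / sum(freq.values())

# Result lists for the "tv" query from the example response.
config_1 = ["B07Q7VGW4Q", "B00GXD4NWE", "B07VML1CY1", "B07THVCJK3", "B07RKSV7SW",
            "B010EAW8UK", "B07FPP6TB5", "B073G9ZD33", "B07VXRXRJX", "B07Q45SP9P"]
config_2 = ["B07X3S9RTZ", "B07WVZFKLQ", "B00GXD4NWE", "B07ZKCV5K5", "B07ZKDVHFB",
            "B086VKT9R8", "B08XLM8YK1", "B07FPP6TB5", "B07N1TMNHB", "B09CDHM8W7"]

print(round(jaccard(config_1, config_2), 2))             # 0.11
print(round(frequency_weighted(config_1, config_2), 2))  # 0.2
print(round(rbo(config_1, config_2, 0.5), 2))            # 0.07
print(round(rbo(config_1, config_2, 0.9), 2))            # 0.11 (truncated variant)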