Comparing query sets

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated GitHub issue.

To compare the results of two different search configurations, run a pairwise experiment. A pairwise experiment requires two search configurations and a query set whose queries are run against both configurations.

For more information about creating a query set, see Query Sets.

For more information about creating search configurations, see Search Configurations.

Creating a pairwise experiment

A pairwise experiment compares the results returned by two different search configurations. For every query in the query set, the experiment returns the top N results from each search configuration. In the dashboard, you can view the documents returned for any query in the query set and determine which search configuration returns more relevant results. Additionally, you can measure how similar the two result lists are using the provided similarity metrics.

Example

To create a pairwise comparison experiment for the specified query set and search configurations, send the following request:

PUT _plugins/_search_relevance/experiments
{
    "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
    "searchConfigurationList": ["a5acc9f3-6ad7-43f4-9651-fe118c499bc6", "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"],
    "size": 10,
    "type": "PAIRWISE_COMPARISON"
}

Request body fields

The following table lists the available input parameters.

| Field | Data type | Description |
| --- | --- | --- |
| querySetId | String | The query set ID. |
| searchConfigurationList | List | A list of search configuration IDs to use for comparison. |
| size | Integer | The number of documents to return in the results. |
| type | String | The type of experiment to run. Valid values are PAIRWISE_COMPARISON, HYBRID_OPTIMIZER, and POINTWISE_EVALUATION. The required request body fields depend on the experiment type. PAIRWISE_COMPARISON compares two search configurations against a query set and is the type described on this page. HYBRID_OPTIMIZER is for combining results. POINTWISE_EVALUATION evaluates a search configuration against judgments. |

The response contains the experiment ID of the created experiment:

{
    "experiment_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
    "experiment_result": "CREATED"
}
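
The request above can also be sent from a script. The following is a minimal Python sketch using only the standard library. The helper names (`build_pairwise_body`, `create_experiment`), the cluster URL, and the absence of authentication are assumptions made for illustration and are not part of the plugin. The check for exactly two configuration IDs is a local sanity check, not a documented server-side rule.

```python
import json
import urllib.request

EXPERIMENTS_PATH = "/_plugins/_search_relevance/experiments"


def build_pairwise_body(query_set_id, search_configuration_ids, size=10):
    """Build the request body for a PAIRWISE_COMPARISON experiment."""
    # Local sanity check (assumption): a pairwise comparison uses two configurations.
    if len(search_configuration_ids) != 2:
        raise ValueError("a pairwise comparison needs exactly two search configuration IDs")
    return {
        "querySetId": query_set_id,
        "searchConfigurationList": list(search_configuration_ids),
        "size": size,
        "type": "PAIRWISE_COMPARISON",
    }


def create_experiment(base_url, body):
    """Send the PUT request and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + EXPERIMENTS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_pairwise_body(
    "8368a359-146b-4690-b756-40591b2fcddb",
    ["a5acc9f3-6ad7-43f4-9651-fe118c499bc6", "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"],
)
# Against a running cluster (the URL is an assumption):
# experiment_id = create_experiment("http://localhost:9200", body)["experiment_id"]
```

A secured cluster would also need credentials and TLS settings, which this sketch leaves out.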

Interpreting the experiment results

To interpret the experiment results, use the following operations.

Retrieving the experiment results

Use the following API to retrieve all experiments or the result of a specific experiment.

Endpoints

GET _plugins/_search_relevance/experiments
GET _plugins/_search_relevance/experiments/<experiment_id>

Path parameters

The following table lists the available path parameters.

| Parameter | Data type | Description |
| --- | --- | --- |
| experiment_id | String | The ID of the experiment to retrieve. If omitted, all experiments are retrieved. |

Example request

GET _plugins/_search_relevance/experiments/cbd2c209-96d1-4012-aa73-e524b7a1b11a

Example response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".plugins-search-relevance-experiment",
        "_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
        "_score": 1,
        "_source": {
          "id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
          "timestamp": "2025-06-11T23:24:26.792Z",
          "type": "PAIRWISE_COMPARISON",
          "status": "PROCESSING",
          "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
          "searchConfigurationList": [
            "a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
            "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"
          ],
          "judgmentList": [],
          "size": 10,
          "results": {}
        }
      }
    ]
  }
}
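
In the preceding response the experiment status is still PROCESSING, so a client typically needs to repeat the GET request until the status changes. The following Python sketch shows one way to do that. The function names, the polling interval, and the attempt limit are assumptions for illustration. The `fetch` argument is any zero-argument callable that returns the parsed GET response, so the loop itself makes no assumptions about how the HTTP request is sent.

```python
import time


def experiment_status(response):
    """Return the status of the first experiment in a GET response, or None."""
    hits = response.get("hits", {}).get("hits", [])
    return hits[0]["_source"]["status"] if hits else None


def wait_for_experiment(fetch, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until the experiment is no longer PROCESSING.

    Returns the first response whose status is anything other than
    PROCESSING; raises TimeoutError if that never happens.
    """
    for attempt in range(max_attempts):
        response = fetch()
        if experiment_status(response) != "PROCESSING":
            return response
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("experiment is still PROCESSING after the last attempt")
```

The loop stops on any status other than PROCESSING, so the caller should still check that the returned status is COMPLETED before reading the results.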

Once the experiment finishes running, the results are available:

{
    "took": 34,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": ".plugins-search-relevance-experiment",
                "_id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
                "_score": 1.0,
                "_source": {
                    "id": "cbd2c209-96d1-4012-aa73-e524b7a1b11a",
                    "timestamp": "2025-06-12T04:18:37.284Z",
                    "type": "PAIRWISE_COMPARISON",
                    "status": "COMPLETED",
                    "querySetId": "8368a359-146b-4690-b756-40591b2fcddb",
                    "searchConfigurationList": [
                        "a5acc9f3-6ad7-43f4-9651-fe118c499bc6",
                        "26c7255c-c36e-42fb-b5b2-633dbf8e53b6"
                    ],
                    "judgmentList": [],
                    "size": 10,
                    "results": {
                        "tv": {
                            "26c7255c-c36e-42fb-b5b2-633dbf8e53b6": [
                                "B07X3S9RTZ",
                                "B07WVZFKLQ",
                                "B00GXD4NWE",
                                "B07ZKCV5K5",
                                "B07ZKDVHFB",
                                "B086VKT9R8",
                                "B08XLM8YK1",
                                "B07FPP6TB5",
                                "B07N1TMNHB",
                                "B09CDHM8W7"
                            ],
                            "pairwiseComparison": {
                                "jaccard": 0.11,
                                "rbo90": 0.16,
                                "frequencyWeighted": 0.2,
                                "rbo50": 0.07
                            },
                            "a5acc9f3-6ad7-43f4-9651-fe118c499bc6": [
                                "B07Q7VGW4Q",
                                "B00GXD4NWE",
                                "B07VML1CY1",
                                "B07THVCJK3",
                                "B07RKSV7SW",
                                "B010EAW8UK",
                                "B07FPP6TB5",
                                "B073G9ZD33",
                                "B07VXRXRJX",
                                "B07Q45SP9P"
                            ]
                        },
                        "led tv": {
                            "26c7255c-c36e-42fb-b5b2-633dbf8e53b6": [
                                "B01M1D0KL1",
                                "B07YSMD3Z9",
                                "B07V4CY9GZ",
                                "B074KFP426",
                                "B07S8XNWWF",
                                "B07XBJR7GY",
                                "B075FDWSHT",
                                "B01N2Z17MS",
                                "B07F1T4JFB",
                                "B07S658ZLH"
                            ],
                            "pairwiseComparison": {
                                "jaccard": 0.11,
                                "rbo90": 0.13,
                                "frequencyWeighted": 0.2,
                                "rbo50": 0.03
                            },
                            "a5acc9f3-6ad7-43f4-9651-fe118c499bc6": [
                                "B07Q45SP9P",
                                "B074KFP426",
                                "B07JKVKZX8",
                                "B07THVCJK3",
                                "B0874XJYW8",
                                "B08LVPWQQP",
                                "B07V4CY9GZ",
                                "B07X3BS3DF",
                                "B074PDYLCZ",
                                "B08CD9MKLZ"
                            ]
                        }
                    }
                }
            }
        ]
    }
}

Interpreting the results

As shown in the preceding response, each search configuration returns the top N documents for every query in the query set, where N is the size value (10) specified when the experiment was created. In addition to the result lists, the response includes the pairwise comparison metrics for each query.
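
To work with these results outside the dashboard, the nested results object can be flattened into one row per query. The following Python sketch assumes the response layout shown above, where each query maps to one list per search configuration ID plus a pairwiseComparison object. The function name `summarize_results` and the sample data (configuration IDs, document IDs, and metric value) are hypothetical placeholders, not output from a real experiment.

```python
def summarize_results(source):
    """Return one summary dict per query from an experiment's _source."""
    config_ids = source["searchConfigurationList"]
    rows = []
    for query, entry in source["results"].items():
        # A pairwise experiment has exactly two configurations.
        first, second = (entry.get(config_id, []) for config_id in config_ids)
        second_set = set(second)
        rows.append({
            "query": query,
            "metrics": entry.get("pairwiseComparison", {}),
            # Documents returned by both configurations, in the first list's order.
            "shared": [doc for doc in first if doc in second_set],
        })
    return rows


# Hypothetical, shortened sample in the same shape as the response above.
sample_source = {
    "searchConfigurationList": ["config-a", "config-b"],
    "results": {
        "tv": {
            "config-a": ["d1", "d2", "d3"],
            "config-b": ["d3", "d4", "d1"],
            "pairwiseComparison": {"jaccard": 0.5},
        }
    },
}

for row in summarize_results(sample_source):
    print(row["query"], row["metrics"], row["shared"])
```

Listing the shared documents next to the metrics makes it easier to see which results the two configurations agree on before comparing the remaining documents by hand.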

Response body fields

| Field | Description |
| --- | --- |
| jaccard | The Jaccard similarity score, calculated by dividing the intersection cardinality by the union cardinality of the two sets of returned documents. |
| rbo50, rbo90 | The Rank-Biased Overlap (RBO) metric compares the returned result lists at each ranking depth (for example, the top 1 document, the top 2 documents, and so on). It gives more weight to higher-ranked results, so agreement in earlier positions of the list counts for more. The response reports RBO in the rbo50 and rbo90 fields. |
| frequencyWeighted | Similar to the Jaccard metric, the frequency-weighted metric calculates the ratio of the weighted intersection to the weighted union of the two sets. Unlike standard Jaccard, it gives more weight to documents with higher frequencies, skewing the result toward more frequently occurring items. |
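
To make the metric descriptions concrete, the following Python sketch computes the three kinds of metrics for the two tv result lists from the example response. It is an illustration under stated assumptions, not the plugin's implementation: rbo90 and rbo50 are interpreted as RBO with persistence p = 0.9 and p = 0.5, RBO is truncated at the list length and normalized by the total weight, and the frequency-weighted metric weights each document by the number of lists it appears in. Under these assumptions the rounded values match the ones in the example response.

```python
from collections import Counter


def jaccard(a, b):
    """Intersection cardinality divided by union cardinality."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def frequency_weighted(a, b):
    """Jaccard-style ratio with each document weighted by its frequency."""
    freq = Counter(a) + Counter(b)  # how often each document occurs overall
    set_a, set_b = set(a), set(b)
    union_weight = sum(freq[doc] for doc in set_a | set_b)
    inter_weight = sum(freq[doc] for doc in set_a & set_b)
    return inter_weight / union_weight if union_weight else 0.0


def rbo(a, b, p):
    """Truncated RBO, normalized by the sum of the depth weights."""
    depth = min(len(a), len(b))
    seen_a, seen_b = set(), set()
    weighted_agreement = 0.0
    total_weight = 0.0
    for d in range(1, depth + 1):
        seen_a.add(a[d - 1])
        seen_b.add(b[d - 1])
        weight = p ** (d - 1)  # earlier depths receive larger weights
        weighted_agreement += weight * len(seen_a & seen_b) / d
        total_weight += weight
    return weighted_agreement / total_weight if total_weight else 0.0


# Result lists for the "tv" query, copied from the example response.
list_26c7 = ["B07X3S9RTZ", "B07WVZFKLQ", "B00GXD4NWE", "B07ZKCV5K5", "B07ZKDVHFB",
             "B086VKT9R8", "B08XLM8YK1", "B07FPP6TB5", "B07N1TMNHB", "B09CDHM8W7"]
list_a5ac = ["B07Q7VGW4Q", "B00GXD4NWE", "B07VML1CY1", "B07THVCJK3", "B07RKSV7SW",
             "B010EAW8UK", "B07FPP6TB5", "B073G9ZD33", "B07VXRXRJX", "B07Q45SP9P"]

print(round(jaccard(list_26c7, list_a5ac), 2))             # 0.11
print(round(frequency_weighted(list_26c7, list_a5ac), 2))  # 0.2
print(round(rbo(list_26c7, list_a5ac, 0.9), 2))            # 0.16
print(round(rbo(list_26c7, list_a5ac, 0.5), 2))            # 0.07
```

The two lists share only 2 of 18 distinct documents, which explains the low Jaccard score. The rbo50 value is lower than rbo90 because a smaller p concentrates the weight on the first few positions, where these two lists do not overlap at all.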