Judgments

A judgment is a relevance rating assigned to a specific document in the context of a particular query. Multiple judgments are grouped together into judgment lists. Typically, judgments are categorized as two types—implicit and explicit:

  • Implicit judgments are ratings derived from user behavior (for example, what did the user see and select after searching?).
  • Explicit judgments are ratings assigned deliberately by a rater. Humans have traditionally produced them, but large language models (LLMs) are increasingly used for this task.

Search Relevance Workbench (SRW) supports both types and provides three ways to create judgment lists:

  • Using LLMs as automated judges (an approach known as LLM-as-a-Judge) to generate judgments by evaluating search results using a prompt.
  • Generating implicit judgments based on data that adheres to the User Behavior Insights (UBI) schema specification.
  • Importing judgments that were collected using a process outside of SRW.

Using LLM-as-a-Judge

Generate explicit judgments with an LLM in SRW when you don’t have human annotators available or when you need more judgments than humans can provide.

For step-by-step instructions, see Using LLM-as-a-Judge for search relevance.

Prerequisites

To use LLM-as-a-Judge, configure the following components:

  • A connector to an LLM to use for generating the judgments. For more information, see Connectors.
  • A query set: Together with the size parameter, the query set defines the scope for generating judgments. For each query, the top k documents are retrieved from the specified index, where k is the value of the size parameter.
  • A search configuration: A search configuration defines how documents are retrieved for use in query-document pairs.

The AI-assisted judgment process consists of the following steps:

  • For each query, the top k documents are retrieved using the defined search configuration, which includes the index information. The query and each document from the result list create a query-document pair.
  • The LLM is then called with a predefined prompt to generate a judgment for each query-document pair.
  • All generated judgments are stored in the judgments cache index for reuse in future experiments.
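The steps above can be sketched in Python as follows. This is an illustration only, not SRW's implementation: `search_top_k`, `call_llm`, and the in-memory `cache` dictionary are hypothetical stand-ins for the search configuration, the LLM connector, and the judgments cache index.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes: a hit is a dict with "_id" and "_source" keys.
Hit = Dict[str, object]
Cache = Dict[Tuple[str, str], float]


def generate_judgments(
    queries: List[str],
    search_top_k: Callable[[str, int], List[Hit]],  # stand-in for a search configuration
    call_llm: Callable[[str, Hit], float],          # stand-in for the LLM connector
    cache: Cache,                                   # stand-in for the judgments cache index
    size: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Return {query: {doc_id: rating}} for the top `size` hits of each query."""
    judgments: Dict[str, Dict[str, float]] = {}
    for query in queries:
        ratings: Dict[str, float] = {}
        # Step 1: retrieve the top k documents and form query-document pairs.
        for hit in search_top_k(query, size):
            doc_id = str(hit["_id"])
            key = (query, doc_id)
            if key not in cache:
                # Step 2: ask the LLM to rate this query-document pair.
                # Step 3: store the result so later runs can reuse it.
                cache[key] = call_llm(query, hit)
            ratings[doc_id] = cache[key]
        judgments[query] = ratings
    return judgments
```

Running the function twice with the same cache calls the LLM stand-in only during the first run, which mirrors the reuse described in the last step.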

To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration.

The following example uses a generic prompt template with a scale of 0.0 to 1.0. To reduce the volume of data sent to the LLM (and therefore the cost), use the contextFields parameter to specify which fields from each result to include:

PUT _plugins/_search_relevance/judgments
{
    "name":"AI-assisted judgment list",
    "description": "Uses GPT-3.5-turbo to evaluate product search results",
    "type":"LLM_JUDGMENT",
    "modelId":"N8AE1osB0jLkkocYjz7D",
    "querySetId":"5f0115ad-94b9-403a-912f-3e762870ccf6",
    "searchConfigurationList":["2f90d4fd-bd5e-450f-95bb-eabe4a740bd1"],
    "size":5,
    "contextFields": ["title", "description", "category"],
    "llmJudgmentRatingType": "SCORE0_1",
    "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category."
}

Request body fields

The following table lists the parameters for creating LLM-based judgments.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| name | String | The name of the judgment list. |
| description | String | Optional. A description of the judgment list. |
| type | String | Set to LLM_JUDGMENT. |
| modelId | String | The ID of the deployed machine learning (ML) model to use for generating judgments. Must be a remote model connected to an external LLM service. |
| querySetId | String | The ID of the query set containing the queries to evaluate. |
| searchConfigurationList | Array of strings | The list of search configuration IDs to use for retrieving documents to evaluate. |
| size | Integer | The number of top documents to retrieve and evaluate for each query. Default is 10. |
| tokenLimit | Integer | The maximum number of tokens to send to the LLM in a single request. Used to batch documents when the total content exceeds this limit. Default is 4,000. |
| contextFields | Array of strings | Optional. Specifies which document fields to include when sending content to the LLM. If not specified, the entire document source is sent. Use this parameter to reduce costs and focus the LLM on relevant fields. |
| ignoreFailure | Boolean | Whether to continue processing other documents if the LLM fails to generate a judgment for some documents. Default is false. |
| llmJudgmentRatingType | String | The type of rating scale to use. Valid values are SCORE0_1 (numeric scale 0–1) and RELEVANT_IRRELEVANT (binary relevant/irrelevant). Use SCORE0_1 for graded relevance metrics such as NDCG. Use RELEVANT_IRRELEVANT for binary metrics such as precision and recall. |
| promptTemplate | String | Optional. A custom prompt template for the LLM. Supports {{queryText}} and {{hits}} placeholders. If not provided, the default template is used. |
| overwriteCache | Boolean | Whether to overwrite existing cached judgments for the same query-document pairs. Default is false (reuse cached judgments). |
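To make the interaction between contextFields and tokenLimit more concrete, the following Python sketch trims each document to the selected fields and then packs documents greedily into batches. It is a rough illustration under stated assumptions, not the plugin's algorithm: the helper names are hypothetical, and the four-characters-per-token estimate is a crude placeholder for a real tokenizer.

```python
import json
from typing import Dict, List


def select_context(source: Dict[str, object], context_fields: List[str]) -> Dict[str, object]:
    """Keep only the requested fields; an empty list keeps the whole source."""
    if not context_fields:
        return dict(source)
    return {f: source[f] for f in context_fields if f in source}


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly one token per four characters."""
    return max(1, len(text) // 4)


def batch_by_token_limit(
    docs: List[Dict[str, object]], token_limit: int
) -> List[List[Dict[str, object]]]:
    """Greedily pack documents into batches by estimated token count.

    A single document larger than the limit still gets a batch of its own.
    """
    batches: List[List[Dict[str, object]]] = []
    current: List[Dict[str, object]] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(json.dumps(doc))
        # Start a new batch when adding this document would exceed the limit.
        if current and used + cost > token_limit:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Trimming documents before batching means fewer requests are needed for the same size value, which is where the cost saving comes from.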

Custom prompt templates

You can customize the prompt template to focus on specific aspects of relevance:

PUT /_plugins/_search_relevance/judgments
{
  "name": "Custom Prompt Judgment",
  "type": "LLM_JUDGMENT",
  "modelId": "MODEL_ID_HERE",
  "querySetId": "QUERY_SET_ID_HERE",
  "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"],
  "promptTemplate": "As an e-commerce search expert, evaluate how well these products {{hits}} match the user's search for '{{queryText}}'. Consider product relevance, brand reputation, and price competitiveness. Rate each result from 0-1.",
  "llmJudgmentRatingType": "SCORE0_1"
}
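When drafting a template, it can help to preview what the filled-in prompt might look like. The following Python sketch substitutes the two supported placeholders in a single pass. The function name is hypothetical, and serializing the hits as JSON is an assumption; the format that the plugin actually sends to the LLM may differ.

```python
import json
import re
from typing import Dict, List


def render_prompt(template: str, query_text: str, hits: List[Dict[str, object]]) -> str:
    """Fill the {{queryText}} and {{hits}} placeholders in a prompt template."""
    values = {
        "queryText": query_text,
        # Assumption: hits are rendered as a JSON array.
        "hits": json.dumps(hits),
    }
    # A single regex pass avoids re-substituting placeholder-like text
    # that happens to appear inside the query or the hits themselves.
    return re.sub(
        r"\{\{(queryText|hits)\}\}",
        lambda match: values[match.group(1)],
        template,
    )


preview = render_prompt(
    "Rate {{hits}} for the query '{{queryText}}'.",
    "red dress",
    [{"title": "Red summer dress"}],
)
print(preview)
```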

Binary relevance judgments

For simpler relevance assessment, you can use binary (relevant/irrelevant) judgments:

PUT /_plugins/_search_relevance/judgments
{
  "name": "Binary LLM Judgment",
  "type": "LLM_JUDGMENT",
  "modelId": "MODEL_ID_HERE",
  "querySetId": "QUERY_SET_ID_HERE",
  "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"],
  "llmJudgmentRatingType": "RELEVANT_IRRELEVANT",
  "promptTemplate": "Determine if these search results {{hits}} are relevant or irrelevant for the query '{{queryText}}'. Consider exact matches and semantic relevance."
}
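The choice of rating type determines which metrics the judgments can feed. The following Python sketch shows textbook definitions of precision at k for binary judgments and NDCG at k for graded judgments. It is an illustration only: the function names are hypothetical, the NDCG variant uses linear gain, and SRW's own metric calculations may differ in detail.

```python
import math
from typing import Dict, List


def precision_at_k(ranked_ids: List[str], relevant: Dict[str, bool], k: int) -> float:
    """Fraction of the top k results judged relevant (unjudged counts as irrelevant)."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in ranked_ids[:k] if relevant.get(doc_id, False)) / k


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg_at_k(ranked_ids: List[str], grades: Dict[str, float], k: int) -> float:
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    gains = [grades.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Binary judgments carry enough information for `precision_at_k`, while `ndcg_at_k` benefits from the finer distinctions of a 0 to 1 scale.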

Using different LLM providers

You can adapt the connector configuration for other providers.

Amazon Bedrock example

The following example creates a connector for Amazon Bedrock:

POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector",
  "description": "Connector to Amazon Bedrock",
  "version": "1",
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "model": "anthropic.claude-v2"
  },
  "credential": {
    "access_key": "YOUR_ACCESS_KEY",
    "secret_key": "YOUR_SECRET_KEY"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
      "request_body": "{ \"prompt\": \"${parameters.messages}\", \"max_tokens_to_sample\": 300 }"
    }
  ]
}

Implicit judgments

Implicit judgments are derived from past user interactions. SRW supports the Clicks Over Expected Clicks (COEC) click model, which uses impression and click signals to calculate judgments.

Input data must follow the UBI index schemas. COEC uses every event in the ubi_events index with an action_name of impression or click.

COEC calculates an expected click-through rate (CTR) for each rank by dividing the total number of clicks by the total number of impressions observed at that rank, based on all events in ubi_events. This ratio represents the expected CTR for that position.

For each document displayed in a hit list after a query, the average CTR at that rank serves as the expected value for the query-document pair. COEC calculates the actual CTR for the query-document pair and divides it by this expected rank-based CTR. Consequently, query-document pairs with a higher CTR than the average for that rank have a judgment value greater than 1. Conversely, if the CTR is lower than average, the judgment value is lower than 1.

Depending on the tracking implementation, multiple clicks for a single query can be recorded in the ubi_events index. Consequently, the average CTR can sometimes exceed 1 (or 100%).

For query-document observations that occur at different positions, all impressions and clicks are assumed to have occurred at the lowest (best) position. This aggregation approach biases the final judgment toward lower values, reflecting the common trend that higher-ranked results typically receive higher CTRs.
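The calculation described above can be sketched in Python as follows. This is a simplified illustration, not the plugin's code: the `Event` tuple is a hypothetical, flattened view of a ubi_events record, and skipping pairs whose expected CTR is zero or undefined is an assumption made to avoid division by zero.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    # Hypothetical, flattened view of a ubi_events record.
    query: str
    doc_id: str
    rank: int
    action_name: str  # "impression" or "click"


def coec_judgments(events: List[Event], max_rank: int = 20) -> Dict[Tuple[str, str], float]:
    """Return {(query, doc_id): actual CTR / expected CTR at the pair's best rank}."""
    rank_clicks: Dict[int, int] = defaultdict(int)
    rank_impressions: Dict[int, int] = defaultdict(int)
    pair_clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    pair_impressions: Dict[Tuple[str, str], int] = defaultdict(int)
    pair_best_rank: Dict[Tuple[str, str], int] = {}

    for event in events:
        if event.rank > max_rank or event.action_name not in ("impression", "click"):
            continue
        pair = (event.query, event.doc_id)
        # Attribute all of a pair's events to the lowest (best) rank observed.
        pair_best_rank[pair] = min(event.rank, pair_best_rank.get(pair, event.rank))
        if event.action_name == "click":
            rank_clicks[event.rank] += 1
            pair_clicks[pair] += 1
        else:
            rank_impressions[event.rank] += 1
            pair_impressions[pair] += 1

    judgments: Dict[Tuple[str, str], float] = {}
    for pair, impressions in pair_impressions.items():
        rank = pair_best_rank[pair]
        if rank_impressions[rank] == 0 or rank_clicks[rank] == 0:
            continue  # Assumption: skip pairs without a usable expected CTR.
        expected_ctr = rank_clicks[rank] / rank_impressions[rank]
        actual_ctr = pair_clicks[pair] / impressions
        judgments[pair] = actual_ctr / expected_ctr
    return judgments
```

For example, if rank 1 receives 8 impressions and 2 clicks overall, the expected CTR is 0.25. A pair shown 4 times at rank 1 and clicked twice has an actual CTR of 0.5 and therefore a judgment of 2.0, while a pair shown 4 times and never clicked has a judgment of 0.0.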

Example request

The following example creates an implicit judgment list using the COEC click model:

PUT _plugins/_search_relevance/judgments
{
  "name": "Implicit Judgments",
  "clickModel": "coec",
  "type": "UBI_JUDGMENT",
  "maxRank": 20
}

Request body fields

The following table lists the parameters for creating implicit judgments.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| name | String | The name of the judgment list. |
| clickModel | String | The model used to calculate implicit judgments. Only coec (Clicks Over Expected Clicks) is supported. |
| type | String | Set to UBI_JUDGMENT. |
| maxRank | Integer | The maximum rank to consider when including events in the judgment calculation. |
| startDate | Date | An optional starting date from which behavioral data events are considered for implicit judgment generation. The format is yyyy-MM-dd. |
| endDate | Date | An optional end date until which behavioral data events are considered for implicit judgment generation. The format is yyyy-MM-dd. |
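The optional date fields can be added to the request body when only a specific time window of behavioral data should be used. The following Python sketch builds such a body. The function name is hypothetical, and the check that the start date does not fall after the end date is a client-side assumption rather than documented API behavior.

```python
import json
from datetime import date
from typing import Optional


def ubi_judgment_body(
    name: str,
    max_rank: int,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> str:
    """Build a JSON request body for an implicit (UBI_JUDGMENT) judgment list."""
    if start is not None and end is not None and start > end:
        # Assumption: reject inverted ranges before sending the request.
        raise ValueError("start date must not be after end date")
    body = {
        "name": name,
        "clickModel": "coec",
        "type": "UBI_JUDGMENT",
        "maxRank": max_rank,
    }
    # date.isoformat() produces the yyyy-MM-dd format listed in the table.
    if start is not None:
        body["startDate"] = start.isoformat()
    if end is not None:
        body["endDate"] = end.isoformat()
    return json.dumps(body, indent=2)


print(ubi_judgment_body("Implicit Judgments", 20, date(2025, 6, 1), date(2025, 6, 30)))
```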

Importing judgments

You may already have external processes for generating judgments. Regardless of the judgment type or the way they were generated, you can import them into SRW.

Example request

The following example imports a set of judgments for two queries:

PUT _plugins/_search_relevance/judgments
{
  "name": "Imported Judgments",
  "description": "Judgments generated outside SRW",
  "type": "IMPORT_JUDGMENT",
  "judgmentRatings": [
    {
      "query": "red dress",
      "ratings": [
        {
          "docId": "B077ZJXCTS",
          "rating": "3.000"
        },
        {
          "docId": "B071S6LTJJ",
          "rating": "2.000"
        },
        {
          "docId": "B01IDSPDJI",
          "rating": "2.000"
        },
        {
          "docId": "B07QRCGL3G",
          "rating": "0.000"
        },
        {
          "docId": "B074V6Q1DR",
          "rating": "1.000"
        }
      ]
    },
    {
      "query": "blue jeans",
      "ratings": [
        {
          "docId": "B07L9V4Y98",
          "rating": "0.000"
        },
        {
          "docId": "B01N0DSRJC",
          "rating": "1.000"
        },
        {
          "docId": "B001CRAWCQ",
          "rating": "1.000"
        },
        {
          "docId": "B075DGJZRM",
          "rating": "2.000"
        },
        {
          "docId": "B009ZD297U",
          "rating": "2.000"
        }
      ]
    }
  ]
}

Request body fields

The following table lists the parameters for importing judgments.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| name | String | The name of the judgment list. |
| description | String | An optional description of the judgment list. |
| type | String | Set to IMPORT_JUDGMENT. |
| judgmentRatings | Array | A list of JSON objects containing the judgments, grouped by query. Each object contains a query string and a ratings array. Each element of the ratings array pairs a document ID (docId) with its rating, a floating-point value that the example request passes as a string. |
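External judgments are often stored as flat rows rather than grouped by query. The following Python sketch converts CSV rows into the judgmentRatings structure. It assumes a hypothetical CSV layout with query, docId, and rating columns, and it formats ratings as three-decimal strings only to mirror the example request; whether other formats are accepted is not covered here.

```python
import csv
import io
from typing import Dict, List


def build_import_body(csv_text: str, name: str) -> Dict[str, object]:
    """Group flat (query, docId, rating) rows into an IMPORT_JUDGMENT request body."""
    grouped: Dict[str, List[Dict[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped.setdefault(row["query"], []).append(
            {
                "docId": row["docId"],
                # Mirror the "3.000" string style used in the example request.
                "rating": f"{float(row['rating']):.3f}",
            }
        )
    return {
        "name": name,
        "type": "IMPORT_JUDGMENT",
        # Dictionaries preserve insertion order, so queries keep their CSV order.
        "judgmentRatings": [
            {"query": query, "ratings": ratings} for query, ratings in grouped.items()
        ],
    }


sample = "query,docId,rating\nred dress,B077ZJXCTS,3\nblue jeans,B07L9V4Y98,0\n"
print(build_import_body(sample, "Imported Judgments"))
```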

Managing judgment lists

You can retrieve or delete judgment lists using the following APIs.

Viewing a judgment list

Retrieve a judgment list by its ID.

Endpoint

GET _plugins/_search_relevance/judgments/{judgment_list_id}

Path parameters

The following table lists the available path parameters.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| judgment_list_id | String | The ID of the judgment list to retrieve. |

Example request

GET _plugins/_search_relevance/judgments/b54f791a-3b02-49cb-a06c-46ab650b2ade

Example response

{
  "took": 36,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "search-relevance-judgment",
        "_id": "b54f791a-3b02-49cb-a06c-46ab650b2ade",
        "_score": 1,
        "_source": {
          "id": "b54f791a-3b02-49cb-a06c-46ab650b2ade",
          "timestamp": "2025-06-11T06:07:23.766Z",
          "name": "Imported Judgments",
          "status": "COMPLETED",
          "type": "IMPORT_JUDGMENT",
          "metadata": {},
          "judgmentRatings": [
            {
              "query": "red dress",
              "ratings": [
                {
                  "rating": "3.000",
                  "docId": "B077ZJXCTS"
                },
                {
                  "rating": "2.000",
                  "docId": "B071S6LTJJ"
                },
                {
                  "rating": "2.000",
                  "docId": "B01IDSPDJI"
                },
                {
                  "rating": "0.000",
                  "docId": "B07QRCGL3G"
                },
                {
                  "rating": "1.000",
                  "docId": "B074V6Q1DR"
                }
              ]
            },
            {
              "query": "blue jeans",
              "ratings": [
                {
                  "rating": "0.000",
                  "docId": "B07L9V4Y98"
                },
                {
                  "rating": "1.000",
                  "docId": "B01N0DSRJC"
                },
                {
                  "rating": "1.000",
                  "docId": "B001CRAWCQ"
                },
                {
                  "rating": "2.000",
                  "docId": "B075DGJZRM"
                },
                {
                  "rating": "2.000",
                  "docId": "B009ZD297U"
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
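Because the judgment list is returned inside a search-style response, client code usually needs to unwrap it before use. The following Python sketch flattens a parsed response of the shape shown above into (query, docId, rating) tuples. The function name is hypothetical, and the sketch assumes that ratings can be parsed as floating-point numbers.

```python
from typing import Dict, Iterator, Tuple


def iter_ratings(response: Dict) -> Iterator[Tuple[str, str, float]]:
    """Yield (query, doc_id, rating) for every rating in a judgments response."""
    for hit in response.get("hits", {}).get("hits", []):
        for entry in hit.get("_source", {}).get("judgmentRatings", []):
            # Entries without a "ratings" array simply yield nothing.
            for rating in entry.get("ratings", []):
                yield entry["query"], rating["docId"], float(rating["rating"])


example_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "judgmentRatings": [
                        {
                            "query": "red dress",
                            "ratings": [{"rating": "3.000", "docId": "B077ZJXCTS"}],
                        }
                    ]
                }
            }
        ]
    }
}
print(list(iter_ratings(example_response)))
```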

Deleting a judgment list

Delete a judgment list by its ID.

Endpoint

DELETE _plugins/_search_relevance/judgments/{judgment_list_id}

Example request

DELETE _plugins/_search_relevance/judgments/b54f791a-3b02-49cb-a06c-46ab650b2ade

Example response

{
  "_index": "search-relevance-judgment",
  "_id": "b54f791a-3b02-49cb-a06c-46ab650b2ade",
  "_version": 3,
  "result": "deleted",
  "forced_refresh": true,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 156,
  "_primary_term": 1
}

Searching for a judgment list

Search for judgment lists using query domain-specific language (DSL). The response excludes judgmentRatings.ratings by default; to include it, request it explicitly using the _source field in the search request body.

Endpoints

GET _plugins/_search_relevance/judgments/_search
POST _plugins/_search_relevance/judgments/_search

Example request

The following example searches for judgment lists that include the exact query red dress:

GET _plugins/_search_relevance/judgments/_search
{
  "query": {
    "nested": {
      "path": "judgmentRatings",
      "query": {
        "match_phrase": {
          "judgmentRatings.query": "red dress"
        }
      }
    }
  }
}

Example response

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 4.5558767,
    "hits": [
      {
        "_index": "search-relevance-judgment",
        "_id": "505d00cf-2fce-422b-bb97-2e3a95ce9446",
        "_score": 4.5558767,
        "_source": {
          "metadata": {},
          "name": "Imported Judgments",
          "judgmentRatings": [
            {
              "query": "red dress"
            },
            {
              "query": "blue jeans"
            }
          ],
          "id": "505d00cf-2fce-422b-bb97-2e3a95ce9446",
          "type": "IMPORT_JUDGMENT",
          "timestamp": "2026-01-28T18:16:44.218Z",
          "status": "COMPLETED"
        }
      }
    ]
  }
}