
Collapsing hybrid query results

Introduced 3.1

The collapse parameter groups results by a field and returns only the highest-scoring document for each unique field value. This is useful for avoiding duplicates in your search results. The collapse field must be a keyword field or a numeric field. The number of results returned is still limited by the size parameter in your query.

The collapse parameter is compatible with other hybrid query search options, such as sort, explain, and pagination, using their standard syntax.

When using collapse in a hybrid query, note the following considerations:

  • Inner hits are not supported.
  • Performance may be impacted when working with large result sets.
  • Aggregations run on pre-collapsed results, not the final output.
  • Pagination behavior changes: Because collapse reduces the total number of results, it can affect how results are distributed across pages. To retrieve more results, consider increasing the pagination depth.
  • Results may differ from those returned by the collapse response processor, which applies collapse logic after the query is executed.
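
As a rough mental model of the behavior described above, the following Python sketch groups hits by a field, keeps the highest-scoring hit in each group, and then applies a size limit. It is a conceptual illustration only, not how OpenSearch implements collapse. The collapse_hits helper and the scores in the sample data are assumptions made for this example:

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str, size: int = 10) -> list[dict[str, Any]]:
    """Keep the highest-scoring hit for each unique value of `field`, then cap the list at `size`."""
    best: dict[Any, dict[str, Any]] = {}
    for hit in hits:
        key = hit["_source"][field]
        # Strict comparison: on a tie, the hit seen first is kept.
        if key not in best or hit["_score"] > best[key]["_score"]:
            best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:size]


# Illustrative scores only (hypothetical); real scores depend on the search pipeline.
hits = [
    {"_score": 1.0, "_source": {"item": "Chocolate Cake", "price": 15}},
    {"_score": 1.0, "_source": {"item": "Chocolate Cake", "price": 18}},
    {"_score": 0.5, "_source": {"item": "Vanilla Cake", "price": 12}},
    {"_score": 0.5, "_source": {"item": "Vanilla Cake", "price": 16}},
    {"_score": 0.5, "_source": {"item": "Vanilla Cake", "price": 17}},
]

collapsed = collapse_hits(hits, field="item", size=10)
for hit in collapsed:
    print(hit["_source"]["item"], hit["_score"])
```

Five input hits collapse to two groups, one per unique item value. Lowering size below the number of groups truncates the collapsed list further.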

Example

The following example demonstrates how to collapse hybrid query results.

Create an index:

PUT /bakery-items
{
  "mappings": {
    "properties": {
      "item": {
        "type": "keyword"
      },
      "category": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      },
      "baked_date": {
        "type": "date"
      }
    }
  }
}

Ingest documents into the index:

POST /bakery-items/_bulk
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 15, "baked_date": "2023-07-01T00:00:00Z" }
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 18, "baked_date": "2023-07-04T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 12, "baked_date": "2023-07-02T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 16, "baked_date": "2023-07-03T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 17, "baked_date": "2023-07-09T00:00:00Z" }

Create a search pipeline. This example uses the min_max normalization technique:

PUT /_search/pipeline/norm-pipeline
{
  "description": "Normalization processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean"
        }
      }
    }
  ]
}

Search the index, grouping the search results by the item field:

GET /bakery-items/_search?search_pipeline=norm-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "item": "Chocolate Cake"
          }
        },
        {
          "bool": {
            "must": {
              "match": {
                "category": "cakes"
              }
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "item"
  }
}

The response returns the collapsed search results:

"hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "bakery-items",
        "_id": "wBRPZZcB49c_2-1rYmO7",
        "_score": 1.0,
        "_source": {
          "item": "Chocolate Cake",
          "category": "cakes",
          "price": 15,
          "baked_date": "2023-07-01T00:00:00Z"
        },
        "fields": {
          "item": [
            "Chocolate Cake"
          ]
        }
      },
      {
        "_index": "bakery-items",
        "_id": "whRPZZcB49c_2-1rYmO7",
        "_score": 0.5005,
        "_source": {
          "item": "Vanilla Cake",
          "category": "cakes",
          "price": 12,
          "baked_date": "2023-07-02T00:00:00Z"
        },
        "fields": {
          "item": [
            "Vanilla Cake"
          ]
        }
      }
    ]
  }
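
If you build search requests in application code, the request body shown in this example can be assembled programmatically. The following Python sketch is one possible way to do that. The build_collapsed_hybrid_search helper is hypothetical and not part of any OpenSearch client. It only produces the JSON body, so the search_pipeline query parameter must still be supplied when the request is sent:

```python
import json
from typing import Any, Optional


def build_collapsed_hybrid_search(
    queries: list[dict[str, Any]],
    collapse_field: str,
    size: Optional[int] = None,
    from_: Optional[int] = None,
    sort: Optional[Any] = None,
    pagination_depth: Optional[int] = None,
) -> dict[str, Any]:
    """Build a hybrid search body with a collapse clause (hypothetical helper)."""
    hybrid: dict[str, Any] = {"queries": list(queries)}
    if pagination_depth is not None:
        # pagination_depth belongs inside the hybrid clause, not at the top level.
        hybrid["pagination_depth"] = pagination_depth

    body: dict[str, Any] = {
        "query": {"hybrid": hybrid},
        "collapse": {"field": collapse_field},
    }
    # Optional top-level parameters are added only when provided.
    if from_ is not None:
        body["from"] = from_
    if size is not None:
        body["size"] = size
    if sort is not None:
        body["sort"] = sort
    return body


body = build_collapsed_hybrid_search(
    queries=[
        {"match": {"item": "Chocolate Cake"}},
        {"bool": {"must": {"match": {"category": "cakes"}}}},
    ],
    collapse_field="item",
)
print(json.dumps(body, indent=2))
```

Passing sort="price" or from_ and size to the same helper yields bodies equivalent to the sorting and pagination requests on this page.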

Collapse and sort results

To collapse and sort hybrid query results, provide the collapse and sort parameters in the query:

GET /bakery-items/_search?search_pipeline=norm-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "item": "Chocolate Cake"
          }
        },
        {
          "bool": {
            "must": {
              "match": {
                "category": "cakes"
              }
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "item"
  },
  "sort": "price"
}

For more information about sorting in a hybrid query, see Using sorting with a hybrid query.

In the response, each item is represented by its lowest-priced document, and the results are sorted by price in ascending order:

"hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "bakery-items",
        "_id": "whRPZZcB49c_2-1rYmO7",
        "_score": null,
        "_source": {
          "item": "Vanilla Cake",
          "category": "cakes",
          "price": 12,
          "baked_date": "2023-07-02T00:00:00Z"
        },
        "fields": {
          "item": [
            "Vanilla Cake"
          ]
        },
        "sort": [
          12.0
        ]
      },
      {
        "_index": "bakery-items",
        "_id": "wBRPZZcB49c_2-1rYmO7",
        "_score": null,
        "_source": {
          "item": "Chocolate Cake",
          "category": "cakes",
          "price": 15,
          "baked_date": "2023-07-01T00:00:00Z"
        },
        "fields": {
          "item": [
            "Chocolate Cake"
          ]
        },
        "sort": [
          15.0
        ]
      }
    ]
  }
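
The ordering in this response can be reproduced with a small Python sketch that sorts the sample documents by price and then keeps the first document seen for each item. This is a conceptual model that is consistent with the response shown above, not a description of OpenSearch internals. The collapse_sorted helper is hypothetical:

```python
from typing import Any


def collapse_sorted(docs: list[dict[str, Any]], field: str, sort_field: str) -> list[dict[str, Any]]:
    """Sort by `sort_field` (ascending), then keep the first document per `field` value."""
    seen: set[Any] = set()
    result: list[dict[str, Any]] = []
    for doc in sorted(docs, key=lambda d: d[sort_field]):
        if doc[field] not in seen:
            seen.add(doc[field])
            result.append(doc)
    return result


# The five documents ingested into the bakery-items index.
docs = [
    {"item": "Chocolate Cake", "price": 15},
    {"item": "Chocolate Cake", "price": 18},
    {"item": "Vanilla Cake", "price": 12},
    {"item": "Vanilla Cake", "price": 16},
    {"item": "Vanilla Cake", "price": 17},
]

sorted_collapsed = collapse_sorted(docs, field="item", sort_field="price")
for doc in sorted_collapsed:
    print(doc["item"], doc["price"])
```

The sketch yields Vanilla Cake at 12 followed by Chocolate Cake at 15, which matches the order of the hits in the response.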

Collapse and explain

You can provide the explain query parameter when collapsing search results:

GET /bakery-items/_search?search_pipeline=norm-pipeline&explain=true
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "item": "Chocolate Cake"
          }
        },
        {
          "bool": {
            "must": {
              "match": {
                "category": "cakes"
              }
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "item"
  }
}

The response contains detailed information about the scoring process for each search result:

"hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_shard": "[bakery-items][0]",
                "_node": "Jlu8P9EaQCy3C1BxaFMa_g",
                "_index": "bakery-items",
                "_id": "3ZILepcBheX09_dPt8TD",
                "_score": 1.0,
                "_source": {
                    "item": "Chocolate Cake",
                    "category": "cakes",
                    "price": 15,
                    "baked_date": "2023-07-01T00:00:00Z"
                },
                "fields": {
                    "item": [
                        "Chocolate Cake"
                    ]
                },
                "_explanation": {
                    "value": 1.0,
                    "description": "combined score of:",
                    "details": [
                        {
                            "value": 1.0,
                            "description": "ConstantScore(item:Chocolate Cake)",
                            "details": []
                        },
                        {
                            "value": 1.0,
                            "description": "ConstantScore(category:cakes)",
                            "details": []
                        }
                    ]
                }
            },
            {
                "_shard": "[bakery-items][0]",
                "_node": "Jlu8P9EaQCy3C1BxaFMa_g",
                "_index": "bakery-items",
                "_id": "35ILepcBheX09_dPt8TD",
                "_score": 0.5005,
                "_source": {
                    "item": "Vanilla Cake",
                    "category": "cakes",
                    "price": 12,
                    "baked_date": "2023-07-02T00:00:00Z"
                },
                "fields": {
                    "item": [
                        "Vanilla Cake"
                    ]
                },
                "_explanation": {
                    "value": 1.0,
                    "description": "combined score of:",
                    "details": [
                        {
                            "value": 0.0,
                            "description": "ConstantScore(item:Chocolate Cake) doesn't match id 2",
                            "details": []
                        },
                        {
                            "value": 1.0,
                            "description": "ConstantScore(category:cakes)",
                            "details": []
                        }
                    ]
                }
            }
        ]
    }

For more information about using explain in a hybrid query, see Hybrid search explain.

Collapse and pagination

You can paginate collapsed results by providing the from and size parameters. For more information about pagination in a hybrid query, see Paginating hybrid query results. For more information about from and size, see The from and size parameters.

For this example, create the following index:

PUT /bakery-items-pagination
{
  "settings": {
    "index.number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "item": {
        "type": "keyword"
      },
      "category": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      },
      "baked_date": {
        "type": "date"
      }
    }
  }
}

Ingest the following documents into the index:

POST /bakery-items-pagination/_bulk
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 15, "baked_date": "2023-07-01T00:00:00Z" }
{ "index": {} }
{ "item": "Chocolate Cake", "category": "cakes", "price": 18, "baked_date": "2023-07-02T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 12, "baked_date": "2023-07-02T00:00:00Z" }
{ "index": {} }
{ "item": "Vanilla Cake", "category": "cakes", "price": 11, "baked_date": "2023-07-04T00:00:00Z" }
{ "index": {} }
{ "item": "Ice Cream Cake", "category": "cakes", "price": 23, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Ice Cream Cake", "category": "cakes", "price": 22, "baked_date": "2023-07-10T00:00:00Z" }
{ "index": {} }
{ "item": "Carrot Cake", "category": "cakes", "price": 24, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Carrot Cake", "category": "cakes", "price": 26, "baked_date": "2023-07-21T00:00:00Z" }
{ "index": {} }
{ "item": "Red Velvet Cake", "category": "cakes", "price": 25, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Red Velvet Cake", "category": "cakes", "price": 29, "baked_date": "2023-07-30T00:00:00Z" }
{ "index": {} }
{ "item": "Cheesecake", "category": "cakes", "price": 27, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Cheesecake", "category": "cakes", "price": 34, "baked_date": "2023-07-21T00:00:00Z" }
{ "index": {} }
{ "item": "Coffee Cake", "category": "cakes", "price": 42, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Coffee Cake", "category": "cakes", "price": 41, "baked_date": "2023-07-05T00:00:00Z" }
{ "index": {} }
{ "item": "Coconut Cake", "category": "cakes", "price": 23, "baked_date": "2023-07-09T00:00:00Z" }
{ "index": {} }
{ "item": "Coconut Cake", "category": "cakes", "price": 32, "baked_date": "2023-07-12T00:00:00Z" }
// Additional documents omitted for brevity

Run a hybrid query, specifying the from and size parameters to paginate the results. In the following example, the query requests two results starting from the sixth position (from: 5, size: 2). The pagination_depth parameter is set to 10, which limits each shard to returning a maximum of 10 documents. After the results are retrieved, the collapse parameter is applied to group them by the item field:

GET /bakery-items-pagination/_search?search_pipeline=norm-pipeline
{
  "query": {
    "hybrid": {
      "pagination_depth": 10,
      "queries": [
        {
          "match": {
            "item": "Chocolate Cake"
          }
        },
        {
          "bool": {
            "must": {
              "match": {
                "category": "cakes"
              }
            }
          }
        }
      ]
    }
  },
  "from": 5,
  "size": 2,
  "collapse": {
    "field": "item"
  }
}

The response contains the requested page of collapsed results:

"hits": {
        "total": {
            "value": 70,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "bakery-items-pagination",
                "_id": "gDayepcBIkxlgFKYda0p",
                "_score": 0.5005,
                "_source": {
                    "item": "Red Velvet Cake",
                    "category": "cakes",
                    "price": 29,
                    "baked_date": "2023-07-30T00:00:00Z"
                },
                "fields": {
                    "item": [
                        "Red Velvet Cake"
                    ]
                }
            },
            {
                "_index": "bakery-items-pagination",
                "_id": "aTayepcBIkxlgFKYca15",
                "_score": 0.5005,
                "_source": {
                    "item": "Vanilla Cake",
                    "category": "cakes",
                    "price": 12,
                    "baked_date": "2023-07-02T00:00:00Z"
                },
                "fields": {
                    "item": [
                        "Vanilla Cake"
                    ]
                }
            }
        ]
    }
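
To see why collapsing changes what a given page contains, the following Python sketch compares the same from and size window with and without collapsing. It is a simplified, single-node model that assumes paging is applied to the collapsed list. The paginate helper and all scores are invented for illustration, so the output is not meant to reproduce the response above:

```python
from typing import Any


def paginate(hits: list[dict[str, Any]], from_: int, size: int, collapse_field: str = "") -> list[dict[str, Any]]:
    """Rank hits by score, optionally collapse on a field, then return the requested page."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    if collapse_field:
        seen: set[Any] = set()
        collapsed: list[dict[str, Any]] = []
        for hit in ranked:
            # Keep only the best-ranked hit for each field value.
            if hit[collapse_field] not in seen:
                seen.add(hit[collapse_field])
                collapsed.append(hit)
        ranked = collapsed
    return ranked[from_:from_ + size]


# Hypothetical scores: two documents per item, the second scoring slightly lower.
base_scores = {
    "Chocolate Cake": 1.0, "Vanilla Cake": 0.9, "Red Velvet Cake": 0.8,
    "Cheesecake": 0.7, "Coffee Cake": 0.6, "Ice Cream Cake": 0.5, "Carrot Cake": 0.4,
}
hits = [
    {"item": item, "score": score - delta}
    for item, score in base_scores.items()
    for delta in (0.0, 0.05)
]

page_collapsed = paginate(hits, from_=5, size=2, collapse_field="item")
page_plain = paginate(hits, from_=5, size=2)
print([h["item"] for h in page_collapsed])
print([h["item"] for h in page_plain])
```

With 14 hits but only 7 unique items, the collapsed list is half as long, so the same window lands on different documents and runs out of results sooner. This is the reason a larger pagination depth may be needed to fill later pages.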