corpora
The corpora element contains all the document corpora used by the workload. You can use document corpora across workloads by copying and pasting any corpora definitions.
Example
The following example defines a single corpus called movies with 11658903 documents and 1544799789 uncompressed bytes (both values can be fetched from the command line):
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "source-file": "movies-documents.json",
          "document-count": 11658903,
          "uncompressed-bytes": 1544799789
        }
      ]
    }
  ]
Configuration options
Use the following options with corpora.
| Parameter | Required | Type | Description | 
|---|---|---|---|
| name | Yes | String | The name of the document corpus. Because OpenSearch Benchmark uses this name in its directories, use only lowercase characters and no white space. | 
| documents | Yes | JSON array | An array of document files. | 
| meta | No | JSON object | A mapping of key-value pairs with additional metadata for a corpus. | 
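As an illustration of these options, the following sketch defines a corpus that spans two document files and carries corpus-level metadata. The file names, document counts, and the keys inside meta are hypothetical, and the example assumes that meta accepts a JSON object of key-value pairs, as its description suggests.

```json
{
  "corpora": [
    {
      "name": "movies",
      "meta": {
        "source": "example-dataset",
        "license": "example-license"
      },
      "documents": [
        {
          "source-file": "movies-documents-1.json",
          "document-count": 1000
        },
        {
          "source-file": "movies-documents-2.json",
          "document-count": 2000
        }
      ]
    }
  ]
}
```

The outer braces are there only to make the fragment valid JSON on its own; in a workload file, corpora sits alongside the other top-level elements.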
Each entry in the documents array consists of the following options.
| Parameter | Required | Type | Description | 
|---|---|---|---|
| source-file | Yes | String | The name of the file containing the documents for the workload. When using OpenSearch Benchmark locally, documents are contained in a JSON file. When providing a base-url, use a compressed file format: .zip, .bz2, .gz, .tar, .tar.gz, .tgz, or .tar.bz2. The compressed file must contain exactly one JSON file with the same name. | 
| document-count | Yes | Integer | The number of documents in the source-file. OpenSearch Benchmark uses this number to determine which client indexes which part of the document corpus: each of the N clients receives one Nth of the corpus. When the source file contains documents with a parent/child relationship, specify the number of parent documents. | 
| base-url | No | String | An http(s), Amazon Simple Storage Service (Amazon S3), or Google Cloud Storage URL that points to the root path where OpenSearch Benchmark can obtain the corresponding source file. | 
| source-format | No | String | Defines the format OpenSearch Benchmark uses to interpret the data file specified in source-file. Only bulk is supported. | 
| compressed-bytes | No | Integer | The size, in bytes, of the compressed source file, indicating how much data OpenSearch Benchmark downloads. | 
| uncompressed-bytes | No | Integer | The size, in bytes, of the source file after decompression, indicating how much disk space the decompressed source file needs. | 
| target-index | No | String | Defines the name of the index that the bulk operation should target. OpenSearch Benchmark automatically derives this value when only one index is defined in the indices element. The value of target-index is ignored when the includes-action-and-meta-data setting is true. | 
| target-type | No | String | Defines the document type of the index targeted by bulk operations. OpenSearch Benchmark automatically derives this value when only one index is defined in the indices element and the index has only one type. The value of target-type is ignored when the includes-action-and-meta-data setting is true. | 
| includes-action-and-meta-data | No | Boolean | When set to true, indicates that the document file already contains an action line and a meta-data line. When false, indicates that the document file contains only documents. Default is false. | 
| meta | No | JSON object | A mapping of key-value pairs with additional metadata for a corpus. |
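To show how the download-related options fit together, here is a minimal sketch of a corpus whose source file is fetched from a remote location. The base-url, the compressed-bytes value, and the index name are placeholders chosen for illustration and do not point to a real dataset. The document count and uncompressed size reuse the figures from the movies example earlier on this page.

```json
{
  "corpora": [
    {
      "name": "movies",
      "documents": [
        {
          "base-url": "https://example.com/corpora/movies",
          "source-file": "movies-documents.json.bz2",
          "source-format": "bulk",
          "document-count": 11658903,
          "compressed-bytes": 123456789,
          "uncompressed-bytes": 1544799789,
          "target-index": "movies",
          "includes-action-and-meta-data": false
        }
      ]
    }
  ]
}
```

Because includes-action-and-meta-data is false here, the source file is expected to contain only documents, and target-index names the index that receives them. If the setting were true, target-index would be ignored, as described in the table.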