Synthetic data generation
Introduced 2.0
OpenSearch Benchmark provides a built-in synthetic data generator that can create datasets of any size for a wide range of use cases. It currently supports two generation methods:
- Random data generation produces fields with randomized values. This is useful for stress testing and evaluating system performance under load.
- Rule-based data generation creates data according to user-defined rules. This is helpful for testing specific scenarios, benchmarking query behavior, or simulating domain-specific patterns.
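The difference between the two methods can be illustrated with a minimal, standalone Python sketch. It does not use the OpenSearch Benchmark API or its configuration format. The field names (`user_id`, `status`, `latency_ms`), the value ranges, and the rule that failed requests are slower are all hypothetical and chosen only for illustration.

```python
import random
import string


def random_document(rng):
    """Random generation: each field gets an arbitrary value of the right type."""
    return {
        "user_id": rng.randint(1, 1_000_000),
        "status": "".join(rng.choices(string.ascii_lowercase, k=8)),
        "latency_ms": rng.uniform(0.0, 10_000.0),
    }


def rule_based_document(rng):
    """Rule-based generation: values follow user-defined constraints."""
    # Hypothetical rule 1: about 5% of requests fail.
    status = rng.choices(["ok", "error"], weights=[95, 5], k=1)[0]
    # Hypothetical rule 2: failed requests are slower than successful ones.
    if status == "error":
        latency_ms = rng.uniform(500.0, 5_000.0)
    else:
        latency_ms = rng.uniform(5.0, 200.0)
    return {
        "user_id": rng.randint(1, 1_000),
        "status": status,
        "latency_ms": latency_ms,
    }


# A seeded generator keeps the sketch reproducible between runs.
rng = random.Random(42)
random_docs = [random_document(rng) for _ in range(1000)]
rule_docs = [rule_based_document(rng) for _ in range(1000)]
```

In this sketch, the random documents are suitable only for generating load, because a value such as `status` has no meaning to a query. The rule-based documents preserve relationships between fields, so a query that filters on `status` or aggregates on `latency_ms` returns results that resemble those from a real dataset.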
Data generation methods
OpenSearch Benchmark currently supports two data generation methods: random data generation and rule-based data generation.
For advanced synthetic data generation capabilities, explore vector generation.