Tips and best practices

The following tips help you efficiently generate synthetic data and monitor performance during the process.

Visualizing generation

The generated URL opens a Dask dashboard that visualizes the data generation process. In the dashboard, you can monitor CPU and memory usage for each worker and view a CPU flamegraph of the generation workflow. Use this information to track resource usage and tune performance, especially when generating data with a custom Python module.

Use default settings

We recommend starting with the default synthetic data generation settings. If you need to change them, the following guidelines help keep generation efficient and reliable:

Set the number of workers to no more than the CPU count on the load generation host.
Use a chunk size of 10,000 documents per chunk.
Adjust the max_file_size_gb setting as needed to control how much data is written to each generated file.
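
The guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not part of the tool itself: the helper names `recommended_settings` and `estimate_file_count`, the dictionary keys `workers` and `chunk_size`, and the example file size of 1 GB are assumptions made for this sketch. Only `max_file_size_gb` appears in the guidelines above.

```python
import math
import os

# Documents per chunk, taken from the guideline above.
DEFAULT_CHUNK_SIZE = 10_000


def recommended_settings(requested_workers=None, max_file_size_gb=1.0):
    """Build a settings dictionary that follows the guidelines above.

    The keys "workers" and "chunk_size" and the 1.0 GB example value are
    assumptions for illustration, not confirmed names or defaults.
    """
    # os.cpu_count() can return None, so fall back to a single worker.
    cpu_count = os.cpu_count() or 1
    if requested_workers is None:
        workers = cpu_count
    else:
        # Never exceed the CPU count of the host, and never go below 1.
        workers = max(1, min(requested_workers, cpu_count))
    return {
        "workers": workers,
        "chunk_size": DEFAULT_CHUNK_SIZE,
        "max_file_size_gb": max_file_size_gb,
    }


def estimate_file_count(total_size_gb, max_file_size_gb):
    """Estimate how many files a run produces for a given total size."""
    return math.ceil(total_size_gb / max_file_size_gb)


if __name__ == "__main__":
    print(recommended_settings(requested_workers=64))
    print(estimate_file_count(total_size_gb=10, max_file_size_gb=4))
```

Capping the worker count at `os.cpu_count()` reflects the first guideline. The file count estimate assumes each file is filled up to the configured size limit, so treat it as a rough planning figure.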
