Replay captured traffic

The following information applies only to zero-downtime migrations that use Capture and Replay.

Replay is the step that closes the time gap between the snapshot used for backfill and the current state of the source system.

Replay start conditions

Replay should begin only after the dependent snapshot migration has completed. In the workflow model, this is normally expressed with:

dependsOnSnapshotMigrations

That dependency is important. Starting replay too early can cause ordering problems between historical backfill and live change processing.

Replay process

The Traffic Replayer performs the following steps:

Reads captured traffic from Kafka.
Reconstructs requests.
Applies replay-time transformations and authentication.
Sends the requests to the target cluster.
Advances Kafka offsets so replay can resume safely.

Key replay settings

The following table describes the key replay workflow settings.

Setting	Description	Default
`podReplicas`	The number of Replayer pods running in parallel.	N/A
`speedupFactor`	Controls how quickly replay catches up relative to the original traffic timeline. A value of `2.0` replays at twice the original rate.	`1.1`
`removeAuthHeader`	Strips the captured `Authorization` header before replaying. Use when the captured traffic carries credentials that are not valid against the target.	N/A
`authHeaderOverride`	Replaces the captured `Authorization` header with a static value.	N/A
`dependsOnSnapshotMigrations`	Specifies which snapshot migration must complete before replay begins.	N/A
`nonRetryableDocExceptionTypes`	A list of exception class names that are counted as failures but not retried because they are expected to fail deterministically. This setting differs from the RFS `allowedDocExceptionTypes`, which treats matching exceptions as success.	N/A

Do not set both replayerConfig.removeAuthHeader: true and an authConfig block on the same target. The schema rejects this combination. Either rely on the target’s authConfig (the Replayer applies it automatically) or strip the captured header.

Factors affecting replay duration

Replay duration depends on the following factors:

The volume of traffic captured during backfill.
The target cluster’s processing throughput.
The speedupFactor value in your configuration.

Monitor replay

Use the interactive workflow view whenever possible:

workflow manage

The following commands provide additional monitoring information:

workflow status
workflow log all --follow

Pre-cutover validation

Do not switch traffic only because the Replayer is running. Before proceeding, confirm the following:

Replay has reached the live edge.
Document counts are directionally correct.
Representative queries behave correctly on the target.
Target-side errors are understood and acceptable.

The following commands help validate the target state:

console clusters cat-indices
console clusters curl target /my-index/_search --json '{"query":{"match_all":{}},"size":5}'

After replay reaches the live edge and validation passes, proceed to switch traffic. Keep the source available until the rollback window has passed.

Previous phase

The previous phase depends on your migration scenario:

Scenario 2: Live capture only: Migrate metadata
Scenario 3: Live capture with backfill: Backfill

Next phase

The next phase in the migration process is:

Switch traffic to the target

For a complete overview of all migration phases, see Migration phases.

Replay start conditions
Replay process
Key replay settings
Factors affecting replay duration
Monitor replay
Pre-cutover validation
Previous phase
Next phase

WAS THIS PAGE HELPFUL?

✔ Yes ✖ No

Tell us why

350 characters left

Have a question? Ask us on the OpenSearch forum.

Want to contribute? Edit this page or create an issue.

Replay captured traffic

Replay start conditions

Replay process

Key replay settings

Factors affecting replay duration

Monitor replay

Pre-cutover validation

Previous phase

Next phase

OpenSearch Links

Get Involved

Resources

Contact Us