Skip to content

Praefect: handling of stale replication jobs

Pavlo Strokov requested to merge ps-process-stale-replication into master

New background job to move stale replication events to the next state: 'failed' or 'dead'. The event considered stale if the processing of the job takes too long without any health-pings. Health-ping is defined by the 'triggered_at' column of the 'replication_queue_job_lock' that exists only if replication event is 'in_progress' state. The check executed periodically and starts with start of the service. The check covers all storages at once. Check stops when the passed in context is cancelled/expired.

Closes: #2873 (closed) Follow up of the: !2321 (merged)

Edited by Pavlo Strokov

Merge request reports

Loading