Geo: Treat missing files as sync failures
What does this MR do and why?
From #348295 (comment 775321315):
The missing-on-primary failures are revealed via a confusing loop: Sync "succeeds", verification fails, so sync state changes to failed, repeat.
I think the improvement should be: When a sync attempt discovers the file is missing, it should mark it as "failed". This may be as easy as removing the second condition from https://gitlab.com/gitlab-org/gitlab/-/blob/073b67bc920a7ae6ff3a3b0eab7f7c7c92b02e26/ee/app/services/geo/blob_download_service.rb#L34
This is a small but significant design change to how files/blobs replicated by the Geo Self-Service Framework are marked after a sync attempt.
Before
When a file is missing on the primary, and a secondary attempts to sync it, the secondary considers it "synced", since its state matches the primary.
After
With this change, the secondary considers the sync "failed", since it was unable to sync the file and this is an undesirable state.
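The before/after difference can be sketched as a single decision on the download result. This is a minimal illustration, not the actual code in blob_download_service.rb: `DownloadResult`, `sync_state_for`, and the `treat_missing_as_failed` keyword are hypothetical names, with the keyword standing in for the feature flag state.

```ruby
# Hypothetical stand-in for the outcome of one download attempt.
DownloadResult = Struct.new(:success, :primary_missing_file, keyword_init: true)

# Returns the sync state (:synced or :failed) for one sync attempt.
# `treat_missing_as_failed` is an assumed parameter representing the flag.
def sync_state_for(result, treat_missing_as_failed:)
  return :synced if result.success

  # Old behavior: a file missing on the primary still counts as synced,
  # because the secondary's state matches the primary's.
  return :synced if result.primary_missing_file && !treat_missing_as_failed

  # New behavior: nothing was downloaded, so the attempt is a failure.
  :failed
end

missing = DownloadResult.new(success: false, primary_missing_file: true)
before = sync_state_for(missing, treat_missing_as_failed: false)
after  = sync_state_for(missing, treat_missing_as_failed: true)
```

Here `before` is `:synced` and `after` is `:failed`. A successful download is `:synced` in both cases.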
Other implications
For blob types that have Geo verification enabled, this change short-circuits a loop in which sync succeeds, verification fails, the sync state changes to failed, and the sync is retried.
This loop affects blobs replicated and verified by the Geo Self-Service Framework, including:
- Package Files
- Terraform State Versions
- Pipeline Artifacts
And soon to include:
- LFS Objects
- Pages Deployments
- Uploads
- CI Job Artifacts
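For a verified blob type, the loop described above can be illustrated with a toy simulation of the events recorded for a file that is missing on the primary. This is only a sketch under stated assumptions: `simulate_missing_file` and the event symbols are hypothetical and do not correspond to real Geo classes or states.

```ruby
# Lists the events seen over `attempts` sync attempts for a file that is
# missing on the primary. All names here are hypothetical.
def simulate_missing_file(attempts:, treat_missing_as_failed:)
  history = []
  attempts.times do
    if treat_missing_as_failed
      # New behavior: the sync attempt itself is marked failed.
      history << :sync_failed
    else
      # Old behavior: sync "succeeds", verification then fails,
      # and the sync state is flipped to failed before the retry.
      history << :sync_succeeded
      history << :verification_failed
      history << :sync_failed
    end
  end
  history
end

old_events = simulate_missing_file(attempts: 2, treat_missing_as_failed: false)
new_events = simulate_missing_file(attempts: 2, treat_missing_as_failed: true)
```

With the old behavior, each attempt produces three events and the failure only surfaces through verification. With the new behavior, each attempt produces a single, direct sync failure.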
This change is behind the feature flag geo_treat_missing_files_as_sync_failed so we can test it in staging. I also intend to enable the flag by default before removing it, so customers will be able to easily switch back to the old behavior for a whole milestone in case of any unforeseen problems with this design change.
Part of #348745 (closed)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.