Geo bug: Verification failure is too long
Summary
In a review app in Dedicated on GitLab 16.4.2 (internal Slack thread), the secondary site had some package files that were marked synced even though the files were missing from the secondary site's bucket. This led to the following loop:
1. The verification job got a 403 Forbidden when trying to verify the file size of a package file that didn't exist.
2. The 403 Forbidden error was long, and the Geo verification code tried to transition the package file registry to verification_failed.
3. Unfortunately, another error was raised: Cannot transition verification_state via :verification_failed from :verification_started (Reason(s): Verification failure is too long (maximum is 255 characters)). This second error obscures the real error.
4. The registry record was left in the verification_started state.
5. After 8 hours, a background job moves the registry record to verification_failed with the verification_failure message: Verification timed out after 28800.
6. Then, Geo tries to verify the registry again. Repeat from step 1.
This issue is for the "Verification failure is too long" part.
It's severity2 because the loop would be avoided, and everything would be eventually synced, if this one part were fixed.
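The failure mode described in the summary can be sketched in plain Ruby. This is only an illustration: `SketchRegistry`, `TransitionError`, and the simulated 403 message are hypothetical stand-ins, not GitLab's actual classes or logs. The sketch shows how a length validation on the failure message can raise a second error that hides the original one and leaves the record in its starting state.

```ruby
# Hypothetical stand-in for a Geo registry record; not GitLab's actual class.
class SketchRegistry
  MAX_FAILURE_LENGTH = 255

  class TransitionError < StandardError; end

  attr_reader :state, :verification_failure

  def initialize
    @state = :verification_started
    @verification_failure = nil
  end

  # Mimics a state transition that is guarded by a length validation.
  def verification_failed_with_message!(message)
    if message.length > MAX_FAILURE_LENGTH
      raise TransitionError,
            "Cannot transition verification_state via :verification_failed from :#{state} " \
            "(Reason(s): Verification failure is too long (maximum is #{MAX_FAILURE_LENGTH} characters))"
    end

    @verification_failure = message
    @state = :verification_failed
  end
end

registry = SketchRegistry.new
# Simulated long storage error (assumption: the real message was longer than 255 characters).
original_error = "403 Forbidden: " + ("x" * 300)

reported =
  begin
    registry.verification_failed_with_message!(original_error)
    nil
  rescue SketchRegistry::TransitionError => e
    e.message
  end

puts registry.state # the record never left :verification_started
puts reported       # reports the length validation, not the 403
```

In this sketch the only error that surfaces is the validation failure, which matches the symptom above: the 403 is lost and the record stays in verification_started until the timeout job picks it up.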
Steps to reproduce
Example Project
What is the current bug behavior?
What is the expected correct behavior?
Relevant logs and/or screenshots
Possible fixes
I'm confused since we already truncate to 255. Is it off by 1 or something? Does truncate work with newlines in the string?
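To help check the off-by-one and newline questions, here is a minimal plain-Ruby sketch of a truncation that counts the omission marker inside the limit. `truncate_failure` and the sample error text are hypothetical and are not the code currently in GitLab; the sketch only shows one way to guarantee that the stored message never exceeds 255 characters, including when it contains newlines.

```ruby
MAX_FAILURE_LENGTH = 255 # limit quoted in the validation error

# Hypothetical helper: caps a message at `limit` characters in total,
# including the omission marker.
def truncate_failure(message, limit: MAX_FAILURE_LENGTH, omission: "...")
  text = message.to_s
  return text if text.length <= limit

  text[0, limit - omission.length] + omission
end

# Simulated multi-line error body (assumption: the real 403 response spanned several lines).
long_error = "403 Forbidden\n" + ("<Error>AccessDenied</Error>\n" * 20)
truncated = truncate_failure(long_error)

puts long_error.length > MAX_FAILURE_LENGTH # true
puts truncated.length                       # 255
puts truncated.include?("\n")               # true: each newline counts as one character
```

Ruby's `String#length` counts a newline as a single character, so newlines alone should not push a correctly truncated string over the limit. If the real code truncates with an omission marker, it is worth confirming that the marker is counted inside the 255 characters and that nothing is prepended or appended to the message after truncation.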