Maven package registry returning 409 when uploading the sha1
🔥 Problem
From https://gitlab.com/gitlab-com/ops-sub-department/section-ops-request-for-help/-/issues/6.
Maven packages are not uploaded in a single step. Instead, multiple files are uploaded in sequence. We can see such a sequence documented here.
Among the possible files uploaded, the sha1 digest can be uploaded. We can see here that:
- Those uploads are ignored = a package file is not created = nothing is stored on object storage.
- (for sha1) instead we simply locate the related file and check for coherence between the two signatures (the one that we stored along with the related file and the one that is being uploaded).
  - When a file is uploaded to GitLab (through Workhorse), the sha1 is automatically computed and stored. That is why for maven packages, we ignore the digest uploads.
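To make the coherence check concrete, here is a minimal Python sketch of the behavior described above. It is illustrative only and is not GitLab's code: the function names are hypothetical, and the 200 returned on a match is an assumption (only the 409 on mismatch comes from this issue).

```python
import hashlib
from http import HTTPStatus


def sha1_at_file_upload(content: bytes) -> str:
    """Digest computed and stored when the file itself is uploaded."""
    return hashlib.sha1(content).hexdigest()


def handle_digest_upload(stored_sha1: str, uploaded_sha1: str) -> HTTPStatus:
    """Nothing is persisted for a digest upload; the two values are only compared.

    Returning OK on a match is an assumption made for this sketch.
    """
    if stored_sha1 == uploaded_sha1:
        return HTTPStatus.OK
    return HTTPStatus.CONFLICT


jar = b"fake jar bytes"
stored = sha1_at_file_upload(jar)

# The client sends the digest of the file it just uploaded.
matching = handle_digest_upload(stored, hashlib.sha1(jar).hexdigest())
# A digest that does not correspond to the stored file.
mismatching = handle_digest_upload(stored, "0" * 40)
print(matching, mismatching)
```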
From this search, it seems that the coherence check fails with 409 Conflict. We have about 300 hits per week on gitlab.com.
One thing to note from the search: this failure seems to happen randomly across file types. Sometimes it's the sha1 for the jar file, sometimes it's for the xml file.
🚒 Solution
My gut feeling is telling me that we're hitting this issue because of database replication lag. Maven clients upload the file and its sha1 digest in a row. Because the sha1 upload is ignored and simply reads the related file (from the first upload), this read will be routed to the replica. If the replica is lagging behind the primary database, the stored fingerprint could be wrong and we would hit conflict! when comparing it with the received one.
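The suspected race can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of GitLab's load balancing: ToyDatabase, digest_upload_status, and the file name are hypothetical, and the lagging replica is modeled as simply not having the row yet.

```python
import hashlib


class ToyDatabase:
    """A primary plus one replica that only catches up when replicate() is called."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        # Writes always go to the primary.
        self.primary[key] = value

    def replicate(self):
        # The replica catches up with the primary.
        self.replica = dict(self.primary)

    def read(self, key, use_primary=False):
        source = self.primary if use_primary else self.replica
        return source.get(key)


def digest_upload_status(db, file_name, uploaded_sha1, use_primary=False):
    """Compare the uploaded digest with the one read from the database."""
    stored = db.read(file_name, use_primary=use_primary)
    return 200 if stored == uploaded_sha1 else 409


db = ToyDatabase()
sha1 = hashlib.sha1(b"<project/>").hexdigest()

db.write("app-1.0.pom", sha1)  # the file upload, written to the primary

# The digest upload follows immediately, before the replica has caught up.
lagging = digest_upload_status(db, "app-1.0.pom", sha1)
# The same request, read from the primary instead.
from_primary = digest_upload_status(db, "app-1.0.pom", sha1, use_primary=True)

db.replicate()
caught_up = digest_upload_status(db, "app-1.0.pom", sha1)

print(lagging, from_primary, caught_up)  # prints: 409 200 200
```

In this model the conflict depends only on timing, which would be consistent with failures that do not correlate with a particular file type.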
Before diving into a fix for this (forcing reads from the primary when receiving a fingerprint upload), I'd first like to confirm that this is the root cause.
As such, I suggest:
- Given the number of occurrences happening per week, add logging around here. Log:
  - the received sha1.
  - the stored sha1.
  - the output of hexdigest on the stored sha1.
  - log only when there is a conflict.
- Let the logs run for a few days and analyze the situation.
- If the stored sha1 is always wrong, this points to a replica lag issue.
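The logging proposed above could look roughly like the following Python sketch. It is not the actual change: check_and_log and the logger name are hypothetical, and the third logged value is interpreted here as a digest recomputed from the stored file content, which is an assumption about what the hexdigest item means.

```python
import hashlib
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("maven_digest_check")  # hypothetical logger name


def check_and_log(file_content: bytes, stored_sha1: str, received_sha1: str) -> bool:
    """Return True when the digests match; log details only on a conflict."""
    if stored_sha1 == received_sha1:
        return True  # no conflict, so nothing is logged

    # Assumption: recompute the digest from the stored file for comparison.
    recomputed = hashlib.sha1(file_content).hexdigest()
    logger.warning(
        "sha1 conflict: received=%s stored=%s recomputed=%s",
        received_sha1,
        stored_sha1,
        recomputed,
    )
    return False


content = b"fake jar bytes"
correct = hashlib.sha1(content).hexdigest()

ok = check_and_log(content, stored_sha1=correct, received_sha1=correct)
conflict = check_and_log(content, stored_sha1="0" * 40, received_sha1=correct)
```

If the received and recomputed values agree in the logged conflicts while the stored value differs, that would support the replica lag hypothesis.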