Offline garbage collection fails on filesystems with no images pushed
Context
As concluded with @bprescott_, we believe gitlab-ctl registry-garbage-collect could get some love:
omnibus-gitlab#2811 (comment 1038932875)
When run with no images pushed, it fails after an unnecessary registry shutdown for offline garbage collection:
failed to garbage collect: marking blobs: : Path not found: /docker/registry/v2/repositories
Ideally, the command should exit successfully (exit code 0) after the check, even when no garbage collection is needed, as in this case.
This would also help with the documented best practice of a weekly cron job that triggers offline garbage collection:
https://docs.gitlab.com/ee/administration/packages/container_registry.html#running-the-garbage-collection-on-schedule
Implementation Guide
In the offline garbage collector, before starting to enumerate repositories, check if the repositories root path exists. If the path does not exist, exit early with a log message saying that there are no repositories.
The storage drivers for object storage make use of path specs found in paths.go.
In the issue reported above, we see that the storage driver is not able to find the /docker/registry/v2/repositories path, which corresponds to the repositoriesRootPathSpec.
To check for the existence of this path, we'll need to do the following at the beginning of MarkAndSweep:
- Convert the repositoriesRootPathSpec to a string that the storage driver can use via the pathFor function.
- Pass that string to the Stat method of the storage driver.
- Check if the error is not nil and, if so, whether that error is a PathNotFoundError.
  - If the error is a PathNotFoundError, we can log a message indicating that garbage collection was skipped and exit the function early with no error.
  - If the error is not nil, but is also not a PathNotFoundError, we should return the error with context.
- Finally, if the error is nil, we can continue MarkAndSweep as normal.
Testing
For testing, we need to create a new storage driver and a registry, such as in the first few lines of TestGarbageCollectAfterLastTagRemoved.
Afterwards, we should run MarkAndSweep and ensure no error is returned from that function.
See the end of TestNoDeletionNoEffect, minus the last two lines, for an example of running this in a test.