# Optimize the garbage collector sweep stage for S3
We're currently looking at optimizing the Container Registry garbage collection algorithm for S3 (gitlab#31071 (closed)).
In this issue we'll investigate possible improvements for the sweep stage.
## Current Behaviour
The GC sweep stage starts from the lists of manifests and blobs that should be deleted. The manifest list is built during the mark stage (recursive path walk), and the blob list is built similarly, but during the sweep stage.

With the lists of manifests and blobs to delete, the GC uses `storage.Vacuum` as an intermediary for the storage driver, removing objects from the backend.
Note: Optimizations around the path walk are being investigated in #3 (closed), so they're out of the scope of this issue. Here we focus exclusively on the delete operation.
## Blobs
- The GC, in `storage.MarkAndSweep`, iterates over the list of blobs to delete and calls `storage.Vacuum.RemoveBlob` for each one, passing the blob digest (e.g. `sha256:f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6`) as argument.
  - The `storage.Vacuum.RemoveBlob` method parses the blob digest, generates its full path (e.g. `docker/registry/v2/blobs/sha256/f1/f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6`) and invokes the storage driver `Delete` method.
    - The `storage.driver.s3.Delete` method will then:
      - Send a `ListObjects` request to S3 to obtain the full list of all objects under a blob's path. In this case, a blob's path has a single object within, the `data` file.
      - With the list of objects in hand (a single `data` file), an S3 `DeleteObjects` request is built and sent. During the request, S3 will automatically delete the parent folder `f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6`, and its parent `f1` if empty.
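The digest-to-path derivation described above can be sketched as follows. This is a minimal illustration, not the registry's actual code: `blobDataPath` is a hypothetical helper name, and the path layout simply mirrors the example path above.

```go
package main

import (
	"fmt"
	"strings"
)

// blobDataPath is a hypothetical helper mirroring what storage.Vacuum.RemoveBlob
// does internally: it parses a digest string like "sha256:f13f..." and builds
// the full path of the blob's data file under the registry root.
func blobDataPath(dgst string) (string, error) {
	parts := strings.SplitN(dgst, ":", 2)
	if len(parts) != 2 || parts[0] == "" || len(parts[1]) < 2 {
		return "", fmt.Errorf("invalid digest: %q", dgst)
	}
	algorithm, hex := parts[0], parts[1]
	// Blobs live under blobs/<algorithm>/<first two hex chars>/<hex>/data.
	return fmt.Sprintf("docker/registry/v2/blobs/%s/%s/%s/data", algorithm, hex[:2], hex), nil
}

func main() {
	p, err := blobDataPath("sha256:f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```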
The process described above is synchronous and sequential. Thus, for each blob to delete, we send two HTTP requests to S3 (list and delete) and wait for the response before moving to the next one. So we have `2N` network requests, where `N` is the number of blobs to delete.
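The `2N` cost can be made concrete with a toy model of the sweep loop. The type and function names below are illustrative stand-ins, not the registry's real driver interface; the fake driver just counts requests.

```go
package main

import "fmt"

// countingDriver is a toy stand-in for the S3 storage driver that counts the
// requests the sweep loop would issue. Names here are illustrative only.
type countingDriver struct{ requests int }

func (d *countingDriver) ListObjects(path string) []string {
	d.requests++
	// A blob's path holds a single object, the data file.
	return []string{path + "/data"}
}

func (d *countingDriver) DeleteObjects(keys []string) {
	d.requests++
}

// sweepBlobs mimics the current synchronous loop: one ListObjects plus one
// DeleteObjects per blob, so 2N requests for N blobs.
func sweepBlobs(d *countingDriver, blobPaths []string) {
	for _, p := range blobPaths {
		keys := d.ListObjects(p)
		d.DeleteObjects(keys)
	}
}

func main() {
	d := &countingDriver{}
	sweepBlobs(d, []string{"blobs/sha256/aa/a1", "blobs/sha256/bb/b2", "blobs/sha256/cc/c3"})
	fmt.Println(d.requests) // 2 requests per blob
}
```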
### Bottlenecks and Possible Solutions
- The `storage.driver.s3.Delete` method is generic, i.e. it's designed to work with all kinds of objects. That's why a `ListObjects` request always precedes a `DeleteObjects` request. For blob deletion, we don't need the `ListObjects`; we could delete the `data` file directly. This would cut the required network requests in half.
- We're not making use of the bulk capabilities of the S3 `DeleteObjects` operation when deleting blobs. It supports up to `1000` objects per request (source), which means that we could potentially send a single request instead of `2N` in some scenarios. Again, the `storage.driver.s3.Delete` method is generic. In the blob deletion scenario, it currently does a `DeleteObjects` request with a single element (the `data` file) per blob.
- If using bulk requests isn't viable or enough, we can parallelize the calls to `storage.Vacuum.RemoveBlob` in `storage.MarkAndSweep`, spawning a goroutine per blob deletion.
- This is minor, but in `storage.MarkAndSweep` a blob's digest is converted to a string just to be passed as an argument to `storage.Vacuum.RemoveBlob`, where it's then parsed back into a digest. The `storage.Vacuum.RemoveBlob` method is only used in one other place, the `proxy.NewRegistryPullThroughCache` function, where the same thing happens (digest converted to a string). So we can avoid these conversions.
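The bulk-delete idea hinges on batching keys to respect the `1000`-object limit per `DeleteObjects` request. A minimal sketch of that batching, assuming a hypothetical `chunkKeys` helper rather than the registry's actual implementation:

```go
package main

import "fmt"

// chunkKeys splits a list of object keys into batches of at most batchSize,
// matching the S3 DeleteObjects limit of 1000 keys per request. This is a
// sketch of the batching a bulk-delete optimization would need.
func chunkKeys(keys []string, batchSize int) [][]string {
	var batches [][]string
	for len(keys) > 0 {
		n := batchSize
		if len(keys) < n {
			n = len(keys)
		}
		batches = append(batches, keys[:n])
		keys = keys[n:]
	}
	return batches
}

func main() {
	// 2500 blob data files would need only 3 DeleteObjects requests,
	// instead of 2 * 2500 sequential List+Delete round trips.
	keys := make([]string, 2500)
	for i := range keys {
		keys[i] = fmt.Sprintf("blobs/sha256/%02d/data", i%100)
	}
	fmt.Println(len(chunkKeys(keys, 1000)))
}
```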
The improvements related to blob deletion are being implemented in !31 (merged).
## Manifests
- In `storage.MarkAndSweep`, the garbage collector iterates over the list of manifests to delete and calls `storage.Vacuum.RemoveManifest` for each one, passing the manifest name, digest and tag references as arguments. This method will then do two things:
  - Iterate over the tag references list and:
    - Generate the tag index directory full path (e.g. `docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/tags/index/sha256/82d...057`)
    - Send a `ListObjects` request to S3 to check if the tag index file exists (stat). If so, the storage driver `Delete` method is invoked to delete the tag reference. The `storage.driver.s3.Delete` method will then:
      - Send a `ListObjects` request to S3 to obtain the list of objects under the tag index full path (a second `ListObjects` request for the same object). There is only one, the tag index link file (e.g. `82d...057/link`).
      - With the list of objects in hand (only one), an S3 `DeleteObjects` request is built and sent.
  - Build the manifest revision directory full path (e.g. `docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/revisions/sha256/82d...057`) and invoke the storage driver `Delete` method to delete the manifest revision. The `storage.driver.s3.Delete` method will then:
    - Send a `ListObjects` request to S3 to obtain the list of objects under the manifest revision full path. There is only one, the manifest revision link file (e.g. `82d...057/link`).
    - With the list of objects in hand (only one), an S3 `DeleteObjects` request is built and sent.
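The two directory paths involved above can be sketched with a pair of helpers. These are hypothetical names, and the layout simply mirrors the example paths in this section (repository `myorg/myproj/myimg`, truncated digest kept as-is):

```go
package main

import "fmt"

// tagIndexPath builds the tag index directory path for a repository and a
// manifest digest, mirroring the example path above. Illustrative only.
func tagIndexPath(repo, hexDigest string) string {
	return fmt.Sprintf("docker/registry/v2/repositories/%s/_manifests/tags/index/sha256/%s", repo, hexDigest)
}

// manifestRevisionPath builds the manifest revision directory path for a
// repository and a manifest digest. Illustrative only.
func manifestRevisionPath(repo, hexDigest string) string {
	return fmt.Sprintf("docker/registry/v2/repositories/%s/_manifests/revisions/sha256/%s", repo, hexDigest)
}

func main() {
	fmt.Println(tagIndexPath("myorg/myproj/myimg", "82d...057"))
	fmt.Println(manifestRevisionPath("myorg/myproj/myimg", "82d...057"))
}
```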
The process described above is synchronous and sequential. For each manifest to delete, the GC always sends at least two HTTP requests to S3 (`ListObjects` and `DeleteObjects`). That's `2N` network requests, where `N` is the number of manifests to delete. On top of that, for each manifest we can have another `2M` requests (`ListObjects` and `DeleteObjects`) to delete its tag index files (if any), where `M` is the number of tag index files for that manifest.
### Bottlenecks and Possible Solutions
- The bottlenecks identified for blobs also apply to manifests.
- We can potentially bundle the deletion of the manifest revision and tag index files in a single `DeleteObjects` request.
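The bundling idea amounts to collecting the revision link and all tag index links into one key list, so a single `DeleteObjects` request replaces several List+Delete pairs. A sketch under that assumption (`bundleManifestKeys` is a hypothetical name; paths mirror the examples above):

```go
package main

import "fmt"

// bundleManifestKeys collects the manifest revision link file and every tag
// index link file into a single key list, so one DeleteObjects request can
// remove them all. Illustrative sketch, not the registry's actual code.
func bundleManifestKeys(revisionPath string, tagIndexPaths []string) []string {
	keys := []string{revisionPath + "/link"}
	for _, p := range tagIndexPaths {
		keys = append(keys, p+"/link")
	}
	return keys
}

func main() {
	keys := bundleManifestKeys(
		"docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/revisions/sha256/82d...057",
		[]string{"docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/tags/index/sha256/82d...057"},
	)
	// Two link files, one DeleteObjects request.
	fmt.Println(len(keys))
}
```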
The improvements related to manifest deletion are being implemented in !39 (merged).