Augment GCS signed URLs with GitLab metadata for package registry
Context
The container registry's epic on the same topic explains in detail how data transfer can be instrumented.
However, unlike the container registry, the package registry doesn't use Cloud CDN on gitlab.com. Instead, it redirects downloads to GCS signed URLs.
So, according to the instrumentation blueprint, the package registry needs to send the following metadata to GCS when a package file is downloaded:
- the package file's root namespace ID
- the package file's project ID (if any)
- the package file's size
When a package file is downloaded, GCS includes this metadata in its log entry for the download request. These logs are aggregated and processed to produce the data transfer usage statistics.
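The audit log entry shown further down gives a rough idea of the data involved. The three values appear there as strings under the keys x-goog-custom-audit-gitlab-namespace, x-goog-custom-audit-gitlab-project and x-goog-custom-audit-gitlab-size-bytes. Below is a minimal Ruby sketch that builds those query parameters. It assumes the project parameter is simply omitted for package files without a project. gcs_audit_params and AUDIT_PREFIX are hypothetical names for illustration, not code from this MR.

```ruby
require "uri"

# Key prefix of the GitLab custom audit entries, as seen in the audit_info
# section of the GCS log entry in this MR.
AUDIT_PREFIX = "x-goog-custom-audit-gitlab-"

# Hypothetical helper: builds the custom audit query parameters for one
# package file download. project_id may be nil (files without a project).
def gcs_audit_params(namespace_id:, project_id:, size:)
  params = {
    "#{AUDIT_PREFIX}namespace" => namespace_id.to_s,
    "#{AUDIT_PREFIX}size-bytes" => size.to_s
  }
  # Only send the project when the package file belongs to one.
  params["#{AUDIT_PREFIX}project"] = project_id.to_s unless project_id.nil?
  params
end

params = gcs_audit_params(namespace_id: 24, project_id: 2, size: 10)
puts URI.encode_www_form(params)
```

Values are stringified because query parameters (and the audit_info values in the log) are strings.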
Implementation
When a package file is requested for download, CarrierWave's url method is called to generate the signed URL. To add the metadata to the download URL, we override the url method so that it appends the metadata and then calls super.
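Here is a self-contained sketch of the override-and-super pattern described above. StubUploader stands in for CarrierWave::Uploader::Base and only fakes a signed URL. The AuditMetadata module, the audit_metadata hook and the query: keyword are hypothetical names for this illustration. They are not CarrierWave's or GitLab's actual API. The sketch adds the metadata to the options before calling super, matching the order described in this section.

```ruby
require "uri"

# Stand-in for CarrierWave::Uploader::Base that builds a fake "signed" URL.
# The query: keyword is an assumption of this sketch, not CarrierWave's API.
class StubUploader
  def url(query: {})
    params = query.merge("X-Goog-Signature" => "fake-signature")
    "https://storage.googleapis.com/example-bucket/ananas.txt?#{URI.encode_www_form(params)}"
  end
end

# Hypothetical counterpart of Packages::GcsSignedUrlMetadata: overrides url,
# adds the metadata to the options, then lets super generate the URL.
module AuditMetadata
  def url(query: {})
    super(query: query.merge(audit_metadata))
  end
end

class PackageFileUploader < StubUploader
  include AuditMetadata

  # Hypothetical hook; in the MR the values come from the uploader's model.
  def audit_metadata
    { "x-goog-custom-audit-gitlab-project" => "2" }
  end
end

puts PackageFileUploader.new.url
```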
What does this MR do?
- Create a module named Packages::GcsSignedUrlMetadata. This module overrides the url method and appends the needed metadata.
- Include the Packages::GcsSignedUrlMetadata module in each package file uploader. An uploader is a class that inherits from CarrierWave::Uploader::Base, so it is where the url method is called. Including Packages::GcsSignedUrlMetadata in the uploader allows us to override the url method.
- Modify the underlying model of each uploader to make sure it implements the three methods that provide the metadata:
  - project_id
  - root_namespace
  - size
- Add the needed specs.
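The sketch below illustrates two details from this list. First, the uploader's model must respond to project_id, root_namespace and size. Second, including the module puts it ahead of the base class in Ruby's method lookup chain, which is why its url takes precedence. PackageFile, Namespace, IncompleteModel, GcsMetadataStub and the *Stub classes are hypothetical stand-ins, and the duck-typing check is only an illustration, not the specs added in this MR.

```ruby
# Hypothetical stand-ins for the real models.
Namespace = Struct.new(:id)
PackageFile = Struct.new(:project_id, :root_namespace, :size)
IncompleteModel = Struct.new(:project_id, :size) # lacks root_namespace

# The three methods each uploader's model is expected to implement.
REQUIRED_METHODS = %i[project_id root_namespace size].freeze

# Returns the required methods that the given model does not respond to.
def missing_metadata_methods(model)
  REQUIRED_METHODS.reject { |name| model.respond_to?(name) }
end

# Empty stand-ins, used only to show where include places the module.
module GcsMetadataStub; end
class BaseUploaderStub; end

class PackageFileUploaderStub < BaseUploaderStub
  include GcsMetadataStub
end

p missing_metadata_methods(PackageFile.new(nil, Namespace.new(24), 10))
p missing_metadata_methods(IncompleteModel.new(2, 10))
# The included module sits between the uploader and its superclass.
p PackageFileUploaderStub.ancestors.take(3)
```

Note that include only wins over methods defined in ancestors. If an uploader class defined url itself, the module would need to be prepended instead.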
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
Testing this feature requires using Google Cloud Storage as object storage.
- Create a new GCS project, or use an existing one.
- Create a bucket in your GCS project and set up the permissions needed to access its audit logs, including enabling Data Access audit logs for Cloud Storage, which are off by default (I can help with setting this up).
- Create a service account in your GCS project and download its JSON key file (needed to connect GDK to GCS).
- Configure your GDK to use GCS as object storage:
  - In your gitlab.yml, update the packages section as follows:

        ## Packages (maven repository, npm registry, etc...)
        packages:
          enabled: true
          dpkg_deb_path: /opt/homebrew/bin/dpkg-deb
          object_store:
            enabled: true
            remote_directory: <name of gcs bucket>
            direct_upload: true
            connection:
              provider: 'Google'
              google_project: '<your gcs project id>'
              google_json_key_location: '<path to your gcs service account json file>'

  - Restart your GDK.
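- In the Rails console, create a package to test with:

        # The factories call the fixture_file_upload test helper, which is not
        # available in the console, so define a minimal stand-in for it.
        def fixture_file_upload(*args, **kwargs)
          Rack::Test::UploadedFile.new(*args, **kwargs)
        end

        FactoryBot.create(:generic_package)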
- Download the package from its UI page.
- In your GCS project, check the logs of your service account: IAM & Admin => Service Accounts => click your service account => LOGS tab.
- You should find the requests made on the bucket. The latest log entry should be the package file download request (storage.objects.get). In the log entry details, the metadata sent in the signed URL should be present:
{
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"status": {},
"authenticationInfo": {
"principalEmail": "XXXX@XXX.iam.gserviceaccount.com"
},
"requestMetadata": {
"callerIp": "XXXX",
"requestAttributes": {
"time": "2024-03-18T18:40:55.423539123Z",
"auth": {}
},
"destinationAttributes": {}
},
"serviceName": "storage.googleapis.com",
"methodName": "storage.objects.get",
"authorizationInfo": [
{
"resource": "path_to_file/ananas.txt",
"permission": "storage.objects.get",
"granted": true,
"resourceAttributes": {}
}
],
"resourceName": "path_to_file/ananas.txt",
"metadata": {
"audit_context": {
"app_context": "EXTERNAL",
"audit_info": {
"x-goog-custom-audit-gitlab-size-bytes": "10",
"x-goog-custom-audit-gitlab-namespace": "24",
"x-goog-custom-audit-gitlab-project": "2"
}
}
},
"resourceLocation": {
"currentLocations": [
"eu"
]
}
},
"insertId": "XXXX",
"resource": {
"type": "gcs_bucket",
"labels": {
"location": "eu",
"bucket_name": "XXXXX",
"project_id": "XXXXX"
}
},
"timestamp": "2024-03-18T18:40:55.414995195Z",
"severity": "INFO",
"logName": "projects/XXXX/logs/cloudaudit.googleapis.com%2Fdata_access",
"receiveTimestamp": "2024-03-18T18:40:56.685214421Z"
}
As the log entry shows, the metadata is present:
"x-goog-custom-audit-gitlab-size-bytes": "10",
"x-goog-custom-audit-gitlab-namespace": "24",
"x-goog-custom-audit-gitlab-project": "2"
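To check these values without scanning the entry by eye, you can take the entry as JSON (for example, copied from the log entry details) and extract the GitLab fields with a few lines of Ruby. This sketch assumes the entry has the structure shown above. gitlab_audit_info is a hypothetical helper, and the sample is a trimmed copy of that entry.

```ruby
require "json"

# Hypothetical helper: returns the GitLab custom audit fields from a GCS
# audit log entry, following the structure of the entry shown above.
def gitlab_audit_info(log_entry_json)
  entry = JSON.parse(log_entry_json)
  info = entry.dig("protoPayload", "metadata", "audit_context", "audit_info") || {}
  info.select { |key, _| key.start_with?("x-goog-custom-audit-gitlab-") }
end

# Trimmed copy of the log entry above.
sample = <<~JSON
  {
    "protoPayload": {
      "methodName": "storage.objects.get",
      "metadata": {
        "audit_context": {
          "app_context": "EXTERNAL",
          "audit_info": {
            "x-goog-custom-audit-gitlab-size-bytes": "10",
            "x-goog-custom-audit-gitlab-namespace": "24",
            "x-goog-custom-audit-gitlab-project": "2"
          }
        }
      }
    }
  }
JSON

gitlab_audit_info(sample).each { |key, value| puts "#{key}=#{value}" }
```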
Related to #443335