AWS IAM: Ensure pre-signed URLs will last at least X minutes
Summary
When AWS IAM authentication is used for S3 (object storage), GitLab authenticates with temporary credentials that have a limited lifetime (as short as 1 hour, for example): https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
When GitLab uses these credentials to generate pre-signed upload/download URLs with a 1-day expiration time, the following caveat from https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html applies:
If you created a presigned URL using a temporary token, then the URL expires when the token expires, even if the URL was created with a later expiration time.
As a result, a URL handed to a requester may already be unusable when it arrives, or may become unusable when it is used or reused later. One example is GitLab Pages' cache of GitLab API responses, which reuses URLs for up to 10 minutes.
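The sketch below models this interaction. It is not GitLab code. The helper names `effective_url_expiry` and `url_usable?` are hypothetical, and the 5 minutes of remaining token lifetime is an assumed value chosen for illustration. The sketch encodes only the quoted S3 rule: the URL stops working at whichever comes first, its requested expiry or the token's expiry.

```ruby
# Hypothetical helper: when a pre-signed URL signed with temporary
# credentials actually stops working, per the S3 caveat quoted above.
def effective_url_expiry(signed_at, requested_ttl, token_expires_at)
  [signed_at + requested_ttl, token_expires_at].min
end

# Hypothetical helper: whether a URL signed at `signed_at` is usable at `used_at`.
def url_usable?(used_at, signed_at, requested_ttl, token_expires_at)
  used_at < effective_url_expiry(signed_at, requested_ttl, token_expires_at)
end

one_day          = 24 * 60 * 60
signed_at        = Time.utc(2022, 1, 1, 12, 0, 0)
token_expires_at = signed_at + 5 * 60 # assumption: token had 5 minutes left

puts effective_url_expiry(signed_at, one_day, token_expires_at)
# prints "2022-01-01 12:05:00 UTC", not a time one day later

# A consumer that caches and reuses the URL for up to 10 minutes
# (as GitLab Pages does) gets a dead URL for the second half of that window.
puts url_usable?(signed_at + 4 * 60, signed_at, one_day, token_expires_at) # prints "true"
puts url_usable?(signed_at + 8 * 60, signed_at, one_day, token_expires_at) # prints "false"
```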
Fog, a library GitLab uses to generate pre-signed URLs, has a built-in mechanism to keep the temporary IAM token fresh. Its refresh window is very narrow, though: it keeps using a temporary token until as little as 15 seconds before that token expires: https://github.com/fog/fog-aws/blob/4c3c55b32a2e1e6b970caed468178fe39d3a0687/lib/fog/aws/credential_fetcher.rb#L126-L130
In the worst case, therefore, a pre-signed URL may become unusable within 15 seconds of being issued to a requester.
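The worst case can be made concrete by reducing the refresh check to a single comparison. This is a simplified model, not fog-aws's actual code. The names `REFRESH_MARGIN` and `refresh_needed?` are hypothetical. Only the 15-second margin comes from the linked source.

```ruby
# Simplified, illustrative model of the linked refresh check (not fog-aws code):
# cached temporary credentials are replaced only once the current time is
# within REFRESH_MARGIN seconds of their expiry.
REFRESH_MARGIN = 15 # seconds, as in the linked code

def refresh_needed?(now, expires_at, margin = REFRESH_MARGIN)
  now > expires_at - margin
end

expires_at = Time.utc(2022, 1, 1, 13, 0, 0)

# Exactly 15 seconds before expiry, the old token still counts as fresh...
now = expires_at - 15
puts refresh_needed?(now, expires_at) # prints "false"
# ...so a URL signed now, even with a 1-day expiry, dies 15 seconds later.
puts expires_at - now                 # prints "15.0"
```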
These premature, IAM-induced expirations cause confusing errors in downstream services, such as the one described in gitlab-pages#686 (comment 807801966).
Steps to reproduce
Example Project
What is the current bug behavior?
Pre-signed URLs are not guaranteed to remain valid until their requested expiration time and can expire as soon as 15 seconds after being issued.
What is the expected correct behavior?
Pre-signed URLs should be guaranteed to remain valid for a defined minimum period that is much longer than 15 seconds.
For example, a minimum of 15 minutes would comfortably cover GitLab Pages' default 10-minute URL cache/reuse window.
Relevant logs and/or screenshots
See issue gitlab-pages#686 (comment 807801966) or customer ticket https://gitlab.zendesk.com/agent/tickets/256736
Output of checks
This was observed on instances running GitLab 14.4 and 14.5.
Possible fixes
One option is to change the refresh threshold in the Fog library upstream: https://github.com/fog/fog-aws/blob/master/lib/fog/aws/credential_fetcher.rb#L126-L130
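The sketch below shows what a larger threshold would buy, assuming the check remains a simple comparison against the expiry time. `refresh_due?` and `guaranteed_url_lifetime` are hypothetical names, and the margin parameter is an assumption made for illustration.

```ruby
# Hypothetical variant of the refresh check with a configurable margin.
def refresh_due?(now, expires_at, margin_seconds)
  now > expires_at - margin_seconds
end

# If no refresh is due, the token has at least `margin_seconds` left, so a URL
# signed with it lives at least min(requested_ttl, margin_seconds). This assumes
# that a refresh really returns credentials with more than the margin remaining.
def guaranteed_url_lifetime(requested_ttl, margin_seconds)
  [requested_ttl, margin_seconds].min
end

expires_at = Time.utc(2022, 1, 1, 13, 0, 0)
now        = expires_at - 10 * 60 # token has 10 minutes left

puts refresh_due?(now, expires_at, 15)              # prints "false" (15-second margin)
puts refresh_due?(now, expires_at, 15 * 60)         # prints "true" (15-minute margin)
puts guaranteed_url_lifetime(24 * 60 * 60, 15 * 60) # prints "900"
```

One caveat applies to this option. For EC2 instance profiles, AWS documents that new credentials are made available at least five minutes before the old ones expire. With a margin larger than that, a refresh may therefore return the same soon-to-expire credentials, so whether a 15-minute margin helps may depend on the credential source.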
Another option is to track the token's expiration time in GitLab and, where possible, reload the token in every component currently using it.
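Here is a minimal sketch of the second option. It assumes a single shared holder that every consumer asks for credentials. `ExpiringCredentialCache`, its fetcher block, and the `on_refresh` hook are hypothetical. In a real setup, the fetcher would wrap whichever source supplies the temporary credentials.

```ruby
# Hypothetical shared credential holder (not a GitLab or fog-aws API).
# It tracks the expiry of temporary credentials itself, fetches new ones
# whenever fewer than `min_remaining` seconds are left, and notifies every
# registered consumer so each can swap in the new credentials.
class ExpiringCredentialCache
  Credentials = Struct.new(:access_key_id, :secret_access_key, :session_token, :expires_at)

  # The fetcher block must return a Credentials instance (e.g. by wrapping an
  # instance-metadata or STS call); `clock` is injectable for testing.
  def initialize(min_remaining:, clock: -> { Time.now }, &fetcher)
    @min_remaining = min_remaining
    @clock = clock
    @fetcher = fetcher
    @listeners = []
    @credentials = nil
    @mutex = Mutex.new
  end

  # Register a consumer to be called with each newly fetched Credentials.
  def on_refresh(&listener)
    @mutex.synchronize { @listeners << listener }
  end

  # Returns credentials with at least `min_remaining` seconds left at call
  # time, refreshing first if needed (assuming the fetcher returns
  # sufficiently long-lived credentials).
  def current
    refreshed = nil
    listeners = nil
    creds = @mutex.synchronize do
      if stale?
        @credentials = @fetcher.call
        refreshed = @credentials
        listeners = @listeners.dup
      end
      @credentials
    end
    # Notify outside the lock so a listener may call #current itself.
    listeners.each { |l| l.call(refreshed) } if refreshed
    creds
  end

  private

  def stale?
    @credentials.nil? || @credentials.expires_at - @clock.call < @min_remaining
  end
end

# Usage with a fake clock and a fake fetcher issuing 1-hour credentials.
now = Time.utc(2022, 1, 1, 12, 0, 0)
fetches = 0
cache = ExpiringCredentialCache.new(min_remaining: 15 * 60, clock: -> { now }) do
  fetches += 1
  ExpiringCredentialCache::Credentials.new("AKID#{fetches}", "secret", "token#{fetches}", now + 60 * 60)
end
seen = []
cache.on_refresh { |c| seen << c.session_token }

puts cache.current.session_token # prints "token1"
now += 50 * 60                   # token1 now has only 10 minutes left
puts cache.current.session_token # prints "token2"
p seen                           # prints ["token1", "token2"]
```

The push-style `on_refresh` hook covers consumers that hold their own copy of the credentials. Consumers that call `current` before every signing operation do not need it. If the fetcher keeps returning credentials with less than `min_remaining` seconds left, this sketch re-fetches on every call.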