When rotating a token, delay revocation of the old token until the new token is used and/or a period of time has elapsed
Proposal
Rotating a token via the API requires the new value, which the API returns once and only once, to be safely recorded so that every place the token is used can be updated with it. After a token is rotated, any tasks still using the old value will fail until all references have been updated to the new value.
Token rotation can be automated by scripting calls to the "rotate a personal access token" API with the current token value, which avoids the need for a separate privileged user's PAT. However, if for some reason the new token value is not recorded (e.g. the connection drops after the API request has been made but before the response is received, or the script errors), there is no way to retrieve the value afterwards, and a user must intervene manually to create a new token and bot account. Because the old token is revoked immediately, any tasks using the old token value will fail until that manual intervention is performed.
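A minimal Python sketch of such a rotation script, assuming the `POST /api/v4/personal_access_tokens/self/rotate` endpoint authenticated with the `PRIVATE-TOKEN` header and returning JSON with a `token` field (verify these against the API documentation for your GitLab version). The names `rotate_self`, `write_atomically`, `urllib_post` and the injectable `post` parameter are hypothetical. Writing the new value atomically as soon as it arrives narrows the failure window but cannot close it: if the response never arrives, the old token is already revoked and the new one is unrecoverable.

```python
import json
import os
import tempfile
import urllib.request
from typing import Callable

# Hypothetical transport: takes (url, headers), returns the raw response body.
# Injectable so the flow can be exercised without a live GitLab instance.
Transport = Callable[[str, dict], bytes]


def urllib_post(url: str, headers: dict) -> bytes:
    """POST with an empty body using only the standard library."""
    req = urllib.request.Request(url, data=b"", headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def write_atomically(path: str, value: str) -> None:
    """Write value to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(value)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def rotate_self(base_url: str, current_token: str, store_path: str,
                post: Transport = urllib_post) -> str:
    """Rotate the token used to authenticate and persist the new value."""
    url = f"{base_url.rstrip('/')}/api/v4/personal_access_tokens/self/rotate"
    # Once the server processes this request, the old token is revoked.
    body = post(url, {"PRIVATE-TOKEN": current_token})
    # If the connection drops before `body` arrives, or parsing/writing
    # below fails, the new value is lost and cannot be fetched again.
    new_token = json.loads(body)["token"]
    write_atomically(store_path, new_token)
    return new_token
```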
This issue suggests implementing a mechanism to mitigate the risk of disruption by allowing the old token value to continue to be used after rotation until:
- the new token value has been used once (indicating the new value has been successfully recorded), and/or
- a set period of time has elapsed (allowing time to update all references to the token to the new value)
If both options were available as instance-wide or namespace-wide settings, customers could choose how to balance the security benefit of revoking the old token immediately on rotation against the risk of disruption if an automated token rotation process "fails".
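The proposed behaviour could be modelled roughly as follows. This is a hypothetical Python sketch, not GitLab's implementation: `GraceConfig`, `RotatedToken`, `old_token_still_valid` and `authenticate` are invented names, and the two `GraceConfig` fields stand in for the suggested instance/namespace settings. With both enabled, the old token stops working at whichever comes first: the first use of the new token, or the end of the grace period. Disabling both reproduces today's immediate revocation.

```python
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class GraceConfig:
    """Hypothetical instance/namespace-level settings for rotation grace."""
    revoke_on_new_use: bool = True  # end grace on first use of the new token
    grace_period: Optional[timedelta] = timedelta(hours=24)  # None = no time limit


@dataclass
class RotatedToken:
    """Old and new values of a token that has just been rotated."""
    old_value: str
    new_value: str
    rotated_at: datetime
    new_used: bool = False


def _matches(presented: str, expected: str) -> bool:
    # Constant-time comparison; a real system would compare stored digests.
    return hmac.compare_digest(presented.encode(), expected.encode())


def old_token_still_valid(t: RotatedToken, cfg: GraceConfig, now: datetime) -> bool:
    if not cfg.revoke_on_new_use and cfg.grace_period is None:
        return False  # no grace configured: current immediate revocation
    if cfg.revoke_on_new_use and t.new_used:
        return False  # new value has been used, so it was recorded
    if cfg.grace_period is not None and now - t.rotated_at >= cfg.grace_period:
        return False  # grace period has elapsed
    return True


def authenticate(t: RotatedToken, presented: str, cfg: GraceConfig, now: datetime) -> bool:
    if _matches(presented, t.new_value):
        t.new_used = True  # first successful use of the new value
        return True
    if _matches(presented, t.old_value):
        return old_token_still_valid(t, cfg, now)
    return False
```

Enabling only `revoke_on_new_use` would leave the old token valid indefinitely if the new value was never recorded, so pairing it with a grace period bounds that exposure.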
This issue arose from discussions with a GitLab Premium ~SaaS customer in a support ticket (ZD internal link).