Add artifact/cache upload progress meter
## What does this MR do?
Adds a progress meter for cache and artifact uploads.
This isn't displayed by default and is intended to be used for diagnostics.
Specifying a `TRANSFER_METER_FREQUENCY` variable controls how often a progress update is printed. The default is zero, which means no progress is displayed (no change to the job trace). Values below 1 second are raised to 1 second; anything above that is allowed. The value is parsed with Go's `ParseDuration`, so durations such as `1s` and `1m20s` are all valid.
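As a minimal sketch of how such a setting could be read (the helper name and clamping logic here are illustrative, not the MR's actual code):

```go
package helpers

import (
	"os"
	"time"
)

// transferMeterFrequency returns the configured update interval, or zero
// when the meter is disabled. Illustrative sketch only.
func transferMeterFrequency() time.Duration {
	raw := os.Getenv("TRANSFER_METER_FREQUENCY")
	if raw == "" {
		return 0 // meter disabled by default
	}

	freq, err := time.ParseDuration(raw)
	if err != nil || freq <= 0 {
		return 0 // unparsable or non-positive values disable the meter
	}

	if freq < time.Second {
		return time.Second // values below 1s are raised to 1s
	}
	return freq
}
```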
This MR is split into two commits:

**1st commit:** Updates `UpdateRawArtifacts` to accept an `io.ReadCloser` instead of just an `io.Reader`. There are two reasons for this, but they might be controversial:
- Now that both the meter and `UpdateRawArtifacts` are writing to stderr, the very last status update from `UpdateRawArtifacts` can mangle the very last output of the meter, due to the way the meter's progress is flushed on `Close()`. If `UpdateRawArtifacts` closes the reader on our behalf, we can ensure the meter is flushed first, so the log line from `UpdateRawArtifacts` doesn't mangle it.
- With `UpdateRawArtifacts` now handling the closing of the provided reader, we can check whether the close returned an error. Usually we could safely discard errors on the close of a reader, but I think this has value here because in some situations we're using an `io.Pipe` connected to another `io.Pipe`, where an error on one side is designed to show up on the other (see the sketch after this list).
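To illustrate the `io.Pipe` behaviour mentioned in the second bullet, here is a small standalone example (not the runner's code) showing how an error recorded on one side of a pipe surfaces on the other side; this is the kind of failure that would be hidden if errors during stream teardown were silently discarded:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

func main() {
	pr, pw := io.Pipe()

	// The producer fails partway through and records the error on the
	// write side of the pipe.
	go func() {
		pw.Write([]byte("partial data"))
		pw.CloseWithError(errors.New("upload source failed"))
	}()

	// The consumer reading from the pipe sees the producer's error
	// instead of a plain io.EOF, so discarding it would hide the failure.
	buf := make([]byte, 64)
	for {
		_, err := pr.Read(buf)
		if err != nil {
			fmt.Println("read ended with:", err) // "upload source failed"
			break
		}
	}
}
```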
**2nd commit:** Adds the meter and associated tests to the commands helper package. A global variable, set from the `TRANSFER_METER_FREQUENCY` environment variable, controls the update frequency. Although the update frequency of our live logging is set to 5 seconds, setting this to something lower is helpful because the extra updates are still visible in the raw log output.
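As a rough sketch of the idea (the names and details below are invented for illustration and are not the MR's implementation), a reader wrapper can count bytes as they pass through and print a progress line at the configured interval:

```go
package helpers

import (
	"fmt"
	"io"
	"os"
	"sync/atomic"
	"time"
)

// meterReader is an illustrative wrapper, not the MR's actual type: it counts
// bytes read through it and periodically reports progress to stderr.
type meterReader struct {
	io.ReadCloser
	read    int64
	enabled bool
	done    chan struct{}
}

func newMeterReader(rc io.ReadCloser, freq time.Duration) *meterReader {
	m := &meterReader{ReadCloser: rc, done: make(chan struct{})}
	if freq <= 0 {
		return m // zero frequency: meter disabled, behaves like a plain ReadCloser
	}
	m.enabled = true

	go func() {
		ticker := time.NewTicker(freq)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				fmt.Fprintf(os.Stderr, "\rUploading... %d bytes", atomic.LoadInt64(&m.read))
			case <-m.done:
				return
			}
		}
	}()
	return m
}

func (m *meterReader) Read(p []byte) (int, error) {
	n, err := m.ReadCloser.Read(p)
	atomic.AddInt64(&m.read, int64(n))
	return n, err
}

// Close flushes a final progress line before closing the underlying reader,
// mirroring the ordering concern described in the first commit.
func (m *meterReader) Close() error {
	if m.enabled {
		close(m.done)
		fmt.Fprintf(os.Stderr, "\rUploading... %d bytes (done)\n", atomic.LoadInt64(&m.read))
	}
	return m.ReadCloser.Close()
}
```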
## Why was this MR needed?
I'm hoping this will help us diagnose certain artifact upload problems, some related to speed and others to how far into the process an upload got before failing.
For example, #26868 is an issue where artifacts are silently not uploaded and the job succeeds. We don't know whether the error is immediate or at the very end of uploading.
## What's the best way to test this MR?
Set `TRANSFER_METER_FREQUENCY: "1s"` in the `.gitlab-ci.yml`. The output uses `\r` to update the progress in-place, so viewing the raw log should display the progress across multiple lines.
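As a simple illustration of the `\r` behaviour (not the meter's actual output format), a carriage-return-prefixed progress line rewrites itself in a terminal, while a raw log viewer shows each update on its own line:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	// Each update starts with \r, so a terminal rewrites the same line,
	// while a raw log viewer shows one line per update.
	for pct := 0; pct <= 100; pct += 25 {
		fmt.Fprintf(os.Stderr, "\rUploading... %d%%", pct)
		time.Sleep(time.Second)
	}
	fmt.Fprintln(os.Stderr)
}
```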
To test cache uploads, the following can be done locally to start an S3 server:

```shell
# Create fake bucket directory
mkdir -p minio/testing

# Run minio
docker run --name minio -d -v $(pwd)/minio:/data -p 9000:9000 -e "MINIO_ROOT_USER=AKIAIOSFODNN7EXAMPLE" -e "MINIO_ROOT_PASSWORD=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" minio/minio:edge server /data
```
Update the runner's `config.toml` to include:

```toml
[runners.cache]
  Type = "s3"
  [runners.cache.s3]
    ServerAddress = "<PUBLIC IP>:9000"
    Insecure = true
    AccessKey = "AKIAIOSFODNN7EXAMPLE"
    SecretKey = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    BucketName = "testing"
```

Where `<PUBLIC IP>` is an IP address that the job running inside a Docker container is able to reach externally.
Test with the following pipeline:

```yaml
image: busybox:latest

variables:
  TRANSFER_METER_FREQUENCY: "1s"

upload:
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    policy: pull-push
    paths:
      - random
  script:
    - dd if=/dev/urandom of=random bs=1M count=100
  artifacts:
    paths:
      - random
```
## What are the relevant issue numbers?
Closes #27429