Add artifact/cache upload progress meter
## What does this MR do?
Adds a progress meter for cache and artifact uploads.
This isn't displayed by default and is intended to be used for diagnostics.
Specifying a `TRANSFER_METER_FREQUENCY` variable controls how often a progress update is printed. The default is zero, which means no progress is displayed (no change to the job trace). Values below 1 second are raised to 1 second; anything above that is allowed. The value is parsed with Go's `ParseDuration`, so durations such as `1s` and `1m20s` are all valid.
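As a minimal sketch of how such a setting could be read (the helper name and clamping logic here are illustrative, not the MR's actual code):

```go
package helpers

import (
	"os"
	"time"
)

// transferMeterFrequency returns the configured update interval, or zero
// when the meter is disabled. Illustrative sketch only.
func transferMeterFrequency() time.Duration {
	raw := os.Getenv("TRANSFER_METER_FREQUENCY")
	if raw == "" {
		return 0 // meter disabled by default
	}

	freq, err := time.ParseDuration(raw)
	if err != nil || freq <= 0 {
		return 0 // unparsable or non-positive values disable the meter
	}

	if freq < time.Second {
		return time.Second // values below 1s are raised to 1s
	}
	return freq
}
```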
This MR is split into two commits:

**1st commit:** Updates `UpdateRawArtifacts` to accept an `io.ReadCloser` instead of just an `io.Reader`. There are two reasons for this, but they might be controversial:
- Now that both the meter and `UpdateRawArtifacts` are writing to stderr, the very last status update from `UpdateRawArtifacts` can mangle the very last output of the meter, due to the way the meter's progress is flushed on `Close()`. If `UpdateRawArtifacts` closes the reader on our behalf, we can ensure the meter is flushed first, so the log line from `UpdateRawArtifacts` doesn't mangle it.
- With `UpdateRawArtifacts` now handling the closing of the provided reader, we can check whether the close returned an error. Usually we could safely discard errors on the close of a reader, but I think this has value here because in some situations we're using an `io.Pipe` connected to another `io.Pipe`, where an error on one side is designed to show up on the other (see the sketch after this list).
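To illustrate the `io.Pipe` behaviour mentioned in the second bullet, here is a small standalone example (not the runner's code) showing how an error recorded on one side of a pipe surfaces on the other side; this is the kind of failure that would be hidden if errors during stream teardown were silently discarded:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

func main() {
	pr, pw := io.Pipe()

	// The producer fails partway through and records the error on the
	// write side of the pipe.
	go func() {
		pw.Write([]byte("partial data"))
		pw.CloseWithError(errors.New("upload source failed"))
	}()

	// The consumer reading from the pipe sees the producer's error
	// instead of a plain io.EOF, so discarding it would hide the failure.
	buf := make([]byte, 64)
	for {
		_, err := pr.Read(buf)
		if err != nil {
			fmt.Println("read ended with:", err) // "upload source failed"
			break
		}
	}
}
```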
**2nd commit:** Adds the meter and associated tests to the commands helper package. A global variable, set from the `TRANSFER_METER_FREQUENCY` environment variable, controls the update frequency. Although the update frequency of our live logging is set to 5 seconds, setting this to something lower is helpful because the extra updates are still visible in the raw log output.
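As a rough sketch of the idea (the names and details below are invented for illustration and are not the MR's implementation), a reader wrapper can count bytes as they pass through and print a progress line at the configured interval:

```go
package helpers

import (
	"fmt"
	"io"
	"os"
	"sync/atomic"
	"time"
)

// meterReader is an illustrative wrapper, not the MR's actual type: it counts
// bytes read through it and periodically reports progress to stderr.
type meterReader struct {
	io.ReadCloser
	read    int64
	enabled bool
	done    chan struct{}
}

func newMeterReader(rc io.ReadCloser, freq time.Duration) *meterReader {
	m := &meterReader{ReadCloser: rc, done: make(chan struct{})}
	if freq <= 0 {
		return m // zero frequency: meter disabled, behaves like a plain ReadCloser
	}
	m.enabled = true

	go func() {
		ticker := time.NewTicker(freq)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				fmt.Fprintf(os.Stderr, "\rUploading... %d bytes", atomic.LoadInt64(&m.read))
			case <-m.done:
				return
			}
		}
	}()
	return m
}

func (m *meterReader) Read(p []byte) (int, error) {
	n, err := m.ReadCloser.Read(p)
	atomic.AddInt64(&m.read, int64(n))
	return n, err
}

// Close flushes a final progress line before closing the underlying reader,
// mirroring the ordering concern described in the first commit.
func (m *meterReader) Close() error {
	if m.enabled {
		close(m.done)
		fmt.Fprintf(os.Stderr, "\rUploading... %d bytes (done)\n", atomic.LoadInt64(&m.read))
	}
	return m.ReadCloser.Close()
}
```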
## Why was this MR needed?
I'm hoping this will help us diagnose certain artifact upload problems, some related to speed and others to how far into the process an upload got before failing.
For example, #26868 is an issue where artifacts are silently not uploaded and the job succeeds. We don't know whether the error is immediate or at the very end of uploading.
## What's the best way to test this MR?
Set `TRANSFER_METER_FREQUENCY: "1s"` in the `.gitlab-ci.yml`. The output uses `\r` to update the progress in-place, so viewing the raw log should display the progress across multiple lines.
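As a simple illustration of the `\r` behaviour (not the meter's actual output format), a carriage-return-prefixed progress line rewrites itself in a terminal, while a raw log viewer shows each update on its own line:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	// Each update starts with \r, so a terminal rewrites the same line,
	// while a raw log viewer shows one line per update.
	for pct := 0; pct <= 100; pct += 25 {
		fmt.Fprintf(os.Stderr, "\rUploading... %d%%", pct)
		time.Sleep(time.Second)
	}
	fmt.Fprintln(os.Stderr)
}
```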
To test cache uploads, the following can be done locally to start an S3 server:

```shell
# Create fake bucket directory
mkdir -p minio/testing

# Run minio
docker run --name minio -d -v $(pwd)/minio:/data -p 9000:9000 -e "MINIO_ROOT_USER=AKIAIOSFODNN7EXAMPLE" -e "MINIO_ROOT_PASSWORD=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" minio/minio:edge server /data
```
Update the runner's `config.toml` to include:

```toml
[runners.cache]
  Type = "s3"
  [runners.cache.s3]
    ServerAddress = "<PUBLIC IP>:9000"
    Insecure = true
    AccessKey = "AKIAIOSFODNN7EXAMPLE"
    SecretKey = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    BucketName = "testing"
```

Where `<PUBLIC IP>` is an IP address that the job running inside a Docker container is able to reach externally.
Test with the following pipeline:

```yaml
image: busybox:latest

variables:
  TRANSFER_METER_FREQUENCY: "1s"

upload:
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    policy: pull-push
    paths:
      - random
  script:
    - dd if=/dev/urandom of=random bs=1M count=100
  artifacts:
    paths:
      - random
```
## What are the relevant issue numbers?
Closes #27429