Skip to content

Prevent `ArchiveTraceWorker` from holding client transactions open

What does this MR do?

Issue #322142 (closed)

At a high level this MR prevents transactions from being open for a long time while file storage completes. This has caused infra incidents in the past.

Technical Details

This MR moves file store to a remote file store outside of the carrierwave transaction by disabling the on save callback and enabling the after-commit callback that stores the file. It is better to run this after the transaction is closed so that we don't hold open the transaction. This is how current versions of carrierwave handle uploads, however we are behind.

Adds tests that the trace-related endpoints behave properly when a live trace exists along with a job artifact trace record with no file in the store.

Log Output

declared the following method within Ci::JobArtifact:

    def store_file!
      Gitlab::AppLogger.info('storing file')

      super

      Gitlab::AppLogger.info('done storing file')
    end

abbreviated logs:

  TRANSACTION (0.2ms)  BEGIN
  ...
  Ci::JobArtifact Create (2.9ms)  INSERT INTO "ci_job_artifacts" 
  ...
  Ci::JobArtifact Update (0.8ms)  UPDATE "ci_job_artifacts"
  ...
  TRANSACTION (1.0ms)  COMMIT

==> log/application.log <==
2021-08-13T20:10:21.804Z: storing file

==> log/application.log <==
2021-08-13T20:10:21.813Z: done storing file

logs:

==> log/development.log <==
  Ci::Build Load (2.7ms)  SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND (NOT EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id) AND "ci_job_artifacts"."file_type" = 3)) AND "ci_builds"."id" = 782 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/app/workers/ci/archive_trace_worker.rb:14:in `perform'*/
  ↳ app/workers/ci/archive_trace_worker.rb:14:in `perform'
  Ci::BuildTraceChunk Exists? (0.8ms)  SELECT 1 AS one FROM "ci_build_trace_chunks" WHERE "ci_build_trace_chunks"."build_id" = 782 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:82:in `live_trace_exist?'*/
  ↳ lib/gitlab/ci/trace.rb:82:in `live_trace_exist?'
  Project Load (0.6ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = 24 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:320:in `consistent_archived_trace?'*/
  ↳ lib/gitlab/ci/trace.rb:320:in `consistent_archived_trace?'
  Ci::JobArtifact Load (0.3ms)  SELECT "ci_job_artifacts".* FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."job_id" = 782 AND "ci_job_artifacts"."file_type" = 3 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:298:in `block in trace_artifact'*/
  ↳ lib/gitlab/ci/trace.rb:298:in `block in trace_artifact'
  CACHE Ci::BuildTraceChunk Exists? (0.0ms)  SELECT 1 AS one FROM "ci_build_trace_chunks" WHERE "ci_build_trace_chunks"."build_id" = 782 LIMIT 1
  ↳ lib/gitlab/ci/trace.rb:197:in `unsafe_archive!'
  TRANSACTION (0.2ms)  BEGIN /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'*/
  ↳ lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'
  Ci::JobArtifact Create (2.9ms)  INSERT INTO "ci_job_artifacts" ("project_id", "file_type", "size", "created_at", "updated_at", "file", "file_sha256", "job_id") VALUES (24, 3, 1108, '2021-08-13 20:10:21.791631', '2021-08-13 20:10:21.791631', 'job.log', '\x64383037653336653337653063366130336365626637356565303462656639383835366434303565303764346432353435343361396231653836646637643261', 782) RETURNING "id" /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'*/
  ↳ lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'
  Ci::JobArtifact Update (0.8ms)  UPDATE "ci_job_artifacts" SET "file_store" = 1 WHERE "ci_job_artifacts"."id" = 251 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/app/models/concerns/file_store_mounter.rb:19:in `update_file_store'*/
  ↳ app/models/concerns/file_store_mounter.rb:19:in `update_file_store'
  TRANSACTION (1.0ms)  COMMIT /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/database.rb:235:in `commit'*/
  ↳ lib/gitlab/database.rb:235:in `commit'

==> log/application.log <==
2021-08-13T20:10:21.804Z: storing file

==> log/application_json.log <==
{"severity":"INFO","time":"2021-08-13T20:10:21.805Z","correlation_id":"c9babb159d2c19672f304019be8bc5ca","message":"storing file"}

==> log/application.log <==
2021-08-13T20:10:21.813Z: done storing file

==> log/application_json.log <==
{"severity":"INFO","time":"2021-08-13T20:10:21.813Z","correlation_id":"c9babb159d2c19672f304019be8bc5ca","message":"done storing file"}

Manual Testing

To test this manually you can start a new job and ensure that the trace for that eventually returns true for

job.trace.send('trace_artifact').archived_trace_exist? - this method takes into account weather the file is in storage.

You can also setup up minio object storage locally and verify that the file is properly located on the remote.

If you add a sleep or better a remote pry to store_file! you can also use that to investigate that none of the API endpoints or other trace-related features are broken during a long storage process. Some ways to inspect the file:

job = Ci::Build.find_by_id(<job_id>)
job.reload.trace.archived_trace_exist?
job.reload.trace.live_trace_exist?
job.trace_chunks.any?
job.trace.send('current_path').present? 
job.trace.send('old_trace').present?
artifact = job.trace.send('trace_artifact')
artifact.file_store

There are a few different settings that can be enabled and disabled for testing including direct_upload and background_upload. https://docs.gitlab.com/ee/administration/job_artifacts.html#object-storage-settings

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

To determine areas that may break with an artifact record but no corresponding file stored, I switched the trace artifact factory to remove the file and examined each spec that failed to determine what the behavior should be for an artifact with an unstored file.

Edited by Allison Browne

Merge request reports

Loading