Prevent `ArchiveTraceWorker` from holding client transactions open
What does this MR do?
Issue #322142 (closed)
At a high level this MR prevents transactions from being open for a long time while file storage completes. This has caused infra incidents in the past.
Technical Details
This MR moves file store to a remote file store outside of the carrierwave transaction by disabling the on save callback and enabling the after-commit callback that stores the file. It is better to run this after the transaction is closed so that we don't hold open the transaction. This is how current versions of carrierwave handle uploads, however we are behind.
Adds tests that the trace-related endpoints behave properly when a live trace exists along with a job artifact trace record with no file in the store.
Log Output
declared the following method within Ci::JobArtifact
:
def store_file!
Gitlab::AppLogger.info('storing file')
super
Gitlab::AppLogger.info('done storing file')
end
abbreviated logs:
TRANSACTION (0.2ms) BEGIN
...
Ci::JobArtifact Create (2.9ms) INSERT INTO "ci_job_artifacts"
...
Ci::JobArtifact Update (0.8ms) UPDATE "ci_job_artifacts"
...
TRANSACTION (1.0ms) COMMIT
==> log/application.log <==
2021-08-13T20:10:21.804Z: storing file
==> log/application.log <==
2021-08-13T20:10:21.813Z: done storing file
logs:
==> log/development.log <==
Ci::Build Load (2.7ms) SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND (NOT EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id) AND "ci_job_artifacts"."file_type" = 3)) AND "ci_builds"."id" = 782 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/app/workers/ci/archive_trace_worker.rb:14:in `perform'*/
↳ app/workers/ci/archive_trace_worker.rb:14:in `perform'
Ci::BuildTraceChunk Exists? (0.8ms) SELECT 1 AS one FROM "ci_build_trace_chunks" WHERE "ci_build_trace_chunks"."build_id" = 782 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:82:in `live_trace_exist?'*/
↳ lib/gitlab/ci/trace.rb:82:in `live_trace_exist?'
Project Load (0.6ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" = 24 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:320:in `consistent_archived_trace?'*/
↳ lib/gitlab/ci/trace.rb:320:in `consistent_archived_trace?'
Ci::JobArtifact Load (0.3ms) SELECT "ci_job_artifacts".* FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."job_id" = 782 AND "ci_job_artifacts"."file_type" = 3 LIMIT 1 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:298:in `block in trace_artifact'*/
↳ lib/gitlab/ci/trace.rb:298:in `block in trace_artifact'
CACHE Ci::BuildTraceChunk Exists? (0.0ms) SELECT 1 AS one FROM "ci_build_trace_chunks" WHERE "ci_build_trace_chunks"."build_id" = 782 LIMIT 1
↳ lib/gitlab/ci/trace.rb:197:in `unsafe_archive!'
TRANSACTION (0.2ms) BEGIN /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'*/
↳ lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'
Ci::JobArtifact Create (2.9ms) INSERT INTO "ci_job_artifacts" ("project_id", "file_type", "size", "created_at", "updated_at", "file", "file_sha256", "job_id") VALUES (24, 3, 1108, '2021-08-13 20:10:21.791631', '2021-08-13 20:10:21.791631', 'job.log', '\x64383037653336653337653063366130336365626637356565303462656639383835366434303565303764346432353435343361396231653836646637643261', 782) RETURNING "id" /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'*/
↳ lib/gitlab/ci/trace.rb:254:in `block in create_build_trace!'
Ci::JobArtifact Update (0.8ms) UPDATE "ci_job_artifacts" SET "file_store" = 1 WHERE "ci_job_artifacts"."id" = 251 /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/app/models/concerns/file_store_mounter.rb:19:in `update_file_store'*/
↳ app/models/concerns/file_store_mounter.rb:19:in `update_file_store'
TRANSACTION (1.0ms) COMMIT /*application:sidekiq,correlation_id:c9babb159d2c19672f304019be8bc5ca,jid:49d6080ec08db9303b5b6b87,endpoint_id:ArchiveTraceWorker,line:/lib/gitlab/database.rb:235:in `commit'*/
↳ lib/gitlab/database.rb:235:in `commit'
==> log/application.log <==
2021-08-13T20:10:21.804Z: storing file
==> log/application_json.log <==
{"severity":"INFO","time":"2021-08-13T20:10:21.805Z","correlation_id":"c9babb159d2c19672f304019be8bc5ca","message":"storing file"}
==> log/application.log <==
2021-08-13T20:10:21.813Z: done storing file
==> log/application_json.log <==
{"severity":"INFO","time":"2021-08-13T20:10:21.813Z","correlation_id":"c9babb159d2c19672f304019be8bc5ca","message":"done storing file"}
Manual Testing
To test this manually you can start a new job and ensure that the trace for that eventually returns true for
job.trace.send('trace_artifact').archived_trace_exist?
- this method takes into account weather the file is in storage.
You can also setup up minio object storage locally and verify that the file is properly located on the remote.
If you add a sleep
or better a remote pry to store_file!
you can also use that to investigate that none of the API endpoints or other trace-related features are broken during a long storage process. Some ways to inspect the file:
job = Ci::Build.find_by_id(<job_id>)
job.reload.trace.archived_trace_exist?
job.reload.trace.live_trace_exist?
job.trace_chunks.any?
job.trace.send('current_path').present?
job.trace.send('old_trace').present?
artifact = job.trace.send('trace_artifact')
artifact.file_store
There are a few different settings that can be enabled and disabled for testing including direct_upload
and background_upload
. https://docs.gitlab.com/ee/administration/job_artifacts.html#object-storage-settings
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides.
Availability and Testing
To determine areas that may break with an artifact record but no corresponding file stored, I switched the trace artifact factory to remove the file and examined each spec that failed to determine what the behavior should be for an artifact with an unstored file.