Skip to content

Use CommitStatus scopes to explicitly destroy records instead of nullifying the foreign key

drew stachon requested to merge remove-bad-join-from-destroy-ci-records into master

What does this MR do and why?

This MR breaks the project_id querying out of CommitStatus.for_project_paths so it can be reused with a specific project id.

We do this so we can send a DELETE query directly to every record the ci_builds table, unconstrained by CommitStatus type.

Currently:

pry(main)> project.commit_statuses.delete_all
  CommitStatus Update All (0.5ms)  UPDATE "ci_builds" SET "project_id" = NULL WHERE "ci_builds"."project_id" = 1

With this change:

pry(main)> CommitStatus.where(project_id: project.id).delete_all
  CommitStatus Destroy (0.5ms)  DELETE FROM "ci_builds" WHERE "ci_builds"."project_id" = 1

When destroying CI records, we need to fully ensure that all records in ci_builds get destroyed before the project record goes. Right now, we are just setting the project_id column of ci_builds to null.

This is because Project currently does not specify a deletion strategy for the relation:

  has_many :commit_statuses

and so the default :nullify strategy is used.

This is different than the builds relation, which has a specified strategy despite pointing to the same table:

has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :destroy

So when we see the Projects::DestroyService reporting leftover commit statuses, those commit status records are not being destroyed, only having their project_id set to null.

How to set up and validate locally

1. Initial Counts

[13] pry(main)> CommitStatus.count
   (2.6ms)  SELECT COUNT(*) FROM "ci_builds"
=> 663

[10] pry(main)> project = Project.find(4)
  Project Load (1.0ms)  SELECT ...
  Route Load (0.3ms)  SELECT ...
=> #<Project id:4 Commit451/lab-coat>>

[14] pry(main)> project.commit_statuses.count
   (0.7ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" = 4
=> 68

[11] pry(main)> cs_ids = project.commit_statuses.pluck(:id)
   (0.5ms)  SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."project_id" = 4
=> [...]

[12] pry(main)> cs_ids.count
=> 68

2. Attempt delete_all

[15] pry(main)> project.commit_statuses.delete_all
  CommitStatus Update All (21.3ms)  UPDATE "ci_builds" SET "project_id" = NULL WHERE "ci_builds"."project_id" = 4 
=> 68

3. Observe the same total counts, but the relation count gone to zero

[16] pry(main)> CommitStatus.count
   (0.6ms)  SELECT COUNT(*) FROM "ci_builds"
=> 663

[18] pry(main)> project.commit_statuses.count
   (0.5ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" = 4
=> 0

[17] pry(main)> CommitStatus.where(project_id: nil).count
   (0.6ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" IS NULL
=> 68

4. The records themselves still exist

[19] pry(main)> CommitStatus.where(id: cs_ids).count
   (1.1ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."id" IN (...) 
=> 68

5. Reset the project_id on the abandoned CommitStatus records

[36] pry(main)> CommitStatus.where(project_id: nil).update_all(project_id: project.id)
  CommitStatus Update All (8.2ms)  UPDATE "ci_builds" SET "project_id" = 4 WHERE "ci_builds"."project_id" IS NULL
=> 68

[37] pry(main)> CommitStatus.where(project_id: nil).count
   (0.6ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" IS NULL
=> 0

6. Observe the same original total

[40] pry(main)> CommitStatus.count
   (0.6ms)  SELECT COUNT(*) FROM "ci_builds"
=> 663

[42] pry(main)> project.commit_statuses.count
   (1.6ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" = 4
=> 68

7. Use new delete_all statement

[41] pry(main)> CommitStatus.for_project_id(project.id).delete_all
  CommitStatus Destroy (84.1ms)  DELETE FROM "ci_builds" WHERE "ci_builds"."project_id" = 4
=> 68

8. Observe Updated totals

[44] pry(main)> CommitStatus.count
   (1.8ms)  SELECT COUNT(*) FROM "ci_builds"
=> 595

[45] pry(main)> project.commit_statuses.count
   (0.7ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" = 4
=> 0

[46] pry(main)> CommitStatus.where(project_id: nil).count
   (0.5ms)  SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."project_id" IS NULL
=> 0

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Follow-up

After this is running in production, we should do a scan of the ci_builds table (I know, I know) and destroy all the builds with a blank project_id.

I say "destroy" specifically, because we might have to instantiate the build record itself and call ActiveRecord::Base#destroy on it, in order to trigger the associated callbacks that destroy Ci::JobArtifact records. Ci::PipelineDestroyService is supposed to take care of Ci::JobArtifacts, so this might not be necessary, but I don't want to forget to think about it.

Right now the ON CASACADE DELETE has taken care of us, so we don't need to go back and do any cleanup. But we'll have needed to do this in order to complete #340256 (closed)

Edited by drew stachon

Merge request reports

Loading