Optimize counts.terraform_reports usage ping counter
Resubmitting !37498 (merged), which was reverted due to the migration taking too long on staging (gitlab-com/gl-infra/production#2514 (closed)).
The content of this MR is identical to !37498 (merged). No changes should be required, since the root cause of the issue was replication (gitlab-com/gl-infra/production#2514 (comment 395074719)).
What does this MR do?
In #230437 (closed), we noticed that the counts.terraform_reports
usage ping counter was failing on gitlab.com. This MR optimizes the counter to get it working again.
This counter currently uses batching, so it queries for the MIN
/MAX
id
, and then performs a query for each batch in that range. The batch size defaults to 100_000
, and gets cut in half every time the batch query times out (ActiveRecord::QueryCanceled
). The batch size can get as small as 1_250
.
[6] pry(main)> Gitlab::Utils::UsageData.count(::Ci::JobArtifact.terraform_reports)
(11.9ms) SELECT MIN("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
(0.5ms) SELECT MAX("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
(3.0ms) SELECT COUNT("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18 AND "ci_job_artifacts"."id" BETWEEN 0 AND 99999
The rows being counted are covered by an existing partial index:
CREATE INDEX index_ci_job_artifacts_for_terraform_reports ON public.ci_job_artifacts USING btree (project_id, id) WHERE (file_type = 18);
Queries were tested in database lab, and their performance is examined below.
Current query performance
SELECT MIN("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
Time: > 1hr (cancelled because it took very long)
https://explain.depesz.com/s/lGxT (without execution)
Note: Terraform reports are a relatively new feature, introduced in %13.0. When starting from the lowest id
, it is likely that a large amount of the table would need to be read before finding any records matching WHERE "ci_job_artifacts"."file_type" = 18
.
SELECT MAX("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
Time: 209.649 ms
https://explain.depesz.com/s/8nVZ
SELECT COUNT("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18 AND "ci_job_artifacts"."id" BETWEEN 0 AND 99999
Time: 303.937 ms
https://explain.depesz.com/s/rn7xE
Note: Terraform reports are a relatively new feature, introduced in %13.0. It's highly likely that this query did not find any records matching WHERE "ci_job_artifacts"."file_type" = 18
, since it's looking at the beginning of the table. I'm not sure if/how the performance is affected by the number of matching records.
Proposal
This MR adds a new partial index to optimize the queries above:
CREATE INDEX index_ci_job_artifacts_id_for_terraform_reports ON public.ci_job_artifacts USING btree (id) WHERE (file_type = 18)
Query performance with index
SELECT MIN("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
Time: 0.633 ms
https://explain.depesz.com/s/39AB
SELECT MAX("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
Time: 0.611 ms
https://explain.depesz.com/s/GCv9
SELECT COUNT("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18 AND "ci_job_artifacts"."id" BETWEEN 0 AND 99999
Time: 0.290 ms
https://explain.depesz.com/s/3UiQ
Note: No matching records in this batch.
Migration output
== 20200806202645 AddIndexToCiJobArtifactsForTerraformReportsId: migrating ====
-- transaction_open?()
-> 0.0000s
-- index_exists?(:ci_job_artifacts, :id, {:where=>"file_type = 18", :name=>"index_ci_job_artifacts_id_for_terraform_reports", :algorithm=>:concurrently})
-> 0.0032s
-- add_index(:ci_job_artifacts, :id, {:where=>"file_type = 18", :name=>"index_ci_job_artifacts_id_for_terraform_reports", :algorithm=>:concurrently})
-> 0.0036s
== 20200806202645 AddIndexToCiJobArtifactsForTerraformReportsId: migrated (0.0071s)
Index creation took 29.630 min
in database lab.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done