Optimize counts.terraform_reports usage ping counter (!39499) · Merge requests · GitLab.org / GitLab

Alishan Ladhani requested to merge 230437-usage-data-optimize-counts-terraform_reports into master Aug 13, 2020

Resubmitting !37498 (merged), which was reverted due to the migration taking too long on staging (gitlab-com/gl-infra/production#2514 (closed)).

The content of this MR is identical to !37498 (merged). No changes should be required, since the root cause of the issue was replication (gitlab-com/gl-infra/production#2514 (comment 395074719)).

What does this MR do?

In #230437 (closed), we noticed that the counts.terraform_reports usage ping counter was failing on gitlab.com. This MR optimizes the counter to get it working again.

This counter currently uses batching, so it queries for the MIN/MAX id, and then performs a query for each batch in that range. The batch size defaults to 100_000, and gets cut in half every time the batch query times out (ActiveRecord::QueryCanceled). The batch size can get as small as 1_250.

[6] pry(main)> Gitlab::Utils::UsageData.count(::Ci::JobArtifact.terraform_reports)
   (11.9ms)  SELECT MIN("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
   (0.5ms)  SELECT MAX("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18
   (3.0ms)  SELECT COUNT("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18 AND "ci_job_artifacts"."id" BETWEEN 0 AND 99999

The rows being counted are covered by an existing partial index:

CREATE INDEX index_ci_job_artifacts_for_terraform_reports ON public.ci_job_artifacts USING btree (project_id, id) WHERE (file_type = 18);

Queries were tested in database lab, and their performance is examined below.

Current query performance

SELECT MIN("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18

Time: > 1hr (cancelled because it took very long)

https://explain.depesz.com/s/lGxT (without execution)

Note: Terraform reports are a relatively new feature, introduced in %13.0. When starting from the lowest id, it is likely that a large amount of the table would need to be read before finding any records matching WHERE "ci_job_artifacts"."file_type" = 18.

SELECT MAX("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18

Time: 209.649 ms

https://explain.depesz.com/s/8nVZ

SELECT COUNT("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18 AND "ci_job_artifacts"."id" BETWEEN 0 AND 99999

Time: 303.937 ms

https://explain.depesz.com/s/rn7xE

Note: Terraform reports are a relatively new feature, introduced in %13.0. It's highly likely that this query did not find any records matching WHERE "ci_job_artifacts"."file_type" = 18, since it's looking at the beginning of the table. I'm not sure if/how the performance is affected by the number of matching records.

Proposal

This MR adds a new partial index to optimize the queries above:

CREATE INDEX index_ci_job_artifacts_id_for_terraform_reports ON public.ci_job_artifacts USING btree (id) WHERE (file_type = 18)

Query performance with index

SELECT MIN("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18

Time: 0.633 ms

https://explain.depesz.com/s/39AB

SELECT MAX("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18

Time: 0.611 ms

https://explain.depesz.com/s/GCv9

SELECT COUNT("ci_job_artifacts"."id") FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_type" = 18 AND "ci_job_artifacts"."id" BETWEEN 0 AND 99999

Time: 0.290 ms

https://explain.depesz.com/s/3UiQ

Note: No matching records in this batch.

Migration output

== 20200806202645 AddIndexToCiJobArtifactsForTerraformReportsId: migrating ====
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:ci_job_artifacts, :id, {:where=>"file_type = 18", :name=>"index_ci_job_artifacts_id_for_terraform_reports", :algorithm=>:concurrently})
   -> 0.0032s
-- add_index(:ci_job_artifacts, :id, {:where=>"file_type = 18", :name=>"index_ci_job_artifacts_id_for_terraform_reports", :algorithm=>:concurrently})
   -> 0.0036s
== 20200806202645 AddIndexToCiJobArtifactsForTerraformReportsId: migrated (0.0071s)

Index creation took 29.630 min in database lab.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

[-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
[-] Tested in all supported browsers
[-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Edited Aug 13, 2020 by Alishan Ladhani

Optimize counts.terraform_reports usage ping counter