Ensure primary key index is used for Geo::Replicable
What does this MR do and why?
In gitlab-com/gl-infra/production#7756 (closed), we
found that for a large ci_job_artifacts
table, the query planner can
flip between using one of two indexes to find local files:
ci_job_artifacts_pkey
index_ci_job_artifacts_on_file_store
The first index is always faster because for a given primary ID, there
is only one row and hence filtering by file_store = 1
is fast.
In the worst case on GitLab.com, the second index would have to sift through over 22,000 rows to find a matching artifact with the given primary ID.
Even if Geo is not enabled, save_verification_details
is called with
every model that is replicated hooks into save
. To ensure that the
query is always fast, use a materialized CTE to look up by
primary key ID and filter from there.
Resolves #374701 (closed)
Resolves #374718 (closed)
SQL Queries
Before
When the query planner and statistics are in a good state, this query works just fine.
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12244/commands/43526 shows:
SELECT 1 AS one FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_store" = 1 AND "ci_job_artifacts"."id" = 10 LIMIT 1
Limit (cost=0.58..3.60 rows=1 width=4) (actual time=0.088..0.088 rows=0 loops=1)
Buffers: shared hit=7
I/O Timings: read=0.000 write=0.000
-> Index Scan using ci_job_artifacts_pkey on public.ci_job_artifacts (cost=0.58..3.60 rows=1 width=4) (actual time=0.086..0.086 rows=0 loops=1)
Index Cond: (ci_job_artifacts.id = 10)
Filter: (ci_job_artifacts.file_store = 1)
Rows Removed by Filter: 0
Buffers: shared hit=7
I/O Timings: read=0.000 write=0.000
Time: 0.302 ms
- planning: 0.196 ms
- execution: 0.106 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 7 (~56.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
When things go bad!
For a bad case, see gitlab-com/gl-infra/production#7756 (comment 1104365277):
gitlabhq_production=# explain ( analyze, buffers ) SELECT 1 AS one FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."file_store" = 1 AND "ci_job_artifacts"."id" = 3314157561 LIMIT 1 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.58..2.10 rows=1 width=4) (actual time=30.011..30.012 rows=0 loops=1)
Buffers: shared hit=18243
-> Index Scan using index_ci_job_artifacts_on_file_store on ci_job_artifacts (cost=0.58..2.10 rows=1 width=4) (actual time=30.010..30.010 rows=0 loops=1)
Index Cond: (file_store = 1)
Filter: (id = '3314157561'::bigint)
Rows Removed by Filter: 22630
Buffers: shared hit=18243
Planning Time: 0.128 ms
Execution Time: 30.028 ms
(9 rows)
18,243 buffers were read. This by itself isn't terrible, but since CI job traces get saved over 15,000 times a minute (https://log.gprd.gitlab.net/goto/75ee0f90-3ab9-11ed-8656-f5f2137823ba), this can add up!
After
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/12244/commands/43528:
WITH "verifiables" AS MATERIALIZED
(SELECT "ci_job_artifacts"."project_id", "ci_job_artifacts"."file_type", "ci_job_artifacts"."size", "ci_job_artifacts"."created_at", "ci_job_artifacts"."updated_at", "ci_job_artifacts"."expire_at", "ci_job_artifacts"."file", "ci_job_artifacts"."file_store", "ci_job_artifacts"."file_sha256", "ci_job_artifacts"."file_format", "ci_job_artifacts"."file_location", "ci_job_artifacts"."id", "ci_job_artifacts"."job_id", "ci_job_artifacts"."locked", "ci_job_artifacts"."partition_id"
FROM "ci_job_artifacts"
WHERE "ci_job_artifacts"."id" = 10)
SELECT 1 AS one
FROM "verifiables" AS "ci_job_artifacts"
WHERE "ci_job_artifacts"."file_store" = 1
LIMIT 1
Limit (cost=3.59..3.62 rows=1 width=4) (actual time=0.055..0.056 rows=0 loops=1)
Buffers: shared hit=7
I/O Timings: read=0.000 write=0.000
CTE verifiables
-> Index Scan using ci_job_artifacts_pkey on public.ci_job_artifacts ci_job_artifacts_1 (cost=0.58..3.59 rows=1 width=147) (actual time=0.052..0.053 rows=0 loops=1)
Index Cond: (ci_job_artifacts_1.id = 10)
Buffers: shared hit=7
I/O Timings: read=0.000 write=0.000
-> CTE Scan on verifiables ci_job_artifacts (cost=0.00..0.02 rows=1 width=4) (actual time=0.054..0.054 rows=0 loops=1)
Filter: (ci_job_artifacts.file_store = 1)
Rows Removed by Filter: 0
Buffers: shared hit=7
I/O Timings: read=0.000 write=0.000
Time: 0.894 ms
- planning: 0.782 ms
- execution: 0.112 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 7 (~56.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.