Optimize Deployment API that filters by updated_at [RUN ALL RSPEC] [RUN AS-IF-FOSS] (!59771) · Merge requests · GitLab.org / GitLab

Shinya Maeda requested to merge optimize-deployment-finder into master Apr 20, 2021

What does this MR do?

This MR mitigates the GET /api/:version/projects/:id/deployments with updated_before param is slow and also addresses a couple of ~"technical debt" in DeploymentFinder.

Optimization 1: `environment` filter must specify project context

According to the issue, the problematic query looks like this:

SELECT
    "deployments".*
FROM
    "deployments"
    INNER JOIN "environments" ON "environments"."id" = "deployments"."environment_id"
WHERE
    "deployments"."project_id" = 22679027
    AND "deployments"."updated_at" <= '2020-12-04 05:06:26'
    AND "environments"."name" = '*****'
ORDER BY
    "deployments"."iid" DESC
LIMIT 100 OFFSET 0

(See https://log.gprd.gitlab.net/goto/c36cb768621cdbaf09062c581b1712bd for the actual timing on production.)

One of the problems in the above query is that selecting environments rows by name filter, without specifying its project. environments records are unique on project_id and name columns, so it should be combined in order to trigger an index scan with the unique index. Therefore, we'll modify the query to:

SELECT
    "deployments".*
FROM
    "deployments"
WHERE
    "deployments"."project_id" = 18914599
    AND "deployments"."environment_id" = (SELECT id FROM environments WHERE project_id = 18914599 AND name = '[REDACTED]')
    AND "deployments"."updated_at" <= '2021-04-20T00:32:59Z'
ORDER BY
    "deployments"."id" DESC
LIMIT
    100 OFFSET 100

Optimization 2: Enforce the ordering when date-type filter is used

Currently, DeploymentFinder accepts date-type filter and id ordering, e.g. WHERE updated_at < ? ORDER BY id. This produces a very inefficient query because PostgreSQL has to sequentially sort the fetched deployments rows. For more information, please see @tigerwnz's nice summary, but in short, this can't be optimized by just a database index, but we need to limit the parameter patterns.

The date-type filter and ordering should be matched in order to produce an efficient query (in fact, it can speed up 2000x times). Hence, when users specify updated_at filter in the Deployment API, the finder implicitly overwrites the parameter to be ORDER BY updated_at. Generally, this will not impact usability on the API, because the default id sorting (i.e. creation date order) and updated_at sorting can produce almost same result in the nature of the GitLab Environments and Deployments, rather there is a high chance that users are NOT getting correct deployments data with updated_at filtering, which will be followed up in #328500 (closed).

This behavioral change is behind deployments_finder_implicitly_enforce_ordering_for_updated_at_filter feature flag, which is disabled by default.

Here is an example of API request and query:

curl -H 'Private-Token: [REDACTED]' "http://local.gitlab.test:8181/api/v4/projects/35/deployments?environment=production&order_by=updated_at&sort=desc&updated_before=2021-05-03T14:05:28Z"

SELECT "deployments".*
FROM "deployments"
WHERE "deployments"."project_id" = 35
  AND "deployments"."updated_at" <= '2021-05-03 14:05:28'
  AND "deployments"."environment_id" IN (
    SELECT "environments"."id" FROM "environments" WHERE "environments"."project_id" = 35 AND "environments"."name" = 'production'
  )
ORDER BY "deployments"."updated_at" DESC, "deployments"."id" DESC LIMIT 20 OFFSET 0

Optimization 3: Re-create a index for tie-breaker

When updated_at filter is specified, it also specify id as the second ordering for the tie-breaker. Therefore, the index index_deployments_on_project_and_environment_and_updated_at_id should include id.

Here is an example of query plan for an environment that has 81559 of deployments:

 Limit  (cost=0.57..26.64 rows=20 width=138) (actual time=11.973..32.390 rows=20 loops=1)
   Buffers: shared hit=1 read=17 dirtied=5
   I/O Timings: read=32.121
   ->  Index Scan using dosuken_test on public.deployments  (cost=0.57..68.36 rows=52 width=138) (actual time=11.971..32.377 rows=20 loops=1)
         Index Cond: ((deployments.project_id = 14359413) AND (deployments.environment_id = 1069567) AND (deployments.updated_at <= '2021-05-03 14:05:28'::timestamp without time zone))
         Buffers: shared hit=1 read=17 dirtied=5
         I/O Timings: read=32.121

Optimization 4: Validate inefficient queries

One of the biggest problems in this finder is that it allows any parameters combinations to filter and sort the deployment rows. We should only allow it where it's properly optimized by database index. Otherwise, InefficientQueryError is raised, so that developers can take an action whether using different parameters or removing InefficientQueryError with proper optimization.

Optimization 5: Order by `id` instead of `created_at` or `iid`.

Notice that we also order by id instead of iid. This still produces the same results, but allows us to reduce the index patterns e.g. project_id, environment_id, id VS project_id, environment_id, id and project_id, environment_id, iid.

Migration

shinya@shinya-B550-VISION-D:~/workspace/thin-gdk/services/rails/src$ tre bin/rails db:migrate:redo VERSION=20210420210642
INFO: This script is a predefined script in devkitkat.
== 20210420210642 RecreateIndexForProjectDeploymentsWithEnvironmentIdAndDateAt: reverting 
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:project_id, :environment_id, :updated_at], {:name=>"index_deployments_on_project_and_environment_and_updated_at", :algorithm=>:concurrently})
   -> 0.0073s
-- add_index(:deployments, [:project_id, :environment_id, :updated_at], {:name=>"index_deployments_on_project_and_environment_and_updated_at", :algorithm=>:concurrently})
   -> 0.0054s
-- transaction_open?()
   -> 0.0000s
-- indexes(:deployments)
   -> 0.0072s
-- remove_index(:deployments, {:algorithm=>:concurrently, :name=>"index_deployments_on_project_and_environment_and_updated_at_id"})
   -> 0.0020s
== 20210420210642 RecreateIndexForProjectDeploymentsWithEnvironmentIdAndDateAt: reverted (0.0246s) 

== 20210420210642 RecreateIndexForProjectDeploymentsWithEnvironmentIdAndDateAt: migrating 
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:deployments, [:project_id, :environment_id, :updated_at, :id], {:name=>"index_deployments_on_project_and_environment_and_updated_at_id", :algorithm=>:concurrently})
   -> 0.0055s
-- add_index(:deployments, [:project_id, :environment_id, :updated_at, :id], {:name=>"index_deployments_on_project_and_environment_and_updated_at_id", :algorithm=>:concurrently})
   -> 0.0053s
-- transaction_open?()
   -> 0.0000s
-- indexes(:deployments)
   -> 0.0078s
-- remove_index(:deployments, {:algorithm=>:concurrently, :name=>"index_deployments_on_project_and_environment_and_updated_at"})
   -> 0.0020s
== 20210420210642 RecreateIndexForProjectDeploymentsWithEnvironmentIdAndDateAt: migrated (0.0219s)

Timing

exec CREATE INDEX dosuken_test ON deployments USING btree (project_id, environment_id, updated_at, id);


% time      seconds wait_event
------ ------------ -----------------------------
73.06    464.563496 IO.DataFileRead
18.46    117.399917 Running
2.25      14.331714 IO.DataFileExtend
1.69      10.735284 IO.BufFileRead
1.58      10.076678 IO.BufFileWrite
1.43       9.082661 IO.WALWrite
0.65       4.161955 IPC.ParallelCreateIndexScan
0.65       4.150678 LWLock.WALWriteLock
0.09       0.597721 IO.DataFileWrite
0.06       0.402362 LWLock.WALBufMappingLock
0.05       0.344613 IO.SLRURead
0.00       0.010695 LWLock.buffer_mapping
------ ------------ -----------------------------
100.00   635.857774

Size

\di+ dosuken_test

List of relations
 Schema |     Name     | Type  |   Owner    |    Table    |  Size   | Description 
--------+--------------+-------+------------+-------------+---------+-------------
 public | dosuken_test | index | joe_shinya | deployments | 5577 MB | 
(1 row)

Does this MR meet the acceptance criteria?

Conformity

📋 Does this MR need a changelog?
- I have included a changelog entry.
- I have not included a changelog entry because _____.
Documentation (if required)
Code review guidelines
Merge request performance guidelines
Style guides
Database guides
Separation of EE specific content

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited May 03, 2021 by Shinya Maeda

Optimize Deployment API that filters by updated_at [RUN ALL RSPEC] [RUN AS-IF-FOSS]