Introduce Auto Rollback facility
What does this MR do?
This MR introduces a facility that handles the core business logic of Auto Rollback. The AutoRollbackService finds an appropriate rollback target and re-deploys it. It comes with a couple of safety mechanisms, such as a rate limiter that prevents multiple auto rollbacks within a short interval.
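A minimal, framework-free sketch of how such a service could combine target lookup with a rate limiter. Only the AutoRollbackService name comes from this MR; the class name `AutoRollbackServiceSketch`, the `Deployment` struct, the `MIN_INTERVAL` value, the injected clock, and the result hashes are assumptions made for illustration, not the actual implementation.

```ruby
# Hypothetical stand-in for a deployment record.
Deployment = Struct.new(:id, :sha, :status, keyword_init: true)

class AutoRollbackServiceSketch
  MIN_INTERVAL = 180 # seconds between auto rollbacks (assumed value)

  # clock is injectable so the rate limiter can be exercised without sleeping.
  def initialize(deployments, clock: -> { Time.now })
    @deployments = deployments
    @clock = clock
    @last_rollback_at = nil
  end

  # Returns a result hash instead of raising, so callers can log the reason.
  def execute(target_sha)
    return { status: :error, message: 'rate limited' } if rate_limited?

    target = find_rollback_target(target_sha)
    return { status: :error, message: 'no rollback target' } unless target

    # A real service would re-deploy the target here; the sketch only records it.
    @last_rollback_at = @clock.call
    { status: :success, deployment_id: target.id }
  end

  private

  def rate_limited?
    !@last_rollback_at.nil? && (@clock.call - @last_rollback_at) < MIN_INTERVAL
  end

  # Newest successful deployment with the given SHA.
  def find_rollback_target(sha)
    @deployments
      .select { |d| d.status == :success && d.sha == sha }
      .max_by(&:id)
  end
end
```

Returning early on the rate-limit check keeps the target lookup, and therefore the database query, off the path when a rollback has just happened.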
The new Sidekiq worker AutoRollbackWorker will be used in an upcoming MR. It will be executed when a new AlertManagement::Alert is created with the highest severity.
This also fixes the long-standing bug where with_deployable sometimes returns deployments whose deployable object does not exist. This causes a lot of noise on Sentry, so it is better to handle it properly.
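The query shown under Query Performance joins ci_builds with an INNER JOIN on deployable_id, which suggests that the fix filters out deployments whose deployable row is missing. The following is an in-memory sketch of that filtering only; `DeploymentRow`, the method signature, and the sample ids are hypothetical and do not reflect the real scope definition.

```ruby
# Hypothetical row type; only deployable_id matters for the join.
DeploymentRow = Struct.new(:id, :deployable_id, keyword_init: true)

# Keeps only deployments whose deployable_id matches an existing build id,
# the same filtering an INNER JOIN on ci_builds.id performs.
def with_deployable(deployments, build_ids)
  deployments.select { |d| build_ids.include?(d.deployable_id) }
end

existing_build_ids = [10, 11]
rows = [
  DeploymentRow.new(id: 1, deployable_id: 10),
  DeploymentRow.new(id: 2, deployable_id: 99), # build 99 no longer exists
  DeploymentRow.new(id: 3, deployable_id: nil) # never had a deployable
]

puts with_deployable(rows, existing_build_ids).map(&:id).inspect
```

With an outer join or no join at all, rows 2 and 3 would still be returned and any later access to their deployable would hit a missing object.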
Related #35404 (closed) Close #218659 (closed)
Query Performance
Since the query in find_rollback_target doesn't perform well on an environment with many deployments, we need to add a database index to optimize it.
Here is the EXPLAIN ANALYZE output from joe-bot. The query was run against one of the busiest deployment projects on gitlab.com, gitlab-com/www-gitlab-com.
SELECT "deployments".* FROM "deployments"
INNER JOIN ci_builds ON ci_builds.id = deployments.deployable_id
WHERE "deployments"."environment_id" = 137
AND "deployments"."status" = 2
AND "deployments"."sha" = '292fa023062154f4ccb8f35c39f234dd60f1a071'
ORDER BY "deployments"."id" DESC LIMIT 1
Time: 1.264 ms
- planning: 0.791 ms
- execution: 0.473 ms
- I/O read: 0.376 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 0 from the buffer pool
- reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Limit (cost=7.19..7.20 rows=1 width=140) (actual time=0.436..0.438 rows=0 loops=1)
Buffers: shared read=4
I/O Timings: read=0.376
-> Sort (cost=7.19..7.20 rows=1 width=140) (actual time=0.435..0.436 rows=0 loops=1)
Sort Key: deployments.id DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared read=4
I/O Timings: read=0.376
-> Nested Loop (cost=1.14..7.18 rows=1 width=140) (actual time=0.430..0.431 rows=0 loops=1)
Buffers: shared read=4
I/O Timings: read=0.376
-> Index Scan using dos_test on public.deployments (cost=0.57..3.59 rows=1 width=140) (actual time=0.429..0.429 rows=0 loops=1)
Index Cond: ((deployments.environment_id = 137) AND (deployments.status = 2) AND ((deployments.sha)::text = '292fa023062154f4ccb8f35c39f234dd60f1a071'::text))
Buffers: shared read=4
I/O Timings: read=0.376
-> Index Only Scan using ci_builds_pkey on public.ci_builds (cost=0.57..3.59 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (ci_builds.id = deployments.deployable_id)
Heap Fetches: 0
(NOTE: dos_test is the same as the new index)
Feature Flag
This feature is under development and is disabled by default behind the cd_auto_rollback feature flag.
Database Migration
shinya@shinya-MS-7A34:~/workspace/thin-gdk/services/rails/src$ tre bin/rails db:migrate:down VERSION=20201112145311
INFO: This script is a predefined script in devkitkat.
== 20201112145311 AddIndexOnShaForInitialDeployments: reverting ===============
-- transaction_open?()
-> 0.0000s
-- indexes(:services)
-> 0.0030s
-- current_schema()
-> 0.0001s
== 20201112145311 AddIndexOnShaForInitialDeployments: reverted (0.0044s) ======
shinya@shinya-MS-7A34:~/workspace/thin-gdk/services/rails/src$ tre bin/rails db:migrate:up VERSION=20201112145311
INFO: This script is a predefined script in devkitkat.
== 20201112145311 AddIndexOnShaForInitialDeployments: migrating ===============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:deployments, [:environment_id, :status, :sha], {:name=>"index_deployments_on_environment_status_sha", :algorithm=>:concurrently})
-> 0.0052s
== 20201112145311 AddIndexOnShaForInitialDeployments: migrated (0.0056s) ======
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team