Add CounterAttribute concern for CQRS on counter increments
What does this MR do?
Related to #24469 (closed)
Problem
The problem is defined and discussed in detail in #24469 (closed).
In short, concurrent updates to counters in statistics tables (e.g. `project_statistics`) often fail with query timeouts due to resource contention. We need to make updates more performant and non-blocking while still allowing the values to be read correctly.
Solution
Let's use the `ProjectStatistics` model as an example. We want to efficiently update the `build_artifacts_size` counter without incurring query timeouts.
In this MR we introduce a new module `CounterAttribute` that provides counter attribute functionality. It exposes a method `increment_counter` that logs the increment event into a separate table (e.g. `project_statistics_events`). This fast write lets the domain logic continue without depending on the statistics update, which is secondary to the main logic.
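For illustration, usage from a model could look roughly like this (the `counter_attribute` declaration and the `statistics`/`artifact` variables are assumptions for this sketch, not necessarily the final API introduced by this MR):

```ruby
# Hypothetical usage sketch: declare which columns behave as logged counters,
# then record increments instead of updating the column in place.
class ProjectStatistics < ApplicationRecord
  include CounterAttribute

  counter_attribute :build_artifacts_size
end

# Domain logic only inserts a row into project_statistics_events,
# so it does not contend on the project_statistics row itself:
statistics.increment_counter(:build_artifacts_size, artifact.size)
```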
A worker, scheduled periodically via cronjob, consolidates all pending increments and updates the primary columns atomically (see the worker sketch at the end of this section).
This way:
- writes to the primary columns are only performed by the background worker
- reads can be performed against the primary columns (with some delay in accuracy) or by including pending increments (see the read sketch below)
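A minimal sketch of the second read mode, assuming a `counter_events` association from the model to the events table (names are illustrative):

```ruby
# Hypothetical read helper: the accurate value is the primary column plus the
# sum of pending increment events, mirroring the LEFT JOIN + SUM query plan
# shown under "Query plans" below.
def accurate_counter(attribute)
  read_attribute(attribute) + counter_events.sum(attribute)
end

# e.g. statistics.accurate_counter(:build_artifacts_size)
```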
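And a sketch of the consolidation step (the worker name and the `consolidate_counter_attributes!` helper are assumptions for illustration; the single-statement consolidation it relies on is shown under "Consolidate counters in batch" below):

```ruby
# Hypothetical cron worker: folds a bounded batch of pending increment events
# into the primary columns atomically, using a DELETE ... RETURNING inside a
# CTE that feeds a single UPDATE.
class ConsolidateCountersWorker
  include ApplicationWorker

  def perform
    ProjectStatistics.consolidate_counter_attributes!(batch_size: 1000)
  end
end
```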
Query plans
Given the following tables:

- `test_counter_attributes`, which simulates the model where we include the `CounterAttribute` concern (e.g. this could be `ProjectStatistics`)
- `test_counter_attribute_events`, which contains the increment logs (with an index on the foreign key column `test_counter_attribute_id`). E.g. for `ProjectStatistics` this would be `project_statistics_id`.
gitlabhq_development=# \d test_counter_attributes
Table "public.test_counter_attributes"
Column | Type | Collation | Nullable | Default
------------+---------+-----------+----------+-----------------------------------------------------
id | bigint | | not null | nextval('test_counter_attributes_id_seq'::regclass)
project_id | integer | | |
counter_a | integer | | not null | 0
counter_b | integer | | not null | 0
Indexes:
"test_counter_attributes_pkey" PRIMARY KEY, btree (id)
gitlabhq_development=# SELECT COUNT(*) FROM "test_counter_attributes";
count
-------
1000
(1 row)
gitlabhq_development=# \d test_counter_attribute_events
Table "public.test_counter_attribute_events"
Column | Type | Collation | Nullable | Default
---------------------------+---------+-----------+----------+-----------------------------------------------------------
id | bigint | | not null | nextval('test_counter_attribute_events_id_seq'::regclass)
test_counter_attribute_id | integer | | |
counter_a | integer | | not null | 0
counter_b | integer | | not null | 0
Indexes:
"test_counter_attribute_events_pkey" PRIMARY KEY, btree (id)
"test_idx" btree (test_counter_attribute_id)
gitlabhq_development=# SELECT COUNT(*) FROM "test_counter_attribute_events";
count
-------
2922
(1 row)
Fetching accurate counter for attribute `counter_b`
EXPLAIN ANALYSE
SELECT ("test_counter_attributes"."counter_b" + COALESCE(SUM("test_counter_attribute_events"."counter_b"), 0)) AS actual_value
FROM "test_counter_attributes"
LEFT OUTER JOIN "test_counter_attribute_events"
  ON "test_counter_attribute_events"."test_counter_attribute_id" = "test_counter_attributes"."id"
WHERE "test_counter_attributes"."id" = 1
GROUP BY "test_counter_attributes"."counter_b", "test_counter_attributes"."id"
ORDER BY "test_counter_attributes"."id" ASC
LIMIT 1;
https://explain.depesz.com/s/iTKR
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.58..21.12 rows=1 width=20) (actual time=0.107..0.109 rows=1 loops=1)
-> GroupAggregate (cost=4.58..21.12 rows=1 width=20) (actual time=0.107..0.107 rows=1 loops=1)
Group Key: test_counter_attributes.id
-> Nested Loop Left Join (cost=4.58..21.09 rows=3 width=16) (actual time=0.080..0.097 rows=3 loops=1)
Join Filter: (test_counter_attribute_events.test_counter_attribute_id = test_counter_attributes.id)
-> Index Scan using test_counter_attributes_pkey on test_counter_attributes (cost=0.28..8.29 rows=1 width=12) (actual time=0.022..0.023 rows=1 loops=1)
Index Cond: (id = 1)
-> Bitmap Heap Scan on test_counter_attribute_events (cost=4.30..12.76 rows=3 width=8) (actual time=0.053..0.068 rows=3 loops=1)
Recheck Cond: (test_counter_attribute_id = 1)
Heap Blocks: exact=2
-> Bitmap Index Scan on test_idx (cost=0.00..4.30 rows=3 width=0) (actual time=0.048..0.048 rows=3 loops=1)
Index Cond: (test_counter_attribute_id = 1)
Planning time: 0.311 ms
Execution time: 0.161 ms
(14 rows)
Consolidate counters in batch
WITH events AS (
DELETE FROM test_counter_attribute_events
WHERE id >= 1002 AND id <= 2003
RETURNING *
)
UPDATE test_counter_attributes
SET counter_a = test_counter_attributes.counter_a + sums.counter_a, counter_b = test_counter_attributes.counter_b + sums.counter_b
FROM (
SELECT test_counter_attribute_id, SUM(counter_a) AS counter_a, SUM(counter_b) AS counter_b
FROM events
GROUP BY test_counter_attribute_id
) AS sums
WHERE test_counter_attributes.id = sums.test_counter_attribute_id;
https://explain.depesz.com/s/yp7w
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Update on test_counter_attributes (cost=96.88..119.52 rows=200 width=70) (actual time=3.282..3.282 rows=0 loops=1)
CTE events
-> Delete on test_counter_attribute_events (cost=0.00..62.83 rows=1002 width=6) (actual time=0.067..1.264 rows=1002 loops=1)
-> Seq Scan on test_counter_attribute_events (cost=0.00..62.83 rows=1002 width=6) (actual time=0.019..0.384 rows=1002 loops=1)
Filter: ((id >= 1002) AND (id <= 2003))
Rows Removed by Filter: 1920
-> Hash Join (cost=34.05..56.69 rows=200 width=70) (actual time=2.138..2.493 rows=402 loops=1)
Hash Cond: (test_counter_attributes.id = sums.test_counter_attribute_id)
-> Seq Scan on test_counter_attributes (cost=0.00..18.00 rows=1000 width=26) (actual time=0.008..0.174 rows=1000 loops=1)
-> Hash (cost=31.55..31.55 rows=200 width=64) (actual time=2.080..2.080 rows=402 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 48kB
-> Subquery Scan on sums (cost=27.55..31.55 rows=200 width=64) (actual time=1.831..1.983 rows=402 loops=1)
-> HashAggregate (cost=27.55..29.55 rows=200 width=20) (actual time=1.810..1.873 rows=402 loops=1)
Group Key: events.test_counter_attribute_id
-> CTE Scan on events (cost=0.00..20.04 rows=1002 width=12) (actual time=0.068..1.496 rows=1002 loops=1)
Planning time: 0.166 ms
Execution time: 3.396 ms
(17 rows)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team