Geo: Skip project/wiki repo create/update events based on selective sync
What does this MR do?
Skips syncing a project/wiki on create/update if it's not supposed to be replicated due to selective sync.
There is a performance concern when many projects are created at once: this change can slow down the Geo log cursor. The worst case is when selective sync is enabled by namespace and the namespace is large; the query timing below shows ~330 ms. So if 100 projects were created simultaneously, this MR could add ~33 s to the time it takes secondaries to process the create events. That's not the end of the world, given Geo uses an async design. We could move more of this event processing work into jobs, but I suggest we accept this for now and wait for project/wiki repos to migrate onto the Self-Service Framework, since that will already do this work in jobs.
Resolves #233504
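As an illustration of where the check fits, here is a minimal sketch of a log cursor event handler that drops events for out-of-scope projects. Only `GeoNode#projects_include?` is taken from this MR; the handler class name, the `Gitlab::Geo.current_node` usage, and the logger call are illustrative assumptions, not the actual diff.

```ruby
# Minimal sketch (not the MR's actual diff) of skipping a repository
# create/update event when the project is outside selective sync scope.
# Assumes GitLab EE's Gitlab::Geo.current_node helper; the handler class
# name is illustrative.
class RepositoryUpdatedEventHandler
  def initialize(event, logger)
    @event = event
    @logger = logger
  end

  def process
    unless Gitlab::Geo.current_node.projects_include?(@event.project_id)
      # Out of selective sync scope: log and drop the event instead of
      # scheduling a sync job.
      @logger.info(message: 'Skipping sync: excluded by selective sync',
                   project_id: @event.project_id)
      return
    end

    # ...otherwise enqueue the usual Geo repository sync worker...
  end
end
```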
Database queries and #database-lab explain results
These are queries produced by the GeoNode#projects_include? call.
No selective sync
Not shown because it's a trivial case (check whether the current Geo node's selective_sync_type is present).
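For context on the cases below, a hedged sketch of the rough shape such a check could take, branching on selective_sync_type; the scope names and branch structure are assumptions for illustration, not the actual GeoNode model.

```ruby
# Hedged sketch of the rough shape of GeoNode#projects_include?;
# scope names and branch structure are assumptions for illustration only.
class GeoNode < ApplicationRecord
  def projects_include?(project_id)
    # Trivial case: no selective sync configured, so every project is replicated.
    return true unless selective_sync_type.present?

    # Otherwise check membership in the selectively synced scope; the
    # generated SQL for the namespace and shard variants is shown below.
    selected_projects.where(id: project_id).exists?
  end

  private

  def selected_projects
    case selective_sync_type
    when 'namespaces' then projects_for_selected_namespaces # recursive CTE query below
    when 'shards'     then projects_for_selected_shards     # repository_storage query below
    end
  end
end
```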
Selective sync by namespace (actual query)
SELECT 1 AS one
FROM "projects"
WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
        (
            SELECT "geo_node_namespace_links"."namespace_id" AS id
            FROM "geo_node_namespace_links"
            WHERE "geo_node_namespace_links"."geo_node_id" = 1
        )
        UNION
        (
            SELECT "namespaces"."id"
            FROM "namespaces", "base_and_descendants"
            WHERE "namespaces"."parent_id" = "base_and_descendants"."id"
        )
    )
    SELECT "id"
    FROM "base_and_descendants" AS "namespaces"
)
AND "projects"."id" = 1
LIMIT 1
Plan with execution:
Limit (cost=1919.90..1927.42 rows=1 width=4) (actual time=6.550..6.550 rows=0 loops=1)
Buffers: shared hit=1 read=4
I/O Timings: read=6.460
-> Nested Loop (cost=1919.90..1927.42 rows=1 width=4) (actual time=6.549..6.549 rows=0 loops=1)
Buffers: shared hit=1 read=4
I/O Timings: read=6.460
-> Index Scan using projects_pkey on public.projects (c
[...SKIP...]
The full execution plan and other artifacts are provided in the thread.
Recommendations:
:white_check_mark: Looks good
Summary:
Time: 7.674 ms
- planning: 0.901 ms
- execution: 6.773 ms
- I/O read: 6.460 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1 (~8.00 KiB) from the buffer pool
- reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
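The base_and_descendants recursive CTE in the query above is the kind of SQL that Gitlab::ObjectHierarchy generates. Continuing the hedged GeoNode sketch from above, the namespace branch could look roughly like this; the association and scope names are assumptions.

```ruby
# Hedged sketch of the namespace branch; assumes Gitlab::ObjectHierarchy,
# which builds the "base_and_descendants" recursive CTE seen in the query
# above. Association and scope names are illustrative.
class GeoNode < ApplicationRecord
  has_many :geo_node_namespace_links
  has_many :namespaces, through: :geo_node_namespace_links

  def projects_for_selected_namespaces
    selected_namespace_ids =
      Gitlab::ObjectHierarchy.new(namespaces).base_and_descendants.select(:id)

    Project.where(namespace_id: selected_namespace_ids)
  end
end
```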
Selective sync by namespace, with gitlab-org hardcoded
The gitlab-org namespace ID is hardcoded to make the query search some actual data on #database-lab, since GitLab.com doesn't use Geo.
SELECT 1 AS one
FROM "projects"
WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
        (
            SELECT 9970 AS "id"
        )
        UNION
        (
            SELECT "namespaces"."id"
            FROM "namespaces", "base_and_descendants"
            WHERE "namespaces"."parent_id" = "base_and_descendants"."id"
        )
    )
    SELECT "id"
    FROM "base_and_descendants" AS "namespaces"
)
AND "projects"."id" = 1
LIMIT 1
Plan with execution:
Limit (cost=286.20..293.71 rows=1 width=4) (actual time=328.226..328.226 rows=0 loops=1)
Buffers: shared hit=597 read=154 dirtied=4
I/O Timings: read=323.548
-> Nested Loop (cost=286.20..293.71 rows=1 width=4) (actual time=328.225..328.225 rows=0 loops=1)
Buffers: shared hit=597 read=154 dirtied=4
I/O Timings: read=323.548
-> Index Scan using
[...SKIP...]
The full execution plan and other artifacts are provided in the thread.
Recommendations:
:exclamation: Query processes too much data to return a relatively small number of rows. Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks.
:exclamation: VACUUM ANALYZE needed. Visibility map(s) for the table(s) involved in the query are outdated. For better performance: 1) run VACUUM ANALYZE on them as a one-time solution, 2) perform autovacuum tuning as a long-term permanent solution (tuning the "vacuum" part of autovacuum).
Summary:
Time: 329.196 ms
- planning: 0.866 ms
- execution: 328.330 ms
- I/O read: 323.548 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 597 (~4.70 MiB) from the buffer pool
- reads: 154 (~1.20 MiB) from the OS file cache, including disk I/O
- dirtied: 4 (~32.00 KiB)
- writes: 0
Selective sync by shard
SELECT 1 AS one
FROM "projects"
WHERE "projects"."repository_storage" = 'nfs-file02'
AND "projects"."id" = 1
LIMIT 1
Plan with execution:
Limit (cost=0.43..3.46 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=1)
Buffers: shared hit=4
-> Index Scan using projects_pkey on public.projects (cost=0.43..3.46 rows=1 width=4) (actual time=0.024..0.024 rows=0 loops=1)
Index Cond: (projects.id = 1)
Filter: ((projects.repository_storage)::text = 'nfs-file02'::text)
Rows Removed by Filte
[...SKIP...]
The full execution plan and other artifacts are provided in the thread.
Recommendations:
:exclamation: Specialized index needed. The index(es) currently used do not serve this query well (notice Rows Removed by Filter: ..., meaning the index fetched many non-target rows). Consider adding more specialized index(es).
Summary:
Time: 0.481 ms
- planning: 0.436 ms
- execution: 0.045 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 4 (~32.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
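Continuing the same hedged sketch, the shard branch would be a plain filter on repository_storage, matching the simple query above; selective_sync_shards is assumed to hold the configured shard names.

```ruby
# Hedged sketch of the shard branch; assumes a selective_sync_shards
# attribute holding the selected shard names (e.g. ['nfs-file02']).
class GeoNode < ApplicationRecord
  def projects_for_selected_shards
    Project.where(repository_storage: selective_sync_shards)
  end
end
```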
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.