Geo: Skip project/wiki repo create/update events based on selective sync
What does this MR do?
Skips syncing a project/wiki on create/update if it's not supposed to be replicated due to selective sync.
There is a performance concern when many projects are created at once: this change can slow down the Geo log cursor. The worst case is when selective sync is enabled by namespace and the namespace is large; the query timing below shows ~330 ms. So if 100 projects were created simultaneously, this MR could add ~33 s to the time it takes secondaries to process the create events. That's not the end of the world, given Geo uses an async design. We could move more of this event processing work into jobs, but I suggest we accept this for now and wait for project/wiki repos to migrate onto the Self-Service Framework, since that will already do this work in jobs.
Resolves #233504
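As an illustration of where the check fits, here is a minimal sketch of a log cursor event handler that drops events for out-of-scope projects. Only `GeoNode#projects_include?` is taken from this MR; the handler class name, the `Gitlab::Geo.current_node` usage, and the logger call are illustrative assumptions, not the actual diff.

```ruby
# Minimal sketch (not the MR's actual diff) of skipping a repository
# create/update event when the project is outside selective sync scope.
# Assumes GitLab EE's Gitlab::Geo.current_node helper; the handler class
# name is illustrative.
class RepositoryUpdatedEventHandler
  def initialize(event, logger)
    @event = event
    @logger = logger
  end

  def process
    unless Gitlab::Geo.current_node.projects_include?(@event.project_id)
      # Out of selective sync scope: log and drop the event instead of
      # scheduling a sync job.
      @logger.info(message: 'Skipping sync: excluded by selective sync',
                   project_id: @event.project_id)
      return
    end

    # ...otherwise enqueue the usual Geo repository sync worker...
  end
end
```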
Database queries and #database-lab explain results
These are queries produced by the GeoNode#projects_include? call.
No selective sync
Not shown because it's a trivial case (check whether the current Geo node's selective_sync_type is present).
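For context on the cases below, a hedged sketch of the rough shape such a check could take, branching on selective_sync_type; the scope names and branch structure are assumptions for illustration, not the actual GeoNode model.

```ruby
# Hedged sketch of the rough shape of GeoNode#projects_include?;
# scope names and branch structure are assumptions for illustration only.
class GeoNode < ApplicationRecord
  def projects_include?(project_id)
    # Trivial case: no selective sync configured, so every project is replicated.
    return true unless selective_sync_type.present?

    # Otherwise check membership in the selectively synced scope; the
    # generated SQL for the namespace and shard variants is shown below.
    selected_projects.where(id: project_id).exists?
  end

  private

  def selected_projects
    case selective_sync_type
    when 'namespaces' then projects_for_selected_namespaces # recursive CTE query below
    when 'shards'     then projects_for_selected_shards     # repository_storage query below
    end
  end
end
```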
Selective sync by namespace (actual query)
SELECT 1 AS one
FROM "projects"
WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
        (
            SELECT "geo_node_namespace_links"."namespace_id" AS id
            FROM "geo_node_namespace_links"
            WHERE "geo_node_namespace_links"."geo_node_id" = 1
        )
        UNION
        (
            SELECT "namespaces"."id"
            FROM "namespaces", "base_and_descendants"
            WHERE "namespaces"."parent_id" = "base_and_descendants"."id"
        )
    )
    SELECT "id"
    FROM "base_and_descendants" AS "namespaces"
)
AND "projects"."id" = 1
LIMIT 1
Plan with execution:
Limit (cost=1919.90..1927.42 rows=1 width=4) (actual time=6.550..6.550 rows=0 loops=1)
Buffers: shared hit=1 read=4
I/O Timings: read=6.460
-> Nested Loop (cost=1919.90..1927.42 rows=1 width=4) (actual time=6.549..6.549 rows=0 loops=1)
Buffers: shared hit=1 read=4
I/O Timings: read=6.460
-> Index Scan using projects_pkey on public.projects (c
[...SKIP...]
The full execution plan and other artifacts are provided in the thread.
Recommendations:
:white_check_mark: Looks good
Summary:
Time: 7.674 ms
- planning: 0.901 ms
- execution: 6.773 ms
- I/O read: 6.460 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1 (~8.00 KiB) from the buffer pool
- reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
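The base_and_descendants recursive CTE in the query above is the kind of SQL that Gitlab::ObjectHierarchy generates. Continuing the hedged GeoNode sketch from above, the namespace branch could look roughly like this; the association and scope names are assumptions.

```ruby
# Hedged sketch of the namespace branch; assumes Gitlab::ObjectHierarchy,
# which builds the "base_and_descendants" recursive CTE seen in the query
# above. Association and scope names are illustrative.
class GeoNode < ApplicationRecord
  has_many :geo_node_namespace_links
  has_many :namespaces, through: :geo_node_namespace_links

  def projects_for_selected_namespaces
    selected_namespace_ids =
      Gitlab::ObjectHierarchy.new(namespaces).base_and_descendants.select(:id)

    Project.where(namespace_id: selected_namespace_ids)
  end
end
```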
Selective sync by namespace, with gitlab-org hardcoded
The gitlab-org namespace ID is hardcoded to make the query search some actual data on #database-lab, since GitLab.com doesn't use Geo.
SELECT 1 AS one
FROM "projects"
WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
        (
            SELECT 9970 AS "id"
        )
        UNION
        (
            SELECT "namespaces"."id"
            FROM "namespaces", "base_and_descendants"
            WHERE "namespaces"."parent_id" = "base_and_descendants"."id"
        )
    )
    SELECT "id"
    FROM "base_and_descendants" AS "namespaces"
)
AND "projects"."id" = 1
LIMIT 1
Plan with execution:
Limit (cost=286.20..293.71 rows=1 width=4) (actual time=328.226..328.226 rows=0 loops=1)
Buffers: shared hit=597 read=154 dirtied=4
I/O Timings: read=323.548
-> Nested Loop (cost=286.20..293.71 rows=1 width=4) (actual time=328.225..328.225 rows=0 loops=1)
Buffers: shared hit=597 read=154 dirtied=4
I/O Timings: read=323.548
-> Index Scan using
[...SKIP...]
The full execution plan and other artifacts are provided in the thread.
Recommendations:
:exclamation: Query processes too much data to return a relatively small number of rows. Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks.
:exclamation: VACUUM ANALYZE needed. Visibility map(s) for the table(s) involved in the query are outdated. For better performance: 1) run VACUUM ANALYZE on them as a one-time solution, 2) perform autovacuum tuning as a long-term permanent solution (tuning the "vacuum" part of autovacuum).
Summary:
Time: 329.196 ms
- planning: 0.866 ms
- execution: 328.330 ms
- I/O read: 323.548 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 597 (~4.70 MiB) from the buffer pool
- reads: 154 (~1.20 MiB) from the OS file cache, including disk I/O
- dirtied: 4 (~32.00 KiB)
- writes: 0
Selective sync by shard
SELECT 1 AS one
FROM "projects"
WHERE "projects"."repository_storage" = 'nfs-file02'
AND "projects"."id" = 1
LIMIT 1
Plan with execution:
Limit (cost=0.43..3.46 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=1)
Buffers: shared hit=4
-> Index Scan using projects_pkey on public.projects (cost=0.43..3.46 rows=1 width=4) (actual time=0.024..0.024 rows=0 loops=1)
Index Cond: (projects.id = 1)
Filter: ((projects.repository_storage)::text = 'nfs-file02'::text)
Rows Removed by Filte
[...SKIP...]
The full execution plan and other artifacts are provided in the thread.
Recommendations:
:exclamation: Specialized index needed. The index(es) currently used do not serve this query well (notice Rows Removed by Filter: ..., meaning the index fetched many non-target rows). Consider adding more specialized index(es).
Summary:
Time: 0.481 ms
- planning: 0.436 ms
- execution: 0.045 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 4 (~32.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
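Continuing the same hedged sketch, the shard branch would be a plain filter on repository_storage, matching the simple query above; selective_sync_shards is assumed to hold the configured shard names.

```ruby
# Hedged sketch of the shard branch; assumes a selective_sync_shards
# attribute holding the selected shard names (e.g. ['nfs-file02']).
class GeoNode < ApplicationRecord
  def projects_for_selected_shards
    Project.where(repository_storage: selective_sync_shards)
  end
end
```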
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.