Use PostgreSQL FDW for Geo downloads
Part of gitlab-org/gitlab-ee#3382
With these custom queries using FDW, we see relatively fast queries on the Geo testbed (at least relative to what was there before):
LFS objects
gitlabhq_geo_production=# explain analyze SELECT "gitlab_secondary"."lfs_objects"."id" FROM "gitlab_secondary"."lfs_objects" LEFT OUTER JOIN file_registry ON file_registry.file_id = gitlab_secondary.lfs_objects.id AND (gitlab_secondary.lfs_objects.file_store IS NULL OR gitlab_secondary.lfs_objects.file_store = 1) AND file_registry.file_type = 'lfs' WHERE (file_registry.file_id IS NULL) ORDER BY "gitlab_secondary"."lfs_objects"."created_at" DESC LIMIT 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.42..2738.22 rows=1000 width=12) (actual time=1412.209..1470.219 rows=1000 loops=1)
-> Nested Loop Anti Join (cost=100.42..5476.26 rows=2038 width=12) (actual time=1412.209..1470.143 rows=1000 loops=1)
Join Filter: ((lfs_objects.file_store IS NULL) OR (lfs_objects.file_store = 1))
-> Foreign Scan on lfs_objects (cost=100.00..177.54 rows=2048 width=16) (actual time=1411.078..1435.640 rows=14782 loops=1)
-> Index Only Scan using index_file_registry_on_file_type_and_file_id on file_registry (cost=0.42..2.57 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=14782)
Index Cond: ((file_type = 'lfs'::text) AND (file_id = lfs_objects.id))
Heap Fetches: 7298
Planning time: 0.176 ms
Execution time: 1484.609 ms
(9 rows)
Uploads
gitlabhq_geo_production=# explain analyze SELECT "gitlab_secondary"."uploads"."id", "gitlab_secondary"."uploads"."uploader" FROM "gitlab_secondary"."uploads" LEFT OUTER JOIN file_registry ON file_registry.file_id = gitlab_secondary.uploads.id AND file_registry.file_type IN ('attachment','avatar','file','personal_file') WHERE (file_registry.file_id IS NULL) ORDER BY "gitlab_secondary"."uploads"."created_at" DESC LIMIT 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5685.64..5687.15 rows=602 width=44) (actual time=1485.559..1485.786 rows=1000 loops=1)
-> Sort (cost=5685.64..5687.15 rows=602 width=44) (actual time=1485.557..1485.702 rows=1000 loops=1)
Sort Key: uploads.created_at DESC
Sort Method: top-N heapsort Memory: 127kB
-> Hash Anti Join (cost=5594.35..5657.85 rows=602 width=44) (actual time=48.452..1399.048 rows=543554 loops=1)
Hash Cond: (uploads.id = file_registry.file_id)
-> Foreign Scan on uploads (cost=100.00..146.12 rows=1204 width=44) (actual time=0.404..1164.297 rows=633528 loops=1)
-> Hash (cost=4365.64..4365.64 rows=90297 width=4) (actual time=47.434..47.434 rows=89974 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 4188kB
-> Seq Scan on file_registry (cost=0.00..4365.64 rows=90297 width=4) (actual time=0.018..33.584 rows=89974 loops=1)
Filter: ((file_type)::text = ANY ('{attachment,avatar,file,personal_file}'::text[]))
Rows Removed by Filter: 80714
Planning time: 0.184 ms
Execution time: 1486.416 ms
(14 rows)
Edited by Stan Hu