Improve Geo selective sync worker to avoid repeated individual SELECTs on project registry
The following discussion from !11998 (merged) should be addressed:
- @nick.thomas started a discussion: (+4 comments)

    > I'm not sure we need this extra check, given we've only just pulled the list of projects from the database, and the `Geo::RepositoryCleanupWorker` operates fine even if the project has been removed. What's the thinking behind it?
At present, the `Geo::RepositoriesCleanupWorker` first looks up a set of projects, then performs an `exists?` query against `geo_project_registry` for each individual project when scheduling project destruction. It does this to avoid scheduling destruction for projects that have not been tracked yet.
If we move from, say, all repository shards being tracked, to just a single shard being tracked, this could easily be on the order of 7 million lookups.
We should improve this by joining the two tables and only returning projects that have a registry entry in the first place. This isn't possible at the moment due to FDW vs. non-FDW concerns, but should be possible from %12.0, when FDW will be mandatory.
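
To illustrate the difference between the two approaches, here is a minimal in-memory sketch in plain Ruby. It is not the worker's actual code: `Project`, `Registry`, `tracked_ids_per_project` and `tracked_ids_joined` are hypothetical names, and the arrays stand in for the `projects` and `geo_project_registry` tables. The counter only models how many database round trips each strategy would need.

```ruby
require 'set'

# Hypothetical stand-ins for rows in `projects` and `geo_project_registry`.
Project  = Struct.new(:id)
Registry = Struct.new(:project_id)

# Current behaviour as described above: one existence check per project.
# Returns the tracked project ids and the number of simulated queries.
def tracked_ids_per_project(projects, registries)
  lookups = 0
  tracked = projects.select do |project|
    lookups += 1 # models a single `exists?` query for this project
    registries.any? { |registry| registry.project_id == project.id }
  end
  [tracked.map(&:id), lookups]
end

# Proposed behaviour: one set-based pass, analogous to an INNER JOIN
# between the two tables executed as a single query.
def tracked_ids_joined(projects, registries)
  registry_ids = registries.map(&:project_id).to_set
  tracked = projects.select { |project| registry_ids.include?(project.id) }
  [tracked.map(&:id), 1]
end
```

Both methods return the same set of project ids, but the first one scales its query count with the number of candidate projects, while the second stays at one. In ActiveRecord terms the second form would roughly correspond to a single statement using `joins` or a `WHERE EXISTS` subquery, assuming both tables are reachable from the same database connection.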