Skip to content

Materialize valid_primaries view (14.5)

Sami Hiltunen requested to merge smh-optimize-dataloss-query-14-5 into master

The dataloss query is extremely slow for bigger datasets. The problem is that for each row that the data loss query is returning, Postgres computes the full result of the valid_primaries view only to filter down to the correct record. This results in an o(n2) complexity which kills the performance as soon as the dataset size increases. It's not clear why the join parameters are not pushed down in to the view in the query.

This optimizes the query by materializing the valid_primaries view. This ensures Postgres computes the full view only once and joins with the pre-computed result.

RepositoryStoreCollector gathers metrics on repositories which don't have a valid primary candidates available. This indicates the repository is unavailable as the current primary is not valid and ther are no valid candidates to failover to. The query is currently extremely inefficient on some versions of Postgres as it ends up computing the full valid_primaries view for each of the rows it checks. This doesn't seem to occur on all versions of Postgres, namely 12.6 at least manages to push down the search criteria inside the view. This commit fixes the situation by materializing the valid_primaries view prior to querying it. This ensures the full view isn't computed for all of the rows but rather Postgres just uses the pre-computed result.

See #3744 (comment 735797239) for explanation and query plans.

Part of #3744 (closed)

Edited by Sami Hiltunen

Merge request reports

Loading