Skip to content

Heuristic to choose indexes for reindexing

Andreas Brandl requested to merge ab/reindex-heuristic into master

What does this MR do?

This adds an index selection strategy for reindexing. We strive to reindex indexes with the highest level of bloat first, since this has the most impact on the overall, absolute amount of btree bloat in the database.

We add a view to estimate btree bloat. The caveat here is that querying the full view is expensive. Instead, we retrieve candidate indexes and estimate their individual bloat one-by-one. In production, this takes about 200ms per index.

After having established bloat estimates for all indexes, we simply order in descending order and pick the two at the top (with the highest bloat).

This can be revised later, too - so this a basic version only.

The view definition works out on PG11 and PG12.

References:

  1. Bloat estimation for btrees https://github.com/ioguix/pgsql-bloat-estimation#bloat-in-btree-indexes

Query plans

We can get a single btree bloat estimate with:

select * from postgres_index_bloat_estimates where identifier = 'public.users_pkey';

Plan: https://explain.depesz.com/s/8K7g

The runtime is more or less the same for all indexes.

Does this MR meet the acceptance criteria?

Conformity

Edited by Andreas Brandl

Merge request reports

Loading