Database reindexing attempting to delete reindexes in progress
During async index creation, there were a few errors from the deploy host. In sentry I found: https://sentry.gitlab.net/gitlab/gitlabcom/issues/2737424/events/48835509/?environment=gprd
Example statement can be seen below, which is hitting a statement timeout:
| /*application:web*/ DROP INDEX CONCURRENTLY IF EXISTS "public"."index_ci_builds_on_user_id_and_created_at_and_type_eq_ci__ccnew" |
|----------------------------------------------------------------------------------------------------------------------------------|
It appears to be because the reindex operation is still running for that index. I checked the database at that time and the index exists, but is INVALID
, and we can also see the original index is still there as well:
"index_ci_builds_on_user_id_and_created_at_and_type_eq_ci__ccnew" btree (user_id, created_at) WHERE type::text = 'Ci::Build'::text INVALID
"index_ci_builds_on_user_id_and_created_at_and_type_eq_ci_build" btree (user_id, created_at) WHERE type::text = 'Ci::Build'::text
I think this is because the Gitlab::Database::Reindexing.cleanup_leftovers!
here: https://gitlab.com/gitlab-org/gitlab/-/blob/982411c0add9377c82be04cc1fb7ea19c3b7b58e/lib/tasks/gitlab/db.rake#L177 is being called outside the ExclusiveLease
, so when we get multiple parallel runs of the reindexing logic it attempts to drop an index that another process is still working on.