Handle project create/update/delete actions with bulk-incremental indexer
Summary
In !24298 (comment 291135616), we add a bulk-incremental indexer for most elasticsearch-indexable resources in GitLab. However, projects are excluded from scope - on create, update, and delete, we continue to schedule ElasticIndexerWorker jobs instead.
This is because ElasticIndexerWorker does considerable additional work for projects on creation and deletion (update is fine as-is, but separating it out was too much work).
Improvements
We need to refactor these operations so they are handled correctly with the bulk-incremental worker.
Create
On creation, the main missing piece appears to be that the project wiki is not correctly indexed - although this appears to be a problem for the ElasticIndexerWorker approach too: #207491 (closed)
We should refactor ElasticIndexerWorker so there is no reference to ElasticCommitIndexerWorker in it.
ElasticIndexerWorker also schedules an initial bulk import for each of the indexable associations in the project. This already seems to be handled correctly by the bulk-incremental indexer (as the importer creates each issue, its on-create callbacks run and schedule an operation), but this should be verified.
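The callback-driven behaviour described above can be illustrated with a minimal, self-contained sketch. `BulkQueue` and this `Issue` class are hypothetical stand-ins written for illustration only; they are not GitLab's actual classes, and the real queue and callbacks may differ.

```ruby
# Hypothetical in-memory stand-in for the bulk-incremental queue.
class BulkQueue
  attr_reader :pending

  def initialize
    @pending = []
  end

  # Record that a document needs to be indexed or deleted.
  def track(type, id, operation)
    @pending << { type: type, id: id, operation: operation }
  end

  # Drain everything queued so far as a single batch.
  def flush
    batch = @pending
    @pending = []
    batch
  end
end

# Hypothetical record whose constructor stands in for an on-create callback.
class Issue
  attr_reader :id

  def initialize(id, queue)
    @id = id
    queue.track(:issue, id, :index)
  end
end

queue = BulkQueue.new
# Simulate an importer creating three issues one at a time.
[1, 2, 3].each { |id| Issue.new(id, queue) }
batch = queue.flush
```

If the real callbacks behave like this, no separate per-association bulk import is needed: each created record has already queued its own indexing operation by the time the import finishes.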
IndexRecordService is also used for backfill, so we can't remove this code entirely - but we need to ensure that when a project is imported while elasticsearch is turned on, everything in the project is indexed correctly without an ElasticIndexerWorker job being scheduled.
Update
No action needed
Delete
When a project is deleted, we rely on database foreign key constraints to remove many records, so elasticsearch hooks are not fired. Additionally, commit and blob documents are guaranteed to be left behind without further action.
To handle this, ElasticIndexerWorker runs a "delete by query" command - it also manually deletes the IndexStatus row, although I'm not sure that's necessary.
The bulk-incremental indexer doesn't handle this at all right now, and would leave behind orphaned records.
I think the best approach here might be to create a specific "erase project from elasticsearch" worker, and have it be scheduled separately at project-delete time, rather than trying to fit this special-casing into the elastic-bulk flow specifically.
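As a rough sketch of what such a dedicated worker could look like, assuming every document carries a `project_id` field: `EraseProjectWorker` and `FakeIndex` are hypothetical names, and `FakeIndex#delete_by_query` only imitates a delete-by-query against an in-memory array rather than calling a real Elasticsearch client.

```ruby
# Hypothetical in-memory index; a real one would wrap an Elasticsearch client.
class FakeIndex
  attr_reader :documents

  def initialize(documents)
    @documents = documents
  end

  # Imitates delete-by-query for a single-field exact match and
  # returns the number of documents removed.
  def delete_by_query(field, value)
    before = @documents.size
    @documents.reject! { |doc| doc[field] == value }
    before - @documents.size
  end
end

# Hypothetical worker, scheduled separately at project-delete time.
class EraseProjectWorker
  def initialize(index)
    @index = index
  end

  # Removes every document belonging to the project, including the
  # commit and blob documents that no database hook would clean up.
  def perform(project_id)
    @index.delete_by_query(:project_id, project_id)
  end
end

index = FakeIndex.new([
  { type: :issue,  project_id: 1 },
  { type: :blob,   project_id: 1 },
  { type: :commit, project_id: 1 },
  { type: :issue,  project_id: 2 }
])
removed = EraseProjectWorker.new(index).perform(1)
```

Keeping this in its own worker means the bulk-incremental flow stays limited to per-document operations, while whole-project removal is a single query.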
Risks
Involved components
Optional: Missing test coverage
We are lacking tests that import a project with elasticsearch enabled and ensure that all records that should be searchable are searchable. This would be a great time to add them.
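The core assertion of such an import test could be sketched as a set comparison between what should be searchable and what the index actually contains. The `missing_from_index` helper and the `[type, id]` pair shape are assumptions made for this example, not existing GitLab test helpers.

```ruby
require 'set'

# Returns the records that should be searchable but are absent from the index.
# Both arguments are collections of [type, id] pairs (a hypothetical shape).
def missing_from_index(expected, indexed)
  (expected.to_set - indexed.to_set).to_a.sort_by { |type, id| [type.to_s, id] }
end

# Example data: the wiki blob was never indexed after the import.
expected = [[:issue, 1], [:issue, 2], [:note, 7], [:wiki_blob, 3]]
indexed  = [[:issue, 1], [:issue, 2], [:note, 7]]

missing = missing_from_index(expected, indexed)
```

A real spec would build `expected` from the imported project's records and `indexed` from a search against the test index, then assert that `missing` is empty.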
We also lack tests that run a ProjectDestroyWorker for a fully-indexed project and ensure that all the documents for that project (but not other documents) are removed.