Create generic Elasticsearch delete worker
Background
Several delete workers are used to clean up Elasticsearch data. They run into problems when an index uses routing other than project routing. When a project is transferred, new records are created under the new routing, but the records with the old routing are unintentionally left behind in the index. The result is duplicate records that are hard to detect and that require Advanced search migration work to remedy.
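Why the old records survive: Elasticsearch only enforces `_id` uniqueness within a shard, and with custom routing the same `_id` can land on a different shard. The toy model below illustrates this. The shard count, the `group_1`/`group_2` routing values, and the sum-of-characters hash are hypothetical simplifications (Elasticsearch really uses a murmur3 hash of the routing value), not GitLab's actual setup.

```python
NUM_SHARDS = 5  # hypothetical shard count


def shard_for(routing: str) -> int:
    # Simplified stand-in for Elasticsearch's murmur3 hash of the routing value.
    return sum(map(ord, routing)) % NUM_SHARDS


class ToyIndex:
    """In-memory model: each shard keeps its own _id -> document map."""

    def __init__(self) -> None:
        self.shards: dict[int, dict[str, dict]] = {}

    def index(self, doc_id: str, routing: str, source: dict) -> None:
        # Indexing overwrites an existing _id only on the shard the routing selects.
        shard = self.shards.setdefault(shard_for(routing), {})
        shard[doc_id] = {**source, "_routing": routing}

    def copies(self, doc_id: str) -> list[dict]:
        # Every stored copy of doc_id, across all shards.
        return [docs[doc_id] for docs in self.shards.values() if doc_id in docs]


idx = ToyIndex()
# Issue indexed while its project belongs to group 1 (hypothetical routing value).
idx.index("issue_1", routing="group_1", source={"title": "Bug"})
# After a transfer to group 2, the indexer writes the same _id with the new routing.
idx.index("issue_1", routing="group_2", source={"title": "Bug"})
print(len(idx.copies("issue_1")))  # 2: the copy routed to group_1 is left behind
```

Nothing in the second write touches the first shard, so only an explicit delete removes the stale copy.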
As more document types are indexed into Elasticsearch, other routing strategies may be introduced.
Current worker/delete code:
- ElasticDeleteProjectWorker
- Search::Wiki::ElasticDeleteGroupWikiWorker
- Search::ElasticGroupAssociationDeletionWorker
- Gitlab::Elastic::Helper.remove_wikis_from_the_standalone_index
Delete workflows
Event |
---|
Project transfer to another group within same root namespace |
Project transfer to another root namespace |
Group transfer to another group within same root namespace |
Group transfer to another root namespace |
Project deleted |
Group deleted |
Proposal
Replace all existing removal workers with a single generic Elasticsearch removal worker that covers removal of all indexed data.
Use the traversal_ids field and a prefix query to remove data when the top-level (root) namespace changes.
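A minimal sketch of a request body for Elasticsearch's `_delete_by_query` API built around a `prefix` query on traversal_ids. It assumes traversal_ids is indexed as a hyphen-joined keyword with a trailing hyphen (for example `9970-123-`), and that documents carry a `project_id` field; both are assumptions to check against the real mappings. The scope clause and the `must_not` on the new path are one possible interpretation of the proposal, and the helper names and namespace ids are hypothetical.

```python
import json


def traversal_prefix(namespace_ids: list[int]) -> str:
    """Format ancestor ids root-first, e.g. [9970, 123] -> "9970-123-".

    Assumes the hyphen-joined, trailing-hyphen format described above.
    """
    return "".join(f"{i}-" for i in namespace_ids)


def delete_stale_query(scope: dict, new_namespace_ids: list[int]) -> dict:
    """Hypothetical body for POST /<index>/_delete_by_query.

    Matches documents inside `scope` whose traversal_ids do NOT start with
    the new path, so documents already written under the new path survive.
    """
    new_path = traversal_prefix(new_namespace_ids)
    return {
        "query": {
            "bool": {
                "filter": [scope],
                "must_not": [{"prefix": {"traversal_ids": {"value": new_path}}}],
            }
        }
    }


# Project 42 moved under namespace path 7 -> 8 (hypothetical ids).
project_query = delete_stale_query({"term": {"project_id": 42}}, [7, 8])

# Group 5 moved from root 1 to root 7: scope on the group's old path.
group_query = delete_stale_query(
    {"prefix": {"traversal_ids": {"value": traversal_prefix([1, 5])}}}, [7, 5]
)
print(json.dumps(group_query, indent=2))
```

The trailing hyphen matters: a prefix of `1-5` would also match namespace `1-50-`, while `1-5-` does not.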
When groups or projects are transferred, the process should:
- Queue the group or project for backfill
- Queue a delete worker that is safe to run before or after the backfill completes
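The "safe to run before or after the backfill" requirement can be checked with an in-memory sketch. It assumes the delete only removes in-scope documents whose traversal_ids lack the new path (the same idea as a `must_not` prefix clause); the document shapes, ids, and function names are hypothetical and routing is omitted.

```python
def delete_stale(docs: list[dict], project_id: int, new_prefix: str) -> list[dict]:
    # Drop the project's documents whose traversal_ids lack the new path.
    return [
        d for d in docs
        if not (d["project_id"] == project_id
                and not d["traversal_ids"].startswith(new_prefix))
    ]


def delete_all_for_project(docs: list[dict], project_id: int) -> list[dict]:
    # Naive alternative for comparison: ignores traversal_ids entirely.
    return [d for d in docs if d["project_id"] != project_id]


def backfill(docs: list[dict], project_id: int, new_prefix: str,
             doc_ids: list[str]) -> list[dict]:
    # Re-index the project's documents under the new path.
    present = {(d["id"], d["traversal_ids"]) for d in docs}
    added = [
        {"id": i, "project_id": project_id, "traversal_ids": new_prefix}
        for i in doc_ids
        if (i, new_prefix) not in present
    ]
    return docs + added


def state(docs: list[dict]) -> list[tuple]:
    # Order-independent snapshot of the index contents.
    return sorted((d["id"], d["project_id"], d["traversal_ids"]) for d in docs)


before = [
    {"id": "issue_1", "project_id": 42, "traversal_ids": "1-5-"},  # stale after transfer
    {"id": "issue_2", "project_id": 99, "traversal_ids": "1-5-"},  # unrelated project
]

delete_then_backfill = backfill(delete_stale(before, 42, "7-8-"), 42, "7-8-", ["issue_1"])
backfill_then_delete = delete_stale(backfill(before, 42, "7-8-", ["issue_1"]), 42, "7-8-")
naive = delete_all_for_project(backfill(before, 42, "7-8-", ["issue_1"]), 42)

print(state(delete_then_backfill) == state(backfill_then_delete))  # True
```

Both orderings converge on the same index contents. The naive delete, run after the backfill, also removes the freshly backfilled `issue_1`, which is why the delete needs to distinguish old and new paths.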