Projects are not being fully indexed
Summary
It appears that projects are not being fully indexed into Elasticsearch on GitLab.com. In a few instances, customers have reported files missing from search results even though the project's index status shows that indexing has completed. Manually reindexing the project has fixed the issue.
Previously reported under: #250856 (closed)
Steps to reproduce
We have not been able to reproduce this locally. Although some of the affected projects were imported, we have not confirmed that the problem is limited to imports.
Example Project
What is the current bug behavior?
Project data is partially indexed.
What is the expected correct behavior?
All files should be indexed into Elasticsearch, and the index status should not be updated unless every file was indexed successfully.
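A minimal pure-Ruby sketch of the guard described above. `FakeIndexStatus`, `index_project`, and the `index_file` callable are hypothetical stand-ins, not GitLab's actual indexer code; the only point is that the recorded commit advances when, and only when, every file was indexed.

```ruby
# Hypothetical stand-in for the real index status record.
FakeIndexStatus = Struct.new(:last_commit)

# Indexes every file and returns the paths that failed. The status only
# advances to head_commit when nothing failed, so a partial run never
# looks like a completed one.
def index_project(files, head_commit, status, index_file)
  # reject keeps the files for which index_file returned false (failures)
  failed = files.reject { |file| index_file.call(file) }
  status.last_commit = head_commit if failed.empty?
  failed
end
```

For example, if indexing `b.rb` fails, `index_project(%w[a.rb b.rb], "abc123", status, index_file)` returns `["b.rb"]` and `status.last_commit` keeps its previous value.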
Possible fixes
From #259721 (comment 523934361), we should ensure that we delete the `IndexStatus`, if it exists, whenever we run `ElasticDeleteProjectWorker`. This worker mostly runs when a project is deleted, in which case there won't be any `project.index_status`, but we can add something like `IndexStatus.where(project_id: project_id).delete_all` to this code to avoid the problem.
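The reasoning behind this fix can be illustrated with a small in-memory model. This is a sketch under the assumption that the commit indexer works incrementally from the commit recorded in the index status; `FakeSearchCluster` and its methods are hypothetical and only mimic the shape of the problem. If a project's documents are deleted but its status survives, the next run only indexes files changed since the recorded commit, so unchanged files never return to the index.

```ruby
require "set"

# Hypothetical in-memory model of the failure mode; none of these names
# are GitLab's real classes.
class FakeSearchCluster
  attr_reader :documents, :index_status

  def initialize
    @documents = Hash.new { |hash, key| hash[key] = Set.new } # project_id => indexed paths
    @index_status = {}                                       # project_id => last indexed commit
  end

  # Incremental indexing: with a recorded status, only files changed since
  # that commit are indexed; without one, every file is indexed.
  def index(project_id, commit, all_files, changed_since_last)
    to_index = @index_status.key?(project_id) ? changed_since_last : all_files
    @documents[project_id].merge(to_index)
    @index_status[project_id] = commit
  end

  # Deletes the project's documents. With clear_status: true it also drops
  # the status, mirroring IndexStatus.where(project_id: project_id).delete_all.
  def delete_project(project_id, clear_status:)
    @documents.delete(project_id)
    @index_status.delete(project_id) if clear_status
  end
end
```

In this model, deleting with `clear_status: false` and then indexing a change to `README.md` leaves only `README.md` in the index; with `clear_status: true`, the same run indexes every file again.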
Workaround in the meantime
Any project's code can be completely reindexed by running the following in a Rails console (note that this can be slow for large projects because it reindexes every commit and file again):

    project_id = <PROJECT_ID> # replace with the project ID
    project = Project.find(project_id)
    # Destroy the index status, if there is one, so the project is indexed from scratch
    project.index_status&.destroy
    ElasticCommitIndexerWorker.perform_async(project.id)
Workaround for a whole group
Since #259721 (comment 523934361) implies that the problem is likely to affect an entire group, we may want to reindex all the repositories in the group. Because this is expensive for large groups, we should first confirm that the single-project workaround above fixes the affected project:
    group = Group.find(<ID>) # replace with the group ID
    project_ids = group.all_projects.pluck(:id)

    project_ids.each_slice(50) do |ids|
      p ids # In case we fail halfway through, this leaves a trail of where we got up to

      ids.each do |project_id|
        project = Project.find(project_id)
        # Destroy the index status, if there is one, so the project is indexed from scratch
        project.index_status&.destroy
        ElasticCommitIndexerWorker.perform_async(project.id)
      end

      sleep(1) # Pause between batches to throttle enqueueing
    end