elasticsearch: wiki is not correctly indexed on import
Summary
Spotted while working on !24298 (merged)
When importing a project (whether or not the code in that MR is merged), we schedule an ElasticIndexerWorker, which in turn schedules two ElasticCommitIndexerWorker jobs: one for the project repository and one for the wiki.
However, despite an ElasticCommitIndexerWorker running for the wiki, the wiki's content is not indexed:
2020-02-21_12:04:25.75282 rails-background-jobs : 2020-02-21T12:04:25.752Z 7103 TID-gonqjivqf ElasticIndexerWorker JID-ac256b20a235f4b01e1285bd INFO: start
2020-02-21_12:04:25.80334 rails-background-jobs : 2020-02-21T12:04:25.803Z 7103 TID-gonqjivqf ElasticIndexerWorker JID-ac256b20a235f4b01e1285bd INFO: arguments: ["index","Project",24,"project_24"]
2020-02-21_12:04:26.03886 rails-background-jobs : 2020-02-21T12:04:26.038Z 7103 TID-gonqjiu2z ElasticCommitIndexerWorker JID-b5b3c4b458e19b99a465f4dd INFO: start
2020-02-21_12:04:26.04470 rails-background-jobs : 2020-02-21T12:04:26.044Z 7103 TID-gonqjivwj ElasticCommitIndexerWorker JID-ac48dc583eec5dbd105c40e3 INFO: start
2020-02-21_12:04:26.09957 rails-background-jobs : 2020-02-21T12:04:26.099Z 7103 TID-gonqjivqf ElasticIndexerWorker JID-ac256b20a235f4b01e1285bd INFO: done: 0.347 sec
2020-02-21_12:04:26.16901 rails-background-jobs : 2020-02-21T12:04:26.168Z 7103 TID-gonqjiu2z ElasticCommitIndexerWorker JID-b5b3c4b458e19b99a465f4dd INFO: arguments: [24]
2020-02-21_12:04:26.17959 rails-background-jobs : 2020-02-21T12:04:26.179Z 7103 TID-gonqjiwev RepositoryImportWorker JID-d3ff26601d880cb1fe228da3 INFO: start
2020-02-21_12:04:26.21966 rails-background-jobs : 2020-02-21T12:04:26.219Z 7103 TID-gonqjiu2z ElasticCommitIndexerWorker JID-b5b3c4b458e19b99a465f4dd INFO: done: 0.181 sec
2020-02-21_12:04:26.22730 rails-background-jobs : 2020-02-21T12:04:26.227Z 7103 TID-gonqjivwj ElasticCommitIndexerWorker JID-ac48dc583eec5dbd105c40e3 INFO: arguments: [24,null,null,true]
2020-02-21_12:04:26.33901 rails-background-jobs : 2020-02-21T12:04:26.338Z 7103 TID-gonqjivwj ElasticCommitIndexerWorker JID-ac48dc583eec5dbd105c40e3 INFO: done: 0.294 sec
2020-02-21_12:04:26.36514 rails-background-jobs : 2020-02-21T12:04:26.365Z 7103 TID-gonqjiwev RepositoryImportWorker JID-d3ff26601d880cb1fe228da3 INFO: arguments: [24]
2020-02-21_12:04:26.73851 rails-background-jobs : 2020-02-21T12:04:26.738Z 7103 TID-gonqjiw67 ElasticIndexerWorker JID-26e3820f6bea70a99497d080 INFO: start
2020-02-21_12:04:26.93883 rails-background-jobs : 2020-02-21T12:04:26.938Z 7103 TID-gonqjiw67 ElasticIndexerWorker JID-26e3820f6bea70a99497d080 INFO: arguments: ["update","Project",24,"project_24"]
2020-02-21_12:04:26.98245 rails-background-jobs : 2020-02-21T12:04:26.982Z 7103 TID-gonqjiw67 ElasticIndexerWorker JID-26e3820f6bea70a99497d080 INFO: done: 0.244 sec
2020-02-21_12:04:34.11638 rails-background-jobs : 2020-02-21T12:04:34.116Z 7103 TID-gonqjivwj ElasticIndexerWorker JID-a704cd009b888c4971b7854b INFO: start
2020-02-21_12:04:34.37853 rails-background-jobs : 2020-02-21T12:04:34.378Z 7103 TID-gonqjiu2z Namespaces::ScheduleAggregationWorker JID-e5ca5703cb416ddac3353989 INFO: start
2020-02-21_12:04:34.42818 rails-background-jobs : 2020-02-21T12:04:34.428Z 7103 TID-gonqjivqf DetectRepositoryLanguagesWorker JID-8550d1f40c81fe026fb93d81 INFO: start
2020-02-21_12:04:34.43861 rails-background-jobs : 2020-02-21T12:04:34.438Z 7103 TID-gonqjivwj ElasticIndexerWorker JID-a704cd009b888c4971b7854b INFO: arguments: ["update","Project",24,"project_24"]
2020-02-21_12:04:34.47133 rails-background-jobs : 2020-02-21T12:04:34.471Z 7103 TID-gonqjivwj ElasticIndexerWorker JID-a704cd009b888c4971b7854b INFO: done: 0.355 sec
2020-02-21_12:04:34.49133 rails-background-jobs : 2020-02-21T12:04:34.491Z 7103 TID-gonqjiw67 ElasticCommitIndexerWorker JID-3bc17be78a94737b6b1b00e8 INFO: start
2020-02-21_12:04:34.50851 rails-background-jobs : 2020-02-21T12:04:34.508Z 7103 TID-gonqjiw8r GitGarbageCollectWorker JID-5c1d62c6b49575a4d4e8dfb8 INFO: start
2020-02-21_12:04:34.60752 rails-background-jobs : 2020-02-21T12:04:34.607Z 7103 TID-gonqjiv9r ProjectCacheWorker JID-1c2b95e0a515572948a84fc8 INFO: start
2020-02-21_12:04:34.65172 rails-background-jobs : 2020-02-21T12:04:34.651Z 7103 TID-gonqjiwev RepositoryImportWorker JID-d3ff26601d880cb1fe228da3 INFO: done: 8.472 sec
2020-02-21_12:04:34.76391 rails-background-jobs : 2020-02-21T12:04:34.763Z 7103 TID-gonqjiu2z Namespaces::ScheduleAggregationWorker JID-e5ca5703cb416ddac3353989 INFO: arguments: [1]
2020-02-21_12:04:34.79376 rails-background-jobs : 2020-02-21T12:04:34.793Z 7103 TID-gonqjivcb Namespaces::RootStatisticsWorker JID-5340ca1e0781379c2328307a INFO: start
2020-02-21_12:04:34.80094 rails-background-jobs : 2020-02-21T12:04:34.800Z 7103 TID-gonqjiu2z Namespaces::ScheduleAggregationWorker JID-e5ca5703cb416ddac3353989 INFO: done: 0.422 sec
2020-02-21_12:04:34.98280 rails-background-jobs : 2020-02-21T12:04:34.982Z 7103 TID-gonqjivqf DetectRepositoryLanguagesWorker JID-8550d1f40c81fe026fb93d81 INFO: arguments: [24]
2020-02-21_12:04:34.99941 rails-background-jobs : 2020-02-21T12:04:34.999Z 7103 TID-gonqjiw67 ElasticCommitIndexerWorker JID-3bc17be78a94737b6b1b00e8 INFO: arguments: [24,"0000000000000000000000000000000000000000"]
2020-02-21_12:04:35.09593 rails-background-jobs : 2020-02-21T12:04:35.095Z 7103 TID-gonqjiw8r GitGarbageCollectWorker JID-5c1d62c6b49575a4d4e8dfb8 INFO: arguments: [24,"gc","project_housekeeping:24","611e2d06-741a-477c-9835-55f7da547824"]
2020-02-21_12:04:35.17865 rails-background-jobs : 2020-02-21T12:04:35.178Z 7103 TID-gonqjiv9r ProjectCacheWorker JID-1c2b95e0a515572948a84fc8 INFO: arguments: [24,[],["commit_count"]]
2020-02-21_12:04:35.19221 rails-background-jobs : 2020-02-21T12:04:35.192Z 7103 TID-gonqjivcb Namespaces::RootStatisticsWorker JID-5340ca1e0781379c2328307a INFO: arguments: [1]
2020-02-21_12:04:35.28664 rails-background-jobs : 2020-02-21T12:04:35.286Z 7103 TID-gonqjiw8r GitGarbageCollectWorker JID-5c1d62c6b49575a4d4e8dfb8 INFO: done: 0.778 sec
2020-02-21_12:04:35.29151 rails-background-jobs : 2020-02-21T12:04:35.291Z 7103 TID-gonqjivcb Namespaces::RootStatisticsWorker JID-5340ca1e0781379c2328307a INFO: done: 0.498 sec
2020-02-21_12:04:35.29634 rails-background-jobs : 2020-02-21T12:04:35.296Z 7103 TID-gonqjiv9r ProjectCacheWorker JID-1c2b95e0a515572948a84fc8 INFO: done: 0.689 sec
2020-02-21_12:04:35.67067 rails-background-jobs : 2020-02-21T12:04:35.670Z 7103 TID-gonqjiw67 ElasticCommitIndexerWorker JID-3bc17be78a94737b6b1b00e8 INFO: done: 1.179 sec
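(Note how the ElasticCommitIndexerWorker for the wiki runs earlier than the RepositoryImportWorker: it starts at 12:04:26.044 and is done at 12:04:26.338, while the import only finishes at 12:04:34.651.)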
[3] pry(main)> Project.last.index_status
IndexStatus Load (1.1ms) SELECT "index_statuses".* FROM "index_statuses" WHERE "index_statuses"."project_id" = 24 LIMIT 1
=> #<IndexStatus:0x0000555dcdbdb7f0
id: 24,
project_id: 24,
indexed_at: Fri, 21 Feb 2020 12:04:35 UTC +00:00,
note: nil,
last_commit: "f15b32277d2c55c6c595845a87109b09c913c556",
created_at: Fri, 21 Feb 2020 12:04:26 UTC +00:00,
updated_at: Fri, 21 Feb 2020 12:04:35 UTC +00:00,
last_wiki_commit: "0000000000000000000000000000000000000000",
wiki_indexed_at: Fri, 21 Feb 2020 12:04:26 UTC +00:00>
After re-running the indexer manually:
[5] pry(main)> p.wiki.index_wiki_blobs
=> "cddfd6db279d7f994d206014"
[6] pry(main)> p.index_status.reload
IndexStatus Load (0.8ms) SELECT "index_statuses".* FROM "index_statuses" WHERE "index_statuses"."id" = 24 LIMIT 1
=> #<IndexStatus:0x0000555dcdbdb7f0
id: 24,
project_id: 24,
indexed_at: Fri, 21 Feb 2020 12:04:35 UTC +00:00,
note: nil,
last_commit: "f15b32277d2c55c6c595845a87109b09c913c556",
created_at: Fri, 21 Feb 2020 12:04:26 UTC +00:00,
updated_at: Fri, 21 Feb 2020 12:12:48 UTC +00:00,
last_wiki_commit: "19951957bfd4005aa30445dc78b3a2d50d9e31f2",
wiki_indexed_at: Fri, 21 Feb 2020 12:12:48 UTC +00:00>
The first push to the wiki repository after import would also cause it to be correctly indexed.
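The two console outputs above differ in last_wiki_commit: all zeros after the import, a real SHA after the manual re-index. A minimal sketch of a check for that state follows. It assumes the all-zero value means "no wiki commit indexed yet", and the helper name wiki_unindexed? and the Struct standing in for IndexStatus are hypothetical, not GitLab code:

```ruby
# All-zero SHA, as seen in last_wiki_commit right after the import.
# Assumption: this value means that no wiki commit has been indexed.
BLANK_SHA = ("0" * 40).freeze

# Hypothetical stand-in for the IndexStatus record; only the field we need.
StatusStub = Struct.new(:last_wiki_commit)

# Hypothetical helper: true when no wiki commit is recorded as indexed.
def wiki_unindexed?(status)
  sha = status.last_wiki_commit
  sha.nil? || sha.empty? || sha == BLANK_SHA
end

# Values copied from the two console outputs above.
after_import  = StatusStub.new("0000000000000000000000000000000000000000")
after_reindex = StatusStub.new("19951957bfd4005aa30445dc78b3a2d50d9e31f2")

puts wiki_unindexed?(after_import)
puts wiki_unindexed?(after_reindex)
```

On a real instance the same comparison could be made against project.index_status.last_wiki_commit from the Rails console.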
Steps to reproduce
- Export a project containing a wiki
- Import the project to a different namespace
- Search for some content in the wiki
What is the current bug behavior?
Imported wiki content is not searchable
What is the expected correct behavior?
Imported wiki content should be searchable
Output of checks
This bug happens on GitLab.com
Possible fixes
I think the ElasticIndexerWorker that we schedule (and so the ElasticCommitIndexerWorker jobs it schedules) is simply running too early, before the wiki repository has been created.
To fix this, we should stop relying on IndexRecordService to schedule ElasticCommitIndexerWorker jobs on project creation.
We already have a hook in ProjectImportState (EE::ProjectImportState) that schedules an ElasticCommitIndexerWorker for the project repository when the import transitions to the finished state. Perhaps this is what's causing the project repository to be indexed as well, and the hooks in IndexRecordService are truly redundant?
We could add "do the wiki as well, if it exists" to the ProjectImportState hooks, but a complication is that they also run for pull mirroring. That is already undesirable, since pull mirroring enqueues an ElasticCommitIndexerWorker via the post-receive hook; adding a job for the wiki would compound the issue. Maybe there's a better way?
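One possible shape for that decision, as a rough sketch only: when an import reaches the finished state, enqueue the repository job plus a wiki job if the wiki exists, and skip pull mirrors entirely. Everything here is hypothetical (ProjectStub, its fields, and commit_indexer_jobs_after_import are illustrative names, not GitLab code). The argument shapes are copied from the log above, where [24, null, null, true] is assumed to be the wiki job, and skipping mirrors would change current behaviour:

```ruby
# Hypothetical stand-in for a project; not the real model.
ProjectStub = Struct.new(:id, :mirror, :wiki_exists, keyword_init: true)

# Hypothetical decision function: returns the argument lists of the
# ElasticCommitIndexerWorker jobs to enqueue once an import is finished.
def commit_indexer_jobs_after_import(project)
  # Assumption: pull mirrors are indexed via the post-receive hook,
  # so nothing extra is enqueued for them here.
  return [] if project.mirror

  jobs = [[project.id]]                                      # project repository
  jobs << [project.id, nil, nil, true] if project.wiki_exists # wiki (assumed flag)
  jobs
end

imported = ProjectStub.new(id: 24, mirror: false, wiki_exists: true)
mirrored = ProjectStub.new(id: 25, mirror: true, wiki_exists: true)

p commit_indexer_jobs_after_import(imported)
p commit_indexer_jobs_after_import(mirrored)
```

Keeping the decision in a pure function like this would make the mirror and no-wiki cases easy to cover in a spec before wiring it into the state machine hook.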