Add caching to BitBucket Server importer for pull requests
What does this MR do?
This MR adds caching to the Bitbucket Server importer so that pull requests which have already been imported are not imported, or even attempted, again. This is crucial because the SidekiqMemoryKiller can interrupt the import, and without a cache the import starts from scratch.
This can be seen as the performance improvement described in #23586 (closed).
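To illustrate the idea, here is a minimal sketch of such a cache. It is not the code in this MR: the class name, the key layout, and the in-memory `Hash` (standing in for Redis) are all assumptions made for the example.

```ruby
require "set"

# Hypothetical stand-in for a Redis-backed set. A real importer needs a store
# that survives a worker restart; an in-memory Hash is used here only so the
# example is self-contained.
class ImportedPullRequestCache
  def initialize(store = Hash.new { |hash, key| hash[key] = Set.new })
    @store = store
  end

  # Hypothetical key layout: one set of pull request IDs per project.
  def key(project_id)
    "bitbucket-server-importer/already-imported/#{project_id}"
  end

  def imported?(project_id, pull_request_iid)
    @store[key(project_id)].include?(pull_request_iid)
  end

  def mark_imported(project_id, pull_request_iid)
    @store[key(project_id)].add(pull_request_iid)
  end
end

# Imports each pull request at most once and returns the IDs that were
# imported during this run. The block performs the actual import work.
def import_pull_requests(project_id, pull_request_iids, cache)
  pull_request_iids.each_with_object([]) do |iid, imported_now|
    next if cache.imported?(project_id, iid)

    yield iid
    cache.mark_imported(project_id, iid)
    imported_now << iid
  end
end

cache = ImportedPullRequestCache.new
first_run = import_pull_requests(3, [1, 2, 3], cache) { |iid| }
second_run = import_pull_requests(3, [1, 2, 3, 4], cache) { |iid| }
```

In this sketch the second run only touches pull request 4, because 1 to 3 were marked as imported during the first run.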
Summary
The Bitbucket Server import runs into a timeout. This is caused by two factors:
- The import itself does not have any caching, so if it is restarted, it starts from the beginning.
- The SidekiqMemoryKiller kills the process at some point if there are too many pull requests to import.
Steps to reproduce
- Start a Bitbucket Server import of a big project (on gitlab.com about 1000 pull requests were enough; on a self-hosted GitLab we needed around 4000 pull requests).
- At some point during the import, the SidekiqMemoryKiller kills the process, and the import starts all over again.
Example Project
I can only provide an already failed project with a timed-out import. Project ID: 21406002
What is the current bug behavior?
The import does not succeed because it retries the process over and over again. As it always starts from the beginning, it is always killed at roughly the same point.
What is the expected correct behavior?
The import should not start at the first pull request again; it should continue after the last imported one. At the very least, it should skip the pull requests that are already imported and not fetch them via the REST API again.
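The "continue after the last imported one" variant could look roughly like the sketch below, which stores a paging checkpoint after every imported page. Everything here is hypothetical and only meant to show the behavior: `ImportCheckpoint`, `FakeApi`, and `import_all` are invented names, the checkpoint lives in memory instead of Redis, and the `start`/`limit` paging is modeled on an array rather than on real REST calls.

```ruby
# Hypothetical checkpoint store. In production this would have to live outside
# the worker process (for example in Redis) to survive a restart.
class ImportCheckpoint
  def initialize
    @offsets = {}
  end

  def next_start(project_id)
    @offsets.fetch(project_id, 0)
  end

  def save(project_id, start)
    @offsets[project_id] = start
  end
end

# Stand-in for a paged REST API. It records every requested offset so the
# example can show which pages were actually fetched.
FakeApi = Struct.new(:pull_request_ids, :requested_starts) do
  def page(start, limit)
    requested_starts << start
    pull_request_ids[start, limit] || []
  end
end

# Simulates the worker being killed in the middle of an import.
class Interrupted < StandardError; end

# Fetches page after page, starting at the saved checkpoint, and yields every
# pull request ID to the block that performs the import.
def import_all(project_id, api, checkpoint, limit:)
  loop do
    start = checkpoint.next_start(project_id)
    ids = api.page(start, limit)
    break if ids.empty?

    ids.each { |id| yield id }
    checkpoint.save(project_id, start + ids.size)
  end
end

api = FakeApi.new((1..6).to_a, [])
checkpoint = ImportCheckpoint.new
imported = []

begin
  import_all(3, api, checkpoint, limit: 2) do |id|
    raise Interrupted if id == 5 # simulated kill before the third page is imported
    imported << id
  end
rescue Interrupted
  # The job is pushed back to the queue and started again.
end

api.requested_starts.clear
import_all(3, api, checkpoint, limit: 2) { |id| imported << id }
```

After the simulated restart, only the offsets 4 and 6 are requested, so the first two pages are neither fetched nor imported a second time. A page-level checkpoint alone does not protect against a kill in the middle of a page, which is one reason to also track individual pull requests.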
Relevant logs and/or screenshots
{"severity":"WARN","time":"2020-10-20T14:22:46.921Z","message":"Terminating 1 busy worker threads"}
{"severity":"WARN","time":"2020-10-20T14:22:46.921Z","message":"Work still in progress [#\u003cstruct Sidekiq::BaseReliableFetch::UnitOfWork queue=\"queue:repository_import\", job=\"{\\\"class\\\":\\\"RepositoryImportWorker\\\",\\\"args\\\":[3],\\\"retry\\\":false,\\\"queue\\\":\\\"repository_import\\\",\\\"version\\\":0,\\\"backtrace\\\":5,\\\"status_expiration\\\":54000,\\\"memory_killer_memory_growth_kb\\\":50,\\\"memory_killer_max_memory_growth_kb\\\":300000,\\\"jid\\\":\\\"c22e2e12cfe635882785f202\\\",\\\"created_at\\\":1603196305.8484204,\\\"meta.user\\\":\\\"root\\\",\\\"meta.project\\\":\\\"gitlab-instance-957cb02c/xxxl\\\",\\\"meta.root_namespace\\\":\\\"gitlab-instance-957cb02c\\\",\\\"meta.caller_id\\\":\\\"Import::BitbucketServerController#create\\\",\\\"meta.related_class\\\":\\\"Projects::CreateService\\\",\\\"correlation_id\\\":\\\"3vNAgKXYuz4\\\",\\\"enqueued_at\\\":1603196305.8506517,\\\"interrupted_count\\\":1}\"\u003e]"}
{"severity":"INFO","time":"2020-10-20T14:22:46.922Z","message":"Pushed job c22e2e12cfe635882785f202 back to queue queue:repository_import","jid":"c22e2e12cfe635882785f202","queue":"queue:repository_import","retry":0}
{"severity":"INFO","time":"2020-10-20T14:22:46.946Z","message":"Bye!"}
{"severity":"WARN","time":"2020-10-20T14:22:46.947Z","class":"Gitlab::SidekiqDaemon::MemoryKiller","action":"stop","pid":14815,"message":"Stopping Gitlab::SidekiqDaemon::MemoryKiller Daemon","retry":0}
{"severity":"INFO","time":"2020-10-20T14:22:48.054Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2020-10-20T14:22:48.245Z","message":"Starting cluster with 1 processes"}
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
As I am not really familiar with Ruby, I could not provide sufficient unit or integration tests for this merge request, but I am eager to learn if pointed in the right direction and given some guidance. I assume that testing against Redis in particular will not be easy.
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.