Improve performance (and, to a lesser extent, memory usage) of project export
What does this MR do?
Based on, and targeting, the kamil-refactor-import-structure branch:
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32704
ActiveModel::Serialization is simple in that it recursively calls as_json
on each object to serialize everything. However, for a model
like a Project, this can generate a query for every single association,
which can add up to tens of thousands of queries and lead to memory
bloat.
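The query blow-up described above can be illustrated with a small plain-Ruby sketch. It does not use GitLab or Rails code: FakeDB, IssueRow, and serialize_naively are hypothetical names, and the "database" is an in-memory hash that only counts how often it is asked for an association.

```ruby
# Hypothetical in-memory stand-in for a database; each lookup counts as one "query".
class FakeDB
  attr_reader :query_count

  def initialize(notes_by_issue)
    @notes_by_issue = notes_by_issue
    @query_count = 0
  end

  # Simulates a lazily loaded association: one "query" per parent record.
  def notes_for(issue_id)
    @query_count += 1
    @notes_by_issue.fetch(issue_id, [])
  end
end

# Hypothetical record type standing in for a model row.
IssueRow = Struct.new(:id, :title)

# Naive serialization: every issue triggers its own lookup for its notes.
def serialize_naively(issues, db)
  issues.map do |issue|
    { "id" => issue.id, "title" => issue.title, "notes" => db.notes_for(issue.id) }
  end
end

issues = (1..250).map { |i| IssueRow.new(i, "Issue #{i}") }
naive_db = FakeDB.new({ 1 => ["first note"] })
naive = serialize_naively(issues, naive_db)

puts naive_db.query_count # 250: one query per issue, for a single association
```

With several nested associations per record, the same pattern multiplies, which is how the query count can climb into the tens of thousands.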
To improve this, we can do several things:
- We use tree: and preload: to automatically generate a list of all preloads that could be used to serialize objects in bulk.
- We observe that a single project has many issues, merge requests, etc. Instead of serializing everything at once, which could lead to database timeouts and high memory usage, we take each top-level association and serialize the data in batches.
For example, we serialize the first 100 issues and preload all of their associated events, notes, etc. before moving on to the next batch. When we're done, we serialize merge requests in the same way. We repeat this pattern for the remaining associations specified in import_export.yml.
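The batching-plus-preloading idea can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the code in this MR: BatchDB and serialize_in_batches are hypothetical names, the store is an in-memory hash, and the list of preloads is passed in directly rather than derived from import_export.yml. In a Rails application, the equivalent roles would typically be played by ActiveRecord batching (for example in_batches) combined with preload.

```ruby
# Hypothetical in-memory store; each bulk lookup counts as one "query".
class BatchDB
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # e.g. { notes: { 1 => [...] }, events: { 250 => [...] } }
    @query_count = 0
  end

  # Bulk lookup: a single "query" returns the association rows for many parents.
  def preload(association, parent_ids)
    @query_count += 1
    table = @rows.fetch(association, {})
    parent_ids.to_h { |id| [id, table.fetch(id, [])] }
  end
end

# Serialize parents in fixed-size batches, loading each listed association
# once per batch instead of once per record.
def serialize_in_batches(parent_ids, preloads, db, batch_size: 100)
  parent_ids.each_slice(batch_size).flat_map do |batch|
    loaded = preloads.to_h { |assoc| [assoc, db.preload(assoc, batch)] }
    batch.map do |id|
      record = { "id" => id }
      preloads.each { |assoc| record[assoc.to_s] = loaded[assoc][id] }
      record
    end
  end
end

batch_db = BatchDB.new({ notes: { 1 => ["first note"] }, events: { 250 => ["closed"] } })
batched = serialize_in_batches((1..250).to_a, [:notes, :events], batch_db)

puts batch_db.query_count # 6: 3 batches x 2 associations
```

Only one batch of records and its preloaded associations needs to be held at a time, which is where the memory benefit would come from if each batch is written out before the next one is loaded. The sketch collects everything into one array purely to keep the example short.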
Started by @stanhu in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23757, as a joint effort with @ayufan and @alipniagov.
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry for user-facing changes, or community contribution. Check the link for other scenarios.
- Documentation created/updated or follow-up review issue created
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
Related to https://gitlab.com/gitlab-org/gitlab-ce/issues/35389