Make project export use constant memory
Building the project export hash in memory is a very expensive operation.
We can improve it significantly by serializing and writing JSON as we go, instead of holding the full Hash representation in memory.
This lets us allocate and free the memory used by the export quickly, instead of retaining it for a longer period of time.
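To illustrate the idea, here is a minimal sketch of serializing and writing at the same time. It is not the code from this merge request: `StreamingJsonWriter`, its methods, and the sample `issues` data are hypothetical names chosen for the example. Each record is converted to JSON and written to the IO immediately, so no full Hash of the project is ever held in memory.

```ruby
require 'json'
require 'stringio'

# Hypothetical streaming writer (an assumption for illustration only).
# Emits a single JSON object to `io`, one small chunk at a time.
class StreamingJsonWriter
  def initialize(io)
    @io = io
    @first_key = true
  end

  def open
    @io.write('{')
  end

  def close
    @io.write('}')
  end

  # Writes a single attribute of the top-level object.
  def write_attribute(key, value)
    write_key(key)
    @io.write(value.to_json)
  end

  # Writes an array-valued relation. `records` can be any object that
  # responds to `each`, so records can be produced in batches and become
  # garbage-collectable as soon as they are written.
  def write_relation(key, records)
    write_key(key)
    @io.write('[')
    first = true
    records.each do |record|
      @io.write(',') unless first
      first = false
      @io.write(record.to_json)
    end
    @io.write(']')
  end

  private

  def write_key(key)
    @io.write(',') unless @first_key
    @first_key = false
    @io.write(key.to_s.to_json)
    @io.write(':')
  end
end

# Usage: records come from an Enumerator, standing in for batched loading.
issues = Enumerator.new do |yielder|
  1.upto(3) { |i| yielder << { 'iid' => i, 'title' => "Issue #{i}" } }
end

io = StringIO.new
writer = StreamingJsonWriter.new(io)
writer.open
writer.write_attribute(:name, 'gitlabhq')
writer.write_relation(:issues, issues)
writer.close
```

In a real export the `io` would be a file rather than a `StringIO`, so the serialized string would not accumulate in memory either.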
Original description
It seems that the biggest cost of export is keeping the in-memory hash structure. When I tested the hash size, I noticed that it takes 250MB for gitlabhq. That is only the first step: during save, we effectively duplicate and rewrite that hash, and then we serialize this hash to a string. https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32423/diffs?commit_id=a33994318c3212a49338553e83bc4709271648a6 makes the serializer almost constant in size, except for the memory growth related to storing the string with JSON. We serialize small chunks, and forget the hash.
I was able to get memory usage after export down from 1.2GB to 510MB. Allocated memory hovered around 600MB during the export.
So far it seems that we can:
- Shorten export time by about 4x with the N+1 fix,
- Reduce the memory footprint (for gitlabhq) from around 700MB (on top of the process) to around 150MB.
Related to: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32579
Confirmed by @alipniagov https://gitlab.com/gitlab-org/gitlab-ce/issues/35389#note_211235853.