Memory leak: Repository objects not destroyed after request finished
I noticed that doing a request for a large diff always causes the unicorn memory killer to restart unicorn only after a single request.
After doing a number of tests, I'm convinced something is keeping a reference to the Repository
objects, which prevents destruction of the Rugged::Repository
object and manifests in a potentially massive memory leak with libgit2.
libgit2 has a cache that by default holds up to 256 MB per Rugged::Repository
object (https://github.com/libgit2/libgit2/blob/895fd51a434ac50ebd53e996bd9149302ca3e134/src/cache.c#L19
). It gets cleaned up after Rugged::Repository
is destroyed or Rugged::Repository#close
is called:
https://github.com/libgit2/libgit2/blob/895fd51a434ac50ebd53e996bd9149302ca3e134/src/repository.c#L163.
Note that libgit2 also memory-maps pack files, so not closing the repository also causes memory leaks.
To experiment whether the cache was ever being freed, I made this change in Rugged 0.26.0:
diff --git a/src/repository.c b/src/repository.c
index fe549e6e8..b36190e3a 100644
--- a/src/repository.c
+++ b/src/repository.c
@@ -151,10 +151,17 @@ void git_repository__cleanup(git_repository *repo)
set_refdb(repo, NULL);
}
+#include <time.h>
void git_repository_free(git_repository *repo)
{
size_t i;
+ time_t t = time(NULL);
+ struct tm *local;
+ local = gmtime(&t);
+
+ fprintf(stderr, "%s git_repository_free: 0x%x\n", asctime(local), repo);
+
if (repo == NULL)
return;
Then I ran gem build rugged.gemspec
and sudo /opt/gitlab/embedded/bin/gem install rugged-0.26.0.gem
on an Omnibus installation.
To see that this really works, try:
# /opt/gitlab/embedded/bin/irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> repo = Rugged::Repository.new('/tmp/foobar.git')
=> #<Rugged::Repository:18147660 {path: "/tmp/foobar.git/"}>
irb(main):003:0> repo = nil
=> nil
irb(main):004:0> GC.start
Tue Jan 30 05:44:25 2018
git_repository_free: 0x2656c90
=> nil
Now, if you use the Rails console (e.g. run app.get '/root/gitlab-ce-test/merge_requests/1/diffs.json
) or issue a curl request, you will see that this message ONLY appears if the unicorn memory killer kicks in:
I, [2018-01-30T05:44:57.711038 #5684] INFO -- : master process ready
I, [2018-01-30T05:44:57.794081 #5719] INFO -- : worker=0 ready
W, [2018-01-30T05:48:32.017070 #5719] WARN -- : #<Unicorn::HttpServer:0x00007f9fd0197db0>: worker (pid: 5719) exceeds memory limit (775492608.0 bytes > 629261760 bytes)
W, [2018-01-30T05:48:32.017214 #5719] WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 5719) alive: 210 sec (trial 1)
Tue Jan 30 05:48:32 2018
git_repository_free: 0xb11c0ac0
I, [2018-01-30T05:48:32.201475 #5684] INFO -- : reaped #<Process::Status: pid 5719 exit 0> worker=0
I, [2018-01-30T05:48:32.301201 #6239] INFO -- : worker=0 ready
When I disabled the unicorn memory killer (via config.ru
), this debug statement never shows.
In the Rails console, you can enable ObjectSpace debugging and confirm that the objects live even after a GC:
irb(main):046:0> ObjectSpace.each_object(Repository) { |obj| puts obj }
#<Repository:0x00007fd891d2aa20>
#<Repository:0x00007fd8943bff28>
#<Repository:0x00007fd8a89a6b30>
=> 3
If you dump the heap to disk, you can also find the references to these objects. I wrote this script:
require 'json'
def scan_for_refs(address)
puts "scanning for #{address}"
input = File.open('/tmp/heap.dump').each_line do |line|
data = JSON.parse(line)
if data['references']&.include?(address)
puts data
end
end
end
scan_for_refs(ARGV[0])
I think I've narrowed the problem down to this method:
$ ruby parse.rb 0x7fd8943bff28 | grep -v "diff/file_coll"
scanning for 0x7fd8943bff28
{"address"=>"0x7fd8a670a810", "type"=>"OBJECT", "class"=>"0x7fd8ab7ef208", "ivars"=>22, "references"=>["0x7fd8a670a8b0", "0x7fd8a670a748", "0x7fd8a670a720", "0x7fd8a670a6d0", "0x7fd8a6b87c30", "0x7fd897f5c018", "0x7fd8943bff78", "0x7fd8943bff28", "0x7fd8ab15b248", "0x7fd890da5870"], "file"=>"/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.10/lib/active_record/core.rb", "line"=>114, "method"=>"allocate", "generation"=>68, "memsize"=>216, "flags"=>{"wb_protected"=>true, "old"=>true, "uncollectible"=>true, "marked"=>true}}
/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/activerecord-4.2.10/lib/active_record/core.rb
line 114 is:
I am puzzled why this particular method may be causing the problem at the moment. Anyone have any ideas?
/cc: @godfat, @smcgivern, @DouweM, @yorickpeterse