Improve memory performance by reducing dirty pages before fork
The nakayoshi_fork gem works around the lack of a compacting garbage collector in Ruby by calling GC.start a few times before forking to "promote" young objects to the maximum age of 3. This reduces the number of dirty pages and consequently improves copy-on-write behavior with unicorn and other forking Web application servers.
See Aaron Patterson's 2017 RubyConf talk on this: https://www.youtube.com/watch?v=8Q7M513vewk&feature=youtu.be&t=960
See source code: https://github.com/ko1/nakayoshi_fork/blob/master/lib/nakayoshi_fork.rb
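The idea described above can be sketched in a few lines. This is a minimal illustration only, not the gem's actual code (see the source link above): the helper name fork_after_gc and its rounds parameter are hypothetical, and the fixed count of four GC runs is an assumption chosen to exceed the maximum age of 3.

```ruby
# Hypothetical helper (not part of nakayoshi_fork): run the GC several
# times so long-lived objects reach the maximum age, then fork.
def fork_after_gc(rounds: 4, &block)
  # Each GC run an object survives raises its age. Once it reaches the
  # maximum age, later GC runs no longer update it, so the child has
  # fewer reasons to write to pages it shares with the parent.
  rounds.times { GC.start }
  Process.fork(&block)
end

# Long-lived data that the parent keeps referenced across the fork.
CACHE = Array.new(10_000) { |i| "entry-#{i}" }

# Fork a child that exits immediately, then reap it in the parent.
CHILD_PID = fork_after_gc { exit!(0) }
Process.wait(CHILD_PID)
CHILD_OK = $?.success?
```

In a preforking server the same call would sit where the master process forks its workers, after the application has been loaded.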
https://github.com/discourse/discourse/blob/master/script/memstats.rb is a good script for aggregating PSS (proportional set size, which splits each shared page among the processes sharing it) per process, rather than relying on RSS alone. I've noticed some encouraging results with these changes. For example, on my test instance PSS is about half of RSS:
$ sudo ruby memstats.rb 7025
Process: 7025
Command Line: unicorn worker[1] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
Memory Summary:
private_clean 0 kB
private_dirty 120,096 kB
pss 219,844 kB
rss 426,040 kB
shared_clean 10,268 kB
shared_dirty 295,676 kB
size 771,432 kB
swap 0 kB
shared_dirty tends to decrease over time: as the worker writes to shared pages, the OS copies them into the forked process, where they count as private_dirty instead. Compare this to a unicorn worker on dev, which shows PSS approaching RSS (about 82% here):
$ sudo ruby memstats.rb 7680
Process: 7680
Command Line: unicorn worker[1] -D -E production -c /var/opt/gitlab/gitlab-rails/etc/unicorn.rb /opt/gitlab/embedded/service/gitlab-rails/config.ru
Memory Summary:
private_clean 424 kB
private_dirty 476,896 kB
pss 488,507 kB
rss 592,648 kB
shared_clean 17,592 kB
shared_dirty 97,736 kB
size 1,111,464 kB
swap 37,392 kB
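The kind of aggregation shown in the summaries above can be approximated by summing the per-mapping counters in a process's smaps data. The following is a rough sketch, not the memstats.rb implementation: the method name summarize_smaps and the SAMPLE text are made up for illustration, and the parser assumes the usual "Name:  N kB" layout of Linux smaps lines.

```ruby
# Hypothetical helper: sum every "Name:   N kB" counter across all
# mappings in smaps-formatted text, keyed by the lowercased name.
def summarize_smaps(text)
  totals = Hash.new(0)
  text.each_line do |line|
    # Mapping header lines (address ranges) and flag lines do not match.
    # Note: page-size lines such as KernelPageSize also match and get
    # summed; those totals are meaningless and should be ignored.
    if (m = line.match(/\A(\w+):\s+(\d+) kB/))
      totals[m[1].downcase] += m[2].to_i
    end
  end
  totals
end

# Invented sample with two mappings, used so the sketch runs anywhere.
SAMPLE = <<~SMAPS
  00400000-00401000 r-xp 00000000 08:01 123 /usr/bin/example
  Size:                  4 kB
  Rss:                   4 kB
  Pss:                   2 kB
  Shared_Clean:          4 kB
  Shared_Dirty:          0 kB
  Private_Clean:         0 kB
  Private_Dirty:         0 kB
  7f0000000000-7f0000002000 rw-p 00000000 00:00 0
  Size:                  8 kB
  Rss:                   8 kB
  Pss:                   8 kB
  Shared_Clean:          0 kB
  Shared_Dirty:          0 kB
  Private_Clean:         0 kB
  Private_Dirty:         8 kB
SMAPS

TOTALS = summarize_smaps(SAMPLE)
puts "pss #{TOTALS['pss']} kB, rss #{TOTALS['rss']} kB"
```

On Linux, passing File.read("/proc/#{pid}/smaps") instead of SAMPLE would summarize a live process. Reading another user's process typically requires root, as with the sudo invocations above.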