Reduce object allocations for large merge request
🧩 What does this MR do?
Reduce object allocations when processing commits from Gitaly. During the investigation of #281574, we found that the enrich! method allocates a large number of objects when processing a large MR with 13,796 commits.
I have attached the full profiling of this MR on staging: profiling-large-commit-gitaly.txt
⚙ Implementation
A simplified example illustrates the change.
Suppose we want to convert the following data structure
require "ostruct"

raw_commits = [
  OpenStruct.new(id: 1, name: "tom-1"),
  OpenStruct.new(id: 2, name: "tom-2"),
  OpenStruct.new(id: 3, name: "tom-3"),
  OpenStruct.new(id: 4, name: "tom-4"),
  OpenStruct.new(id: 5, name: "tom-5")
]
to
enriched_commits = {
1 => "tom-1",
2 => "tom-2",
3 => "tom-3",
4 => "tom-4",
5 => "tom-5"
}
# Existing approach
enriched_commits = Hash[raw_commits.map { |o| [o.id, o.name] }].compact
# New approach
enriched_commits = raw_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact
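As a quick sanity check that the two approaches are interchangeable, the following minimal sketch builds a small input and compares the results. `SampleCommit` is a hypothetical `Struct` standing in for the `OpenStruct` objects above (an assumption made only to keep the snippet dependency-free); one entry has a `nil` name to show that `compact` drops it in both cases.

```ruby
# SampleCommit is a hypothetical stand-in for the OpenStruct objects above.
SampleCommit = Struct.new(:id, :name)

sample_commits = [
  SampleCommit.new(1, "tom-1"),
  SampleCommit.new(2, "tom-2"),
  SampleCommit.new(3, nil) # nil value, removed by #compact in both approaches
]

# Existing approach: builds one temporary [id, name] array per commit.
existing_result = Hash[sample_commits.map { |o| [o.id, o.name] }].compact

# New approach: writes each pair straight into the result hash.
new_result = sample_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact

puts existing_result == new_result
```

The only difference is the intermediate data: the existing approach materializes an array of pairs before building the hash, while the new approach does not.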
To compare the two approaches at scale, we can generate a much larger array
raw_commits = (1..10000).map { |n| OpenStruct.new(id: n, name: "tom-#{n}") }
Before
Total allocations: 10003
pry(main)> profile = RubyProf.profile { enriched_commits = Hash[raw_commits.map { |o| [o.id, o.name] }].compact }
pry(main)> RubyProf::FlatPrinter.new(profile).print(STDOUT, min_percent: 2)
Measure Mode: allocations
Thread ID: 23720
Fiber ID: 86880
Total: 10003.000000
Sort by: self_time
%self total self wait child calls name location
99.98 10001.000 10001.000 0.000 0.000 1 Array#map
* recursively called methods
Columns are:
%self - The percentage of time spent in this method, derived from self_time/total_time.
total - The time spent in this method and its children.
self - The time spent in this method.
wait - The amount of time this method waited for other threads.
child - The time spent in this method's children.
calls - The number of times this method was called.
name - The name of the method.
location - The location of the method.
The interpretation of method names is:
* MyObject#test - An instance method "test" of the class "MyObject"
* <Object:MyObject>#test - The <> characters indicate a method on a singleton class.
[
[0] #<RubyProf::Thread:0x00007f99b9070238>
]
Total time: 0.001640 secs
Measure Mode: wall_time
Thread ID: 23720
Fiber ID: 86880
Total: 0.001640
Sort by: self_time
%self total self wait child calls name location
7.76 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
7.03 0.000 0.000 0.000 0.000 1 <Class::Hash>#[]
6.47 0.002 0.000 0.000 0.002 1 [global]# (pry):185
6.41 0.001 0.000 0.000 0.001 1 Array#map
5.44 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
5.28 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
5.25 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
5.09 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.73 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.34 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.22 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.15 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.56 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.32 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.28 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.19 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.88 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.78 0.000 0.000 0.000 0.000 1 Hash#compact
2.78 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.64 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.52 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.36 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.32 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.21 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
After
Total allocations: 4
pry(main)> profile = RubyProf.profile { enriched_commits = raw_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact }
pry(main)> RubyProf::FlatPrinter.new(profile).print(STDOUT, min_percent: 2)
Measure Mode: allocations
Thread ID: 23720
Fiber ID: 86880
Total: 4.000000
Sort by: self_time
%self total self wait child calls name location
75.00 3.000 3.000 0.000 0.000 1 BasicObject#method_missing
25.00 4.000 1.000 0.000 3.000 1 [global]# (pry):128
[
[0] #<RubyProf::Thread:0x00007f99fe534180>
]
Total time: 0.000545 secs
Measure Mode: wall_time
Thread ID: 23720
Fiber ID: 86880
Total: 0.000545
Sort by: self_time
%self total self wait child calls name location
13.07 0.000 0.000 0.000 0.000 1 Array#each
9.57 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
7.37 0.000 0.000 0.000 0.000 1 Hash#compact
6.32 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
5.31 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
5.05 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.78 0.000 0.000 0.000 0.000 1 Enumerable#each_with_object
4.19 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.18 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
4.07 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.97 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.89 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.78 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
3.67 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.76 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.48 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.44 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#id /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.32 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.29 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.28 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.14 0.000 0.000 0.000 0.000 1 <Object::OpenStruct>#name /Users/tanle/.rubies/2.7.2/lib/ruby/2.7.0/ostruct.rb:190
2.06 0.001 0.000 0.000 0.001 1 [global]# (pry):187
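The allocation difference shown in the profiles above can also be approximated without ruby-prof. The sketch below is an assumption-laden reproduction, not the measurement used in this MR: it counts allocations with `GC.stat(:total_allocated_objects)`, uses a hypothetical `BenchCommit` `Struct` in place of `OpenStruct`, and introduces a hypothetical helper `count_allocations`. Exact counts vary by Ruby version, so no specific numbers are claimed.

```ruby
# BenchCommit is a hypothetical stand-in for the OpenStruct objects above.
BenchCommit = Struct.new(:id, :name)

bench_commits = (1..10_000).map { |n| BenchCommit.new(n, "tom-#{n}") }

# Hypothetical helper: returns how many objects were allocated while the
# block ran, based on the interpreter-wide allocation counter.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Existing approach: one temporary [id, name] array per commit.
existing_allocations = count_allocations do
  Hash[bench_commits.map { |o| [o.id, o.name] }].compact
end

# New approach: only the result hash and its compacted copy.
new_allocations = count_allocations do
  bench_commits.each_with_object({}) { |o, result| result[o.id] = o.name }.compact
end

puts "existing: #{existing_allocations}, new: #{new_allocations}"
```

The existing approach should report at least one allocation per commit, since every `[o.id, o.name]` pair is a new array, while the new approach should stay at a small constant number regardless of input size.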
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention @gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Tan Le