Improve performance of Banzai::Filter::ReferenceFilter
## Problem
From the flame graph we can see that the page takes 38.2s to load, and `Banzai::Filter::ReferenceFilter#each_node` looked like the biggest offender (7.9–11.7s, i.e. 25–30% of the total time) and a good candidate to investigate.
It turned out that `Banzai::Filter::ReferenceFilter#each_node` was creating and iterating 22,346 `Nokogiri::XML::Text` objects for this CHANGELOG.md file. I discovered that many of those came from empty lines; by skipping empty lines, we reduced the number to 11,098 for this particular file.
It turned out that we have various Banzai reference filters, such as:

- `Banzai::Filter::IterationReferenceFilter`
- `Banzai::Filter::UserReferenceFilter`
- `Banzai::Filter::ProjectReferenceFilter`
- `Banzai::Filter::IssueReferenceFilter`
- `Banzai::Filter::MergeRequestReferenceFilter`
- `Banzai::Filter::SnippetReferenceFilter`
- `Banzai::Filter::CommitRangeReferenceFilter`
- `Banzai::Filter::CommitReferenceFilter`
Each of these filters parses the same doc and filters the same nodes over and over again. When a filter finds no references, it passes the unchanged doc to the next filter, which filters it all over again. When a filter does replace text with an actual link, the doc changes, and the node list should reflect those changes as well.
Example:

Let's say that we have the following text:

```
!122 #12323 @nmilojevic1 .
```

This text will be parsed into:

```
[<Nokogiri::XML::Text "!122 #12323 @nmilojevic1 .">]
```
`Banzai::Filter::MergeRequestReferenceFilter` will parse the doc using `each_node` and iterate through the document searching for the merge request reference pattern (`!merge_request_id`), and it will replace this node with 2 new nodes:

```
[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../merge_requests/122">, ...]>,
 <Nokogiri::XML::Text " #12323 @nmilojevic1 .">]
```
`Banzai::Filter::IssueReferenceFilter` will parse the doc using `each_node` again and iterate through the document searching for the issue reference pattern (`#issue_id`), and it will replace the new node:

```
[<Nokogiri::XML::Text " #12323 @nmilojevic1 .">]
```

with:

```
[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../issues/12323">, ...]>,
 <Nokogiri::XML::Text " @nmilojevic1 .">]
```
And the actual document will now look like this:

```
[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../merge_requests/122">, ...]>,
 <Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../issues/12323">, ...]>,
 <Nokogiri::XML::Text " @nmilojevic1 .">]
```
`Banzai::Filter::UserReferenceFilter` will parse the doc using `each_node` again and iterate through the document searching for the user reference pattern (`@user_name`), and it will replace the new node:

```
[<Nokogiri::XML::Text " @nmilojevic1 .">]
```

with:

```
[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../users/nmilojevic1">, ...]>,
 <Nokogiri::XML::Text " .">]
```
And the actual document will now look like this:

```
[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../merge_requests/122">, ...]>,
 <Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../issues/12323">, ...]>,
 <Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../users/nmilojevic1">, ...]>,
 <Nokogiri::XML::Text " .">]
```
When we parse the whole (big) doc each time, calling `each_node` for each reference filter, we search for text nodes and unreferenced links through roughly 22,000 lines, every time.
## What does this MR try to do?
When we replace a node, we update the node list with its replacement nodes (keeping text nodes and omitting the generated reference links), and we pass that updated node list to the next filter.
## Feature Flags

This optimization is under the feature flag:

- `:update_nodes_for_banzai_reference_filter` (with project scope)
## Rollout plan

- Enable the feature flag for a few projects and test to make sure it works.
- Enable the feature flag for the GitLab project.
- Prepare an MR to enable it by default for everyone.
- Update documentation.
- Prepare an MR to remove the feature flag.
## Conformity

- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
## Availability and Testing

Tested with the gitlabhq project's CHANGELOG.md file:

https://staging.gitlab.com/qa-perf-testing/gitlabhq/-/preview/master/CHANGELOG.md

Comparing the master branch against the 217580-improve-performance-of-blob branch:
| branch | method | time | total time |
|---|---|---|---|
| master | Banzai::Filter::ReferenceFilter#each_node | 7.9s | 34.7s |
| origin/217580-improve-performance-of-blob | Banzai::Filter::ReferenceFilter#each_node | 0.574s | 19.82s |
In this particular case, the file loads about 75% faster than before (34.7s vs. 19.82s), and `Banzai::Filter::ReferenceFilter#each_node` is roughly 13x faster than before (7.9s vs. 0.574s).
I tested the file preview as well:
| branch | total time | Banzai::Filter::ReferenceFilter#each_node |
|---|---|---|
| master | 27.6s | 6.00s (22%) |
| 217580-improve-performance-of-blob | 15.53s | 0.370s (2.4%) |
We can see improvements similar to those in the blob controller. The issue preview benefits as well.
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of default or new setting change, if applicable per definition of done
Related to #217580 (closed)