Introduce simple ActiveRecord-based bulk/insert functionality
Problem to solve
We have prior discussion about bulk inserts: #36992 (comment 271731371).
This applies to the whole application, but specifically to the import process:
- We insert a number of simple AR objects,
- We need to run the insert via AR objects, due to validations,
- We insert them one-by-one, which makes the process slow.
Examples where bulk insert would help:
- MergeRequestDiffCommit and MergeRequestDiffFile: we can insert a few hundred for a single relation,
- Notes on issues and merge requests: as above, we can insert a few hundred for a single issue or merge request.
We already do bulk insert in some cases, but this is a very specific implementation:
- GitHub Importer: lib/gitlab/import/merge_request_helpers.rb:insert_or_replace_git_data.
Investigation
We tried in #36992 (closed) to create an AR-based low-level implementation
that would allow us to bulk_insert data. However, this proved unrealistic,
as it would require heavy patching of Active Record to follow the execution
cycle: validations + callbacks.
Proposal
Taken from: #36992 (comment 271151982)
We need something simpler and more targeted that fixes specific relations.
Following my comment after !22783 (comment 271150808), I'm thinking that we could do something like this to have an automated way to perform bulk inserts, but done on a small scale, and targeting specific relations ONLY:
What I'm really saying is that:
- If we disallow callbacks/validations on some models,
- We could gather them,
- We could bulk insert them in batches, once some number of objects has been gathered.
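The gather-then-flush idea from the list above can be sketched in plain Ruby, without ActiveRecord. This is only an illustration: `BatchBuffer`, its `batch_size:` argument, and the flush block are hypothetical names that are not part of the proposal, and the block stands in for whatever actually performs the insert.

```ruby
# Hypothetical helper (not part of the proposal): gathers rows and hands
# them to a block in fixed-size batches.
class BatchBuffer
  def initialize(batch_size:, &flusher)
    raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?
    raise ArgumentError, 'a flush block is required' unless flusher

    @batch_size = batch_size
    @flusher = flusher
    @rows = []
  end

  # Adds rows and immediately flushes every full batch.
  def append(rows)
    @rows.concat(rows)
    @flusher.call(@rows.shift(@batch_size)) while @rows.size >= @batch_size
    self
  end

  # Flushes whatever is left, e.g. once the parent record has been saved.
  def flush
    @flusher.call(@rows.shift(@rows.size)) unless @rows.empty?
    self
  end
end

# Usage: the block records batches here; a real one would insert them.
inserted_batches = []
buffer = BatchBuffer.new(batch_size: 3) { |batch| inserted_batches << batch }
buffer.append([{ sha: 'a' }, { sha: 'b' }])
buffer.append([{ sha: 'c' }, { sha: 'd' }])
buffer.flush
```

With a batch size of 3, the four rows above reach the block as one batch of three followed by one batch of one.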
We could simply target specific objects:
module WithBulkInsertableModels
  extend ActiveSupport::Concern

  included do
    # Flush everything gathered so far once the parent record is saved.
    after_save :bulk_insert
  end

  def supports_bulk_insert?(reflection_name)
    reflection = self.class.reflect_on_association(reflection_name)
    # `klass` is the model class the association points to.
    reflection.klass < BulkInsertable
  end

  def append_to_bulk_insert(reflection_name, items)
    reflection = self.class.reflect_on_association(reflection_name)
    raise 'Does not support bulk insert' unless reflection.klass < BulkInsertable

    @model_bulk_inserts ||= {}
    @model_bulk_inserts[reflection] ||= []
    @model_bulk_inserts[reflection] += items
  end

  def bulk_insert
    # Nothing was gathered for this record.
    return unless @model_bulk_inserts

    @model_bulk_inserts.each do |reflection, items|
      reflection.klass.bulk_insert(items)
    end
    @model_bulk_inserts = nil
  end
end
module BulkInsertable
  extend ActiveSupport::Concern

  # disallow before_save/after_save
  # disallow before_validation
  class_methods do
    def bulk_insert(items)
      ...
    end
  end
end
class MergeRequestDiff
  include WithBulkInsertableModels

  has_many :merge_request_diff_commits
end

class MergeRequestDiffCommit
  include BulkInsertable
end
class RelationTreeRestorer
  def transform_sub_relations!(subject, data_hash, sub_relation_key, sub_relation_definition)
    ...
    if subject.respond_to?(:supports_bulk_insert?) && subject.supports_bulk_insert?(sub_relation_key)
      subject.append_to_bulk_insert(sub_relation_key, sub_data_hash)
      data_hash.delete(sub_relation_key)
    elsif sub_data_hash
      data_hash[sub_relation_key] = sub_data_hash
    else
      data_hash.delete(sub_relation_key)
    end
  end
end
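The body of `bulk_insert(items)` is intentionally left open above. As one possible direction only, the core of a raw insert is turning a list of attribute hashes into a single multi-row INSERT. The sketch below is plain Ruby; `build_bulk_insert` and the column names are hypothetical, and it deliberately stops at producing the SQL and bind values rather than executing anything.

```ruby
# Hypothetical helper (an assumption, not the proposed implementation):
# turns attribute hashes into one multi-row INSERT statement plus the flat
# list of values to bind to its placeholders.
def build_bulk_insert(table, rows)
  raise ArgumentError, 'no rows given' if rows.empty?

  columns = rows.first.keys
  unless rows.all? { |row| row.keys == columns }
    raise ArgumentError, 'all rows must have the same columns in the same order'
  end

  # One "(?, ?, ...)" group per row.
  row_placeholders = "(#{(['?'] * columns.size).join(', ')})"
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) " \
        "VALUES #{([row_placeholders] * rows.size).join(', ')}"

  [sql, rows.flat_map(&:values)]
end

# Illustrative table and column names only.
sql, binds = build_bulk_insert(
  'merge_request_diff_commits',
  [
    { merge_request_diff_id: 1, relative_order: 0, sha: 'abc' },
    { merge_request_diff_id: 1, relative_order: 1, sha: 'def' }
  ]
)
```

Table and column names are interpolated unquoted here, so they would have to come from trusted code, never from imported data. Rails 6.0 and later also provide `insert_all` on models, which may make a hand-built statement unnecessary.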
It gets quite simple and maintainable as a result:
- as we ensure that some of the models cannot have complex validations/callbacks,
- we ensure that we can raw-insert them, which makes them safe to insert with that model,
- we can re-use that elsewhere if needed, though for now we use it only for import/export,
- this can be our way to provide a consistent way to perform bulk inserts across the application in a more structured manner.
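The first point, making sure a bulk-insertable model cannot carry callbacks, could be enforced at the class level. Below is a toy sketch of that guard in plain Ruby. `ToyModel` is a stand-in and NOT ActiveRecord, and `ToyBulkInsertable` and `CallbackNotAllowed` are hypothetical names; a real concern would have to hook into ActiveRecord's own callback registration instead.

```ruby
# Toy stand-in for a model base class; NOT ActiveRecord.
class ToyModel
  HOOKS = %i[before_validation before_save after_save].freeze

  def self.callbacks
    @callbacks ||= []
  end

  # Each hook simply records the callback on the class it was called on.
  HOOKS.each do |hook|
    define_singleton_method(hook) { |method_name| callbacks << [hook, method_name] }
  end
end

# Hypothetical guard: once included, registering a callback raises.
module ToyBulkInsertable
  CallbackNotAllowed = Class.new(StandardError)

  def self.included(base)
    unless base.callbacks.empty?
      raise CallbackNotAllowed, "#{base.name} already defines callbacks"
    end

    base.extend(ClassMethods)
  end

  module ClassMethods
    ToyModel::HOOKS.each do |hook|
      define_method(hook) do |*_args|
        raise CallbackNotAllowed, "#{hook} is not allowed on bulk-insertable model #{name}"
      end
    end
  end
end

class ToyCommit < ToyModel
  include ToyBulkInsertable
end

class ToyIssue < ToyModel
  before_save :touch_project # still allowed on ordinary models
end

error = begin
  ToyCommit.before_save(:touch_parent)
  nil
rescue ToyBulkInsertable::CallbackNotAllowed => e
  e.message
end
```

Failing at class-definition time means a callback added later to such a model is caught immediately, rather than silently skipped during a raw insert.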