Draft: GithubImporter: Refactoring representation layer
What does this MR do and why?
This is the first part of a bigger plan to improve the GithubImporter Representation layer.
Context
The GitHub importer uses a variation of the ETL (extract, transform, load) architecture, where:
- Extraction happens in Importers (with pluralized resource names);
- Transformation happens in the Representation layer;
- Loading happens in Importers (with singular resource names).
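As a rough illustration of the three roles above, here is a minimal sketch. The class names, attributes, and the in-memory `store` are hypothetical stand-ins, not the actual importer classes, and the real importer does more work between the stages than this shows.

```ruby
# Hypothetical, simplified stand-ins; not the actual GitLab classes.

# Transform: turns a raw API payload into a plain, serializable hash.
class NoteRepresentation
  attr_reader :attributes

  def self.from_api_response(raw)
    new(noteable_id: raw[:issue_number], note: raw[:body].to_s.strip)
  end

  def initialize(attributes)
    @attributes = attributes
  end

  def to_hash
    attributes
  end
end

# Load: a singular importer persists one object (here, into an array).
class NoteImporter
  def initialize(store)
    @store = store
  end

  def execute(hash)
    @store << hash
  end
end

# Extract: a plural importer walks the collection and hands each item on.
class NotesImporter
  def initialize(api_items, store)
    @api_items = api_items
    @store = store
  end

  def execute
    @api_items.each do |raw|
      hash = NoteRepresentation.from_api_response(raw).to_hash
      NoteImporter.new(@store).execute(hash)
    end
  end
end

store = []
NotesImporter.new([{ issue_number: 1, body: " hello " }], store).execute
```

The point of the split is that `NoteImporter` only persists what it is given; any reshaping of the payload belongs in the representation.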
Work in this commit
Currently, not all transformations happen in the Representation objects; some leak into the Loading layer. This happens because some transformations depend on the context of the import, such as the project being imported.
This first step:
- creates Gitlab::GithubImporter::Context to pass the context to the representations;
- moves the common API among the representations to Representation::Base:
  - #initialize,
  - #parse,
  - #deserialize,
  - #github_identifiers;
- adds more context to the representation so that the Representation can do all the transformations for the Loader layer. At this moment, the project being imported and the GitHub client being used are passed to the Representation class;
- renames from_api_response to parse;
- renames from_json_hash to deserialize;
- adds parse_with to enable parsing nested objects with the same context;
- adds deserialize_with to enable deserializing nested objects with the same context.
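The list above could look roughly like the following sketch. Only the names Context, Representation::Base, parse, deserialize, parse_with, and deserialize_with come from this description; the method signatures, the bodies, the nil handling, and the User and Note subclasses are assumptions made for illustration, and #github_identifiers is left out.

```ruby
require "json"

# Sketch only: signatures and bodies are assumptions, not the MR's code.
module Gitlab
  module GithubImporter
    # Carries the import-wide state that transformations may need.
    Context = Struct.new(:project, :client)

    module Representation
      class Base
        attr_reader :attributes, :context

        def initialize(attributes, context)
          @attributes = attributes
          @context = context
        end

        # Builds a representation from a raw API payload (was from_api_response).
        def self.parse(_raw, _context)
          raise NotImplementedError
        end

        # Rebuilds a representation from its serialized hash (was from_json_hash).
        def self.deserialize(hash, context)
          new(hash.transform_keys(&:to_sym), context)
        end

        # Parses a nested object with the same context; nil stays nil.
        def self.parse_with(klass, raw, context)
          raw && klass.parse(raw, context)
        end

        # Deserializes a nested object with the same context; nil stays nil.
        def self.deserialize_with(klass, hash, context)
          hash && klass.deserialize(hash, context)
        end

        def to_hash
          attributes
        end
      end

      # Hypothetical nested representation.
      class User < Base
        def self.parse(raw, context)
          new({ id: raw[:id], login: raw[:login] }, context)
        end
      end

      # Hypothetical representation that needs the project from the context.
      class Note < Base
        def self.parse(raw, context)
          author = parse_with(User, raw[:user], context)
          new({ project_id: context.project.id, note: raw[:body], author: author&.to_hash }, context)
        end

        def author
          self.class.deserialize_with(User, attributes[:author], context)
        end
      end
    end
  end
end

# Usage: parse, serialize to JSON, then deserialize with the same context.
project = Struct.new(:id).new(42) # stand-in for the project being imported
context = Gitlab::GithubImporter::Context.new(project, nil)
raw = { body: "LGTM", user: { id: 7, login: "octocat" } }

note = Gitlab::GithubImporter::Representation::Note.parse(raw, context)
json = JSON.generate(note.to_hash)
restored = Gitlab::GithubImporter::Representation::Note.deserialize(JSON.parse(json), context)
```

Passing one context object, rather than separate project and client arguments, means nested representations can be built without every caller knowing which pieces of import state they need.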
The next step is to move the transformations that currently happen outside the Representation layer back into these classes.
- Example of what can be removed from the Import (load) layer: !72458 (diffs)
Related to: #330331
Screenshots or screen recordings
Current architecture overview
sequenceDiagram
participant GithubAPI
participant Stage
participant Representation
participant ObjectImporter
Stage ->> GithubAPI: Fetch Collection
activate GithubAPI
GithubAPI ->> Stage: Collection of objects
deactivate GithubAPI
loop every object
Stage ->> Representation: from_api_response (serialize)
activate Representation
Representation ->> Stage: serialized object
deactivate Representation
Stage ->> ObjectImporter: execute (serialized object)
ObjectImporter ->> Representation: from_json_hash
activate Representation
Representation ->> ObjectImporter: deserialized object
deactivate Representation
Note right of ObjectImporter: At this point<br>the ObjectImporter<br>uses the deserialized object and some<br>transformations from the Representation<br>to build the attributes (more transformations) to<br>save the object on GitLab
end
Proposed architecture overview
sequenceDiagram
participant GithubAPI
participant Stage
participant Representation
participant ObjectImporter
Stage ->> GithubAPI: Fetch Collection
activate GithubAPI
GithubAPI ->> Stage: Collection of objects
deactivate GithubAPI
loop every object
Stage ->> Representation: serialize
activate Representation
Representation ->> Stage: serialized object
deactivate Representation
Stage ->> ObjectImporter: execute (serialized object)
ObjectImporter ->> Representation: deserialize
activate Representation
Representation ->> ObjectImporter: deserialized object
deactivate Representation
Note right of ObjectImporter: Instead of having transformations on both<br>Representation and ObjectImporter,<br>the end goal is to move all the<br>transformations to the Representation layer
end
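The proposed flow can be sketched in code as follows. This is an illustration under stated assumptions: ImportContext, IssueRepresentation, IssueImporter, the attribute names, and the in-memory `saved` array are all hypothetical, and JSON stands in for whatever serialization sits between the Stage and the ObjectImporter.

```ruby
require "json"

# Hypothetical stand-ins for the project, the context, and the diagram's actors.
Project = Struct.new(:id)
ImportContext = Struct.new(:project, :client)

class IssueRepresentation
  attr_reader :attributes

  # Every transformation lives here, including the ones that need the project.
  def self.parse(raw, context)
    new(
      project_id: context.project.id,
      title: raw[:title].to_s.strip,
      state: raw[:state] == "open" ? "opened" : "closed"
    )
  end

  # Rebuilds the representation from the serialized hash; no new transformations.
  def self.deserialize(hash, _context)
    new(**hash.transform_keys(&:to_sym))
  end

  def initialize(**attributes)
    @attributes = attributes
  end
end

# The ObjectImporter only deserializes and saves; it builds no attributes itself.
class IssueImporter
  def initialize(payload, context, store)
    @payload = payload
    @context = context
    @store = store
  end

  def execute
    issue = IssueRepresentation.deserialize(JSON.parse(@payload), @context)
    @store << issue.attributes
  end
end

# The Stage: loop over the fetched collection, serialize, hand off.
context = ImportContext.new(Project.new(42), nil)
raw_issues = [{ title: " Fix crash ", state: "open" }]
saved = []

raw_issues.each do |raw|
  payload = JSON.generate(IssueRepresentation.parse(raw, context).attributes)
  IssueImporter.new(payload, context, saved).execute
end
```

In this shape the importer can be read as pure loading code, which is the end state the note in the diagram describes.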
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.