Draft: PoC - extract CI into component
What does this MR do and why?
Related to #365293 (closed)
This MR attempts to prove that GitLab codebase could be better organized into domain components by following the Domain-Driven Design principles.
Rather than having the codebase organized primarily as a layered architecture (Model-View-Controller), the large monolith could be organized into components representing the functional domains of the application. Within each component (or bounded context) we retain the directory structure of a typical Rails application (models
, services
, workers
, etc.) with some tweaks:
- Each component contains a
**/app/lib
folder where domain logic that doesn't fit the MVC pattern can go. This stops us from putting unorganized domain code into the top-levellib/
folder. - Controller, Frontend and API code remains in the main
app/
for now. We could decide to move those into the respective component. Example: each component defines the own REST endpoints and GraphQL types which are ultimately mounted all together. - Database migrations remain in the main
db/
directory. Again, we could decide to have each component hosting its own migrations (components/ci/core/db/migrate/*
).
This PoC also uses Packwerk to enforce boundaries between components, otherwise there wouldn't be much benefit. Packwerk doesn't give a proper isolation since all Ruby code eventually is loaded in memory. It doesn't stop a component from referencing a constant from another component at runtime. However, it's a powerful static analysis tool that can be used in CI to enforce boundaries between components, helping us to design more decoupled domains.
The directory structure looks like the example below. Some points to highlight:
- each component
components/<name>
directory contains apackage.yml
containing Packwerk package configs. - Each component directory has a
core
andee
directory. FOSS code will be located undercomponents/*/core
and EE extensions undercomponents/*/ee
.-
core
is always autoloaded. -
ee
is loaded conditionally if running EE. - This PoC doesn't include
jh
in the scope but we can use the identical approach asee
. -
Why this choice? Code inside
core
andee
that is part of the same package (e.g.Ci::
) should be colocated under the same package directory. With Packwerk it's not possible (thankfully) to have packages scattered around multiple directories. This makes all code related to a domain under the same directory.
-
- Each component directory includes the relative
spec
s. - Packwerk provides a way to make certain constants public for other packages/components to use. For example
Ci::CreatePipelineService
could be used by other domains to trigger a pipeline. Packwerk requires files declaring these constants to be placed into a specialpublic
directory:components/ci/core/public/services/ci/create_pipeline_service.rb
. Anything outside thepublic
directory is considered private to the component.
components
└── ci
├── package.yml
├── public
│ ├── core
│ │ └── services
│ │ └── ci/create_pipeline_service.rb
│ └── ee
├── core # autoload components/*/core by default
│ ├── app
│ │ ├── models
│ │ ├── lib
│ │ ├── services
│ │ └── ...
│ └── spec
│ ├── models
│ ├── lib
│ ├── services
│ └── ...
└── ee # autoload components/*/ee if running EE
├── app
│ ├── models
│ │ ├── ci/minutes/usage.rb # class defined in EE
│ │ └── ee/ci/pipeline.rb # EE extension for a class in Core
│ ├── lib
│ ├── services
│ └── ...
└── spec
├── models
├── lib
├── services
└── ...
Changes made in this MR:
- I simply moved part of CI files (mainly domain logic excluding controllers, frontend, API) into the directory structure above
-
app/models/ci -> components/ci/core/app/models/ci
- and so on for services, workers, policies, presenters, etc. -
ee/app/models/ci -> components/ci/ee/app/models/ci
- and so on for services, workers, policies, presenters, etc. -
ee/app/models/ee/ci -> components/ci/ee/app/models/ee/ci
(EE extension modules) - an so on for services, workers, policies, presenters, etc. - some files (but not all) belonging to CI domain but not inside the
Ci::
namespace were also moved into the component dir:app/models/commit_status.rb -> components/ci/core/app/models/commit_status.rb
. - I did NOT yet move domain code from
lib/gitlab/ci
intocomponents/ci/core/app/lib/ci
. It could be moved insidecomponents/ci/core/app/lib/gitlab/ci
in order to maintain theGitlab::Ci::
namespace but it would make more sense to remove theGitlab::
namespace as we move these files instead.
-
- Generated
packerk.yml
andpackage.yml
config files with some tuning. - Generated the
deprecated_references.yml
files which are the equivalent of Rubocop TODO files and record existing violations so that Packwerk don't warn about these again. - Changed
config/application.rb
to autoload files from inside the component and conditionally load EE files. - Changed
lib/gitlab/sidekiq_config.rb
to recognize the new location of the CI workers and any other new components.
TODO:
-
add components/ci/public/core/*
andcomponents/ci/public/ee/*
to the autoload paths. -
fix issues with knapsack not running some specs -
fix Rubocop todo files (violations) since we moved a lot of files around and those are now flagged. Running rails rubocop:todo:generate
however creates too many (including unrelated) changes. So this is whyrubocop
job fails. -
the static analysis CI job complains about the need to run rails gettext:regenerate
to updatelocale/gitlab.pot
. I think there is a configuration somewhere that doesn't recognize the moved files.
Advantages
Code architecture
- The whole
Ci::
namespace is a single package which we can control how it is used. - Package dependencies will be explicitly defined in
components/ci/package.yml
. - The
package.yml
defines also who are the teams primarily responsible for the stewardship of the package. - Production code and specs are both located into the same package. This will give developers a much smaller context to work on while also benefiting from the privacy and dependency constraints that Packwerk provides.
- Public interface classes of the package are exposed via the
public
folder. We clearly distinguish public classes that can be depended on vs implementation details private to the package. This is a Packwerk convention and the idea is that the smaller the number of public classes the more cohesive and more resilient to external changes the package is. Ideally this should contain a single class (e.g.Ci::API.get_pipeline(id)
) that works as internal API. - This way all classes inside
Ci::
component can depend on each other. Eventually sub-packages could be created to better define internal boundaries. - It's possible to put into the same component files that don't belong to the same Ruby module (e.g.
CommitStatus
can be moved intocomponents/ci
). While this is not ideal, it provides a flexible way to group files that should belong to the same domain. Then we can ensure they both are nested under the same Ruby module.
Iteration
- This can be done in several iterations. Create an empty component and gradually move all files in it. Then use Packwerk to enforce privacy and dependencies.
- Code that is clearly defined into a namespace can be converted to a package and moved to the respective folder.
- We can start seeing the advantages immediately by recording the existing violations for us to resolve later but also be alerted about new violations as we make changes going forward.
Challenges
- We need to understand how this would play with the proposal of separating out the
web_engine
. Put controllers/GraphQL/REST/serializers in aweb_engine
vs in dedicatedcomponent/*
folder. - New directory structure for developers to get used to.
- Could mess Git history by moving many files - We already experienced that a few years ago when we consolidated Core and EE codebase into a single repo.
- Many iterations to get to a consistent state across the codebase.
- Requires adoption from all engineering teams.
- CODEOWNERS, Danger bot and other tools will need to be updated.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.