Draft: POC - Use Packwerk to create `components/ci`
What does this MR do and why?
Related to Related to #365293 (closed)
This MR provides a more iterative approach than !88899 (closed).
In this MR we attempt to move only Gitlab::Ci::Config
code into components/ci
and run Packwerk to see the violations.
Overview
This MR attempts to prove that GitLab codebase could be better organized into domain components by following the Domain-Driven Design principles.
Rather than having the codebase organized primarily as a layered architecture (Model-View-Controller), the large monolith could be organized into components representing the functional domains of the application. Within each component (or bounded context) we retain the directory structure of a typical Rails application (models
, services
, workers
, etc.) with some tweaks:
- Each component contains a
**/app/lib
folder where domain logic that doesn't fit the MVC pattern can go. This stops us from putting unorganized domain code into the top-levellib/
folder. - Controller, Frontend and API code remains in the main
app/
for now. We could decide to move those into the respective component. Example: each component defines the own REST endpoints and GraphQL types which are ultimately mounted all together. - Database migrations remain in the main
db/
directory. Again, we could decide to have each component hosting its own migrations (components/ci/core/db/migrate/*
).
This PoC also uses Packwerk to enforce boundaries between components, otherwise there wouldn't be much benefit. Packwerk doesn't give a proper isolation since all Ruby code eventually is loaded in memory. It doesn't stop a component from referencing a constant from another component at runtime. However, it's a powerful static analysis tool that can be used in CI to enforce boundaries between components, helping us to design more decoupled domains.
The directory structure looks like the example below. Some points to highlight:
- each component
components/<name>
directory contains apackage.yml
containing Packwerk package configs. - Each component directory has a
core
andee
directory. FOSS code will be located undercomponents/*/core
and EE extensions undercomponents/*/ee
.-
core
is always autoloaded. -
ee
is loaded conditionally if running EE. - This PoC doesn't include
jh
in the scope but we can use the identical approach asee
. -
Why this choice? Code inside
core
andee
that is part of the same package (e.g.Ci::
) should be colocated under the same package directory. With Packwerk it's not possible (thankfully) to have packages scattered around multiple directories. This makes all code related to a domain under the same directory.
-
- Each component directory includes the relative
spec
s. - Packwerk provides a way to make certain constants public for other packages/components to use. For example
Ci::CreatePipelineService
could be used by other domains to trigger a pipeline. Packwerk requires files declaring these constants to be placed into a specialpublic
directory:components/ci/public/core/services/ci/create_pipeline_service.rb
. Anything outside thepublic
directory is considered private to the component.
components
└── ci
├── package.yml
├── public
│ ├── core
│ │ └── services
│ │ └── ci/create_pipeline_service.rb
│ └── ee
├── core # autoload components/*/core by default
│ ├── app
│ │ ├── models
│ │ ├── lib
│ │ ├── services
│ │ └── ...
│ └── spec
│ ├── models
│ ├── lib
│ ├── services
│ └── ...
└── ee # autoload components/*/ee if running EE
├── app
│ ├── models
│ │ ├── ci/minutes/usage.rb # class defined in EE
│ │ └── ee/ci/pipeline.rb # EE extension for a class in Core
│ ├── lib
│ ├── services
│ └── ...
└── spec
├── models
├── lib
├── services
└── ...
Changes made in this MR:
- I simply moved part of CI files into the directory structure above. In this MR we attempt to move only
Gitlab::Ci::Config
code intocomponents/ci/{core|ee}/app/lib/gitlab/ci/
and run Packwerk to see the violations. - Other
Ci::
files are left in the main/outer component (.
) - Generated
packerk.yml
andpackage.yml
config files with some tuning. - Generated the
deprecated_references.yml
files which are the equivalent of Rubocop TODO files and record existing violations so that Packwerk don't warn about these again. - Changed
config/application.rb
to autoload files from inside the component and conditionally load EE files. - Changed
lib/gitlab/sidekiq_config.rb
to recognize the new location of the CI workers and any other new components.
Not done:
- add
components/ci/public/core/*
andcomponents/ci/public/ee/*
to the autoload paths. - fix issues with knapsack not running some specs
- fix Rubocop todo files (violations) since we moved a lot of files around and those are now flagged. Running
rails rubocop:todo:generate
however creates too many (including unrelated) changes. So this is whyrubocop
job fails. - the static analysis CI job complains about the need to run
rails gettext:regenerate
to updatelocale/gitlab.pot
. I think there is a configuration somewhere that doesn't recognize the moved files.
Advantages
Code architecture
- The whole
Ci::
namespace is a single package which we can control how it is used. - Package dependencies will be explicitly defined in
components/ci/package.yml
. - The
package.yml
defines also who are the teams primarily responsible for the stewardship of the package. - Production code and specs are both located into the same package. This will give developers a much smaller context to work on while also benefiting from the privacy and dependency constraints that Packwerk provides.
- This way all classes inside
Ci::
component can depend on each other. Eventually sub-packages could be created to better define internal boundaries. - It's possible to put into the same component files that don't belong to the same Ruby module (e.g.
CommitStatus
can be moved intocomponents/ci
). While this is not ideal, it provides a flexible way to group files that should belong to the same domain. Then we can ensure they both are nested under the same Ruby module.
Iteration
- This can be done in several iterations. Create an empty component and gradually move all files in it. Then use Packwerk to enforce privacy and dependencies.
- Code that is clearly defined into a namespace can be converted to a package and moved to the respective folder.
- We can start seeing the advantages immediately by recording the existing violations for us to resolve later but also be alerted about new violations as we make changes going forward.
Challenges
- We need to understand how this would play with the proposal of separating out the
web_engine
. Put controllers/GraphQL/REST/serializers in aweb_engine
vs in dedicatedcomponent/*
folder. - New directory structure for developers to get used to.
- Could mess Git history by moving many files - We already experienced that a few years ago when we consolidated Core and EE codebase into a single repo.
- Many iterations to get to a consistent state across the codebase.
- Requires adoption from all engineering teams.
- CODEOWNERS, Danger bot and other tools will need to be updated. Although I believe that rules for ownership will be simplified in a component-based monolith than hard-coding paths as we do today in CODEOWNERS.
Dependency graph
In this MR we are only extracting components/ci
and in order to do it incrementally we need to have a cyclical dependency with the main/parent component that is the Rails root directory. This is because Ci::
needs to reference external constants like Project
, MergeRequest
, etc.
flowchart LR
root[Parent component] -- depends on --> CI
CI -- depends on --> root
Overtime we can end up with:
flowchart LR
CI --> w[Project]
w --> Platform
CI --> Platform
MergeRequests --> CI
MergeRequests --> w
w --> Repository --> Platform
MergeRequests --> Platform