Draft: PoC: Pods Stateless Router Proposal (Iteration 2)
TL;DR
This is a proof of concept that models some aspects of the Pods Stateless Router Proposal
as described in !102553 (merged).
What does it do?
Iteration 1 (concluded with a recording on 12 October)
See !102770 (closed).
Iteration 2 (this MR, in progress)
These are additional changes implemented in this PoC to continue validating the solution; they have yet to be presented.
- **Move away from the Pod selector on the Performance Bar**: The intent is for as many routes as possible to be classified to a given Pod automatically. The Performance Bar then shows which Pod is used, but in most cases the ability to change the Pod stops working.
- **Router**: Implement a router to send pre-flight requests and `path_info` classification on the Rails side: !102553 (merged).
- **GitLab-Shell and Gitaly**: Fix support for Git Push and make it work with the Router.
- **`PreventClusterWrites`**: Implement a mechanism to model the async-writes approach, where only Pod 0 can write to cluster-wide tables.
- **QA**: Work on fixing as many QA tests as possible.
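The `PreventClusterWrites` mechanism, including the suppression used to flag call sites, could be sketched roughly like this. This is a minimal standalone sketch, not the PoC's actual implementation: the table list, the pod-detection attribute, and the SQL matching are all illustrative assumptions.

```ruby
# Standalone sketch of a PreventClusterWrites-style query analyzer.
class PreventClusterWrites
  ClusterWriteError = Class.new(StandardError)

  # Assumed sample of cluster-wide tables; the real set would come from
  # the schema classification.
  CLUSTER_WIDE_TABLES = %w[users organizations plans].freeze

  WRITE_PATTERN = /\A\s*(INSERT\s+INTO|UPDATE|DELETE\s+FROM)\s+"?(\w+)"?/i

  class << self
    attr_accessor :current_pod, :suppressed

    # Raises when any Pod other than pod_0 writes to a cluster-wide table.
    def analyze!(sql)
      return if suppressed || current_pod == 'pod_0'

      match = WRITE_PATTERN.match(sql)
      return unless match && CLUSTER_WIDE_TABLES.include?(match[2].downcase)

      raise ClusterWriteError, "#{current_pod} attempted to write to #{match[2]}"
    end

    # Temporarily allow cluster writes, marking call sites that must later
    # be changed to forward the write to pod_0 instead.
    def suppress
      self.suppressed = true
      yield
    ensure
      self.suppressed = false
    end
  end
end
```

For example, with `current_pod = 'pod_1'`, `analyze!('UPDATE "users" ...')` raises, while wrapping the same call in `suppress { ... }` lets it through, which mirrors how the PoC identifies write sites that need forwarding.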
This follows the async-writes approach, where only Pod 0 can write to cluster-wide tables.
- A `pod_N` always uses a DB replica of `cluster-wide` tables and is expected to observe latency of up to 500 ms on those tables => a region-first approach.
- The set of cluster-wide tables is under the `public` schema.
- Pod-specific tables are under the `pod_0` and `pod_1` schemas.
- A Rack/Sidekiq middleware is added to configure `connection.schema_search_path = "public,pod_0|pod_1"` depending on the `selected_pod` cookie, to model switching organizations.
- Only `pod_0` can write to `cluster-wide` tables: this is enforced by the `PreventClusterWrites` query analyzer.
- A `pod_N` forwards write calls via API to `pod_0`: this is currently modelled by suppressing `PreventClusterWrites` at the place where the write happens, to identify the places that need to be changed.
- Some endpoints that require cluster-wide access, like `/admin` or `/-/profile`, are forced to be served by `pod_0`.
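The Rack middleware that switches the search path based on the `selected_pod` cookie could look roughly like this dependency-free sketch. The cookie name and schema names come from the description above; the `env` key and the manual cookie parsing are illustrative simplifications — the real middleware would set `connection.schema_search_path` on the ActiveRecord connection.

```ruby
# Sketch of a Rack middleware selecting the Pod schema from a cookie.
class PodSchemaSelector
  VALID_PODS = %w[pod_0 pod_1].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    pod = cookie_value(env['HTTP_COOKIE'].to_s, 'selected_pod')
    pod = 'pod_0' unless VALID_PODS.include?(pod)

    # In the PoC this would configure ActiveRecord instead, e.g.:
    #   connection.schema_search_path = "public,#{pod}"
    env['pod.schema_search_path'] = "public,#{pod}"

    @app.call(env)
  end

  private

  # Minimal cookie parsing to keep the sketch free of the rack gem.
  def cookie_value(header, name)
    header.split(/;\s*/).map { |pair| pair.split('=', 2) }
          .to_h.fetch(name, nil)
  rescue StandardError
    nil
  end
end
```

An unknown or missing cookie falls back to `pod_0`, matching the idea that only Pod 0 is always a safe default for cluster-wide access.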
What problems does it ignore, because we know we can solve them?
- **Decompose cluster-wide tables**: We know that we can decompose all `cluster-wide` tables (as we did for the CI decomposition). The biggest problem there is fixing all cross-joins between schemas. Using a single logical database with separate PostgreSQL schemas (`cluster+pod_0` or `cluster+pod_1`) keeps all existing cross-joins working while still creating separate visibility between tables.
- **Monotonic sequences**: We know that we can handle ID sequences across all Pods in a non-conflicting way for things like `projects.id` or `issues.id`. This PoC makes all PostgreSQL sequences shared across `pod_0`/`pod_1`.
- **Loose foreign keys**: Loose foreign keys need to be updated to allow removal across different Pods.
- **Partitioning**: The partitioning code uses `gitlab_partitions_dynamic` and `gitlab_partitions_static`. Since this is not compatible with the `pod_N` approach, all partitioned tables are for now converted into non-partitioned ones.
- **Sidekiq Cron**: Only regular Sidekiq workers are covered. In the future each Pod would have its own Sidekiq Cron executor.
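On the monotonic-sequences point: this PoC shares the sequences, but one well-known non-conflicting alternative is interleaving, where each Pod allocates IDs starting at its own offset and stepping by the total Pod count (in PostgreSQL terms, `CREATE SEQUENCE ... START 1 INCREMENT 2` for one Pod and `START 2 INCREMENT 2` for the other). A hypothetical sketch of that allocation scheme:

```ruby
# Interleaved per-Pod ID allocation: Pod i of N produces i+1, i+1+N, i+1+2N, ...
# so two Pods can allocate concurrently without ever colliding.
class PodSequence
  def initialize(pod_index:, pod_count:)
    @next_id = pod_index + 1 # starting offsets: 1, 2, ..., pod_count
    @step = pod_count
  end

  def next_id
    id = @next_id
    @next_id += @step
    id
  end
end
```

With two Pods, `pod_0` yields 1, 3, 5, … and `pod_1` yields 2, 4, 6, …; the trade-off versus a shared sequence is that IDs are only monotonic per Pod, not globally.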
Problems to evaluate
- **Router**: The current approach uses a single GitLab instance, passes a cookie, and dynamically changes `schema_search_path` depending on the selected Pod. Ideally the router (Workhorse?) should understand, or have the logic to determine, how to route a request to the correct Pod based on information from GitLab Rails.
- **Cross-Pod talking**: a) fetch data from another Pod (like a Project); b) aggregate data across all Pods; c) schedule a Sidekiq job in the context of another Pod; d) route all (Controller, GraphQL and API) requests to the correct Pod automatically.
- **Many versions of GitLab**: A true Pods architecture allows running many different versions of GitLab at the same time, which allows upgrading some customers less frequently than others and thus improves resiliency against application bugs. In a model of decomposed shared cluster-wide tables this might not be possible, since we would require all nodes to run the same latest version of the application whenever cluster-wide tables were updated.
- ...
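The Router problem above — routing each request to the correct Pod using information from GitLab Rails — could be modelled as a simple path classification in the router. The `/admin` and `/-/profile` prefixes come from the endpoints listed earlier; the class name and everything else are illustrative assumptions (the real classification would come from Rails, e.g. via a pre-flight request):

```ruby
# Sketch of a router-side routing decision: cluster-wide endpoints are
# pinned to pod_0, everything else goes to the Pod selected for the user.
class PodRouter
  # Endpoints requiring cluster-wide access, always served by pod_0.
  CLUSTER_WIDE_PREFIXES = %w[/admin /-/profile].freeze

  def initialize(default_pod:)
    @default_pod = default_pod
  end

  def pod_for(path)
    return 'pod_0' if CLUSTER_WIDE_PREFIXES.any? { |prefix| path.start_with?(prefix) }

    @default_pod
  end
end
```

A static prefix table like this is only a stand-in; a real router would need the dynamic `path_info` classification that Rails already performs, which is exactly why the pre-flight approach is listed for evaluation.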
Run it
- Configure `config/database.yml` with `schema_search_path:`, ideally using a new DB
- Run `scripts/decomposition/create-pods-database`
- Run `bin/rake -t db:seed_fu` to seed the development database
- (Optionally) Run `scripts/decomposition/classify-pods-database` to fetch the test DB and update `gitlab_referenced.yml`
```yaml
# config/database.yml
development:
  main:
    database: gitlabhq_development_pods
    schema_search_path: public,pod_0
  ci:
    database: gitlabhq_development_pods
    database_tasks: false
    schema_search_path: public,pod_0
```
Edited by Kamil Trzciński