Deprecate `local` and `sql` electors
Praefect's new per_repository
election strategy elects a primary for each repository. With the new election strategy in place, we should deprecate the old strategies in %13.12 so we can begin removing them from %14.0 onwards.
The local
and sql
elect a primary for a virtual storage. This difference required us to add branching logic in quite a few places in the code base to handle the primaries correctly. This increases the complexity and makes maintenance more difficult. The NodeManager
component used in both strategies had grown quite monolithic and began to make things quite difficult to extend and test. As such, the health checking, primary election and connection management functionalities were split in the process. Some features, such as variable replication factor, only support with repository specific primaries as otherwise all repositories would end up on the same host.
The local
elector itself is not supposed to be used in production, as it elects a primary in-memory. There's no coordination between different Praefect nodes. It is used by a wide array of tests though. The repository state tracking is only implemented on top of Postgres, leading to tests using local
elector not really exercising the production code.
The sql
elector also works on the assumption that we may not have database records of all repositories. This makes it difficult to enforce correctness in some cases as we can't error out when a repository does not have database records. Variable replication factor also can't work without the database records. If the repository can't be expected to be present on every node, Praefect really needs to know where it is. There's also some existing open issues that are blocked on being able to expect the records to exist: #3183 (closed).
To summarize:
- We should remove the
local
elector as it doesn't support full feature set of Praefect and it shouldn't be used in production. We should update the tests using it to use theper_repository
elector to make sure the tests actually test the production setup. -
sql
should be removed and everyone migrated toper_repository
elector. Having the primary per virtual storage makes some of the features like variable replication factor non-sensical. Missing database records for repositories makes it difficult to ensure correctness.
We should deprecate them in %13.12, so we can begin removal from %14.0 onwards. This requires us to add a deprecation warning in Omnibus and document the pre-requisites for migrating from sql
to per_repository
elector. The only pre-requisite is that the automatic migration has been run, which was included in %13.6.
As there's a lot of existing use in tests, actually removing the electors may take some time. It would be sufficient to always use per_repository
strategy from %14.0 onwards unless one of the environment variables is set that GitLab Rails uses for testing Gitaly. That way the existing tests keep working until we have a chance to remove the implementations fully but all of the installations would use repository specific primaries.
/cc @mjwood