chore(datastore): add ability to elect DLB replica based on replication lag
What does this MR do?
Related to DLB: Implement primary sticking (#1306 - closed). This adds a new UpToDateReplica method to the DB load balancer entity. The method checks a candidate replica's replication lag against the primary LSN previously recorded for a given repository (!1694 (merged)), in conformance with the (spec).
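As a rough illustration of the election idea described above, the following is a minimal sketch, not the code in this MR. The names `node`, `lagCheck` and `elect` are hypothetical, the LSN cache is reduced to a plain map, and falling back to the primary when no LSN is recorded or the check fails is an assumption rather than something this description confirms.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a database handle (primary or replica).
type node struct{ name string }

// lagCheck is a hypothetical callback that reports whether the replica has
// replayed WAL at least up to the given primary LSN.
type lagCheck func(replica node, primaryLSN string) (bool, error)

// elect returns the replica only when a primary LSN was recorded for the
// repository and the replica has caught up with it. In every other case it
// returns the primary (an assumed, conservative fallback policy).
func elect(primary, replica node, recorded map[string]string, repoPath string, check lagCheck) node {
	lsn, ok := recorded[repoPath]
	if !ok {
		return primary // assumption: no recorded LSN means "use the primary"
	}
	upToDate, err := check(replica, lsn)
	if err != nil || !upToDate {
		return primary
	}
	return replica
}

func main() {
	primary, replica := node{"primary"}, node{"replica-2"}
	recorded := map[string]string{"test/repo": "0/59CE00F0"}

	caughtUp := func(node, string) (bool, error) { return true, nil }
	lagging := func(node, string) (bool, error) { return false, nil }

	fmt.Println(elect(primary, replica, recorded, "test/repo", caughtUp).name)
	fmt.Println(elect(primary, replica, recorded, "test/repo", lagging).name)
}
```

Keeping the lag check behind a callback makes the policy easy to unit test without a running PostgreSQL instance.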
How to test locally
- Set up your local environment as described in docs/database-local-setup.md#fixed-hosts. Note that this MR also updates that doc with the relevant changes.
- Tail your GDK PostgreSQL logs:
    $ gdk tail postgresql*
- Apply the following patch:
    diff --git a/registry/handlers/app.go b/registry/handlers/app.go
    index caffff638..b9e99854d 100644
    --- a/registry/handlers/app.go
    +++ b/registry/handlers/app.go
    @@ -10,6 +10,7 @@ import (
     	"errors"
     	"expvar"
     	"fmt"
    +	"github.com/docker/distribution/registry/datastore/models"
     	"io"
     	"math/rand"
     	"net"
    @@ -388,13 +389,21 @@ func NewApp(ctx context.Context, config *configuration.Configuration) (*App, err
     			dbOpts = append(dbOpts, datastore.WithFixedHosts(hosts))
     		}
     	}
    -
    +	dbOpts = append(dbOpts, datastore.WithLSNCache(datastore.NewCentralRepositoryCache(app.redisCache)))
     	db, err := datastore.NewDBLoadBalancer(ctx, dsn, dbOpts...)
     	if err != nil {
     		return nil, fmt.Errorf("failed to initialize database connections: %w", err)
     	}
     	startDBReplicaChecking(ctx, db)
     
    +	repo := &models.Repository{Path: "test/repo"}
    +	if err := db.RecordLSN(ctx, repo); err != nil {
    +		panic(err)
    +	}
    +	if err := db.UpToDateReplica(ctx, repo).DB.PingContext(ctx); err != nil {
    +		panic(err)
    +	}
    +
     	// Skip postdeployment migrations to prevent pending post deployment
     	// migrations from preventing the registry from starting.
     	m := migrations.NewMigrator(db.Primary().DB, migrations.SkipPostDeployment)
- Compile and start the registry.
- Check the output of the GDK logs. You should see something like this:
    2024-07-31_18:54:33.53407 postgresql : 2024-07-31 19:54:33.534 WEST [69684] LOG: statement: SELECT pg_current_wal_insert_lsn()
    2024-07-31_18:54:33.63855 postgresql-replica-2 : 2024-07-31 19:54:33.638 WEST [69666] LOG: statement:
    2024-07-31_18:54:33.63858 postgresql-replica-2 : WITH replica_lsn AS (
    2024-07-31_18:54:33.63859 postgresql-replica-2 : SELECT pg_last_wal_replay_lsn () AS lsn
    2024-07-31_18:54:33.63860 postgresql-replica-2 : )
    2024-07-31_18:54:33.63861 postgresql-replica-2 : SELECT
    2024-07-31_18:54:33.63861 postgresql-replica-2 : pg_wal_lsn_diff ( '0/59CE00F0' ::pg_lsn, lsn) <= 0
    2024-07-31_18:54:33.63861 postgresql-replica-2 : FROM
    2024-07-31_18:54:33.63862 postgresql-replica-2 : replica_lsn
    2024-07-31_18:54:33.65163 postgresql-replica-2 : 2024-07-31 19:54:33.651 WEST [69666] LOG: statement: -- ping
- If you have redis-cli installed, you can also double-check the key there:
    redis-cli -s /<full path to gdk root>/redis/redis.socket
    redis /<full path to gdk root>/redis/redis.socket> KEYS "registry:*"
    1) "registry:db:{repository:test:c3ecf330c6173bf445635647db26f09843444527b55b3a0f5d5223d64045d378}:lsn"
    redis /<full path to gdk root>/redis/redis.socket> GET "registry:db:{repository:test:c3ecf330c6173bf445635647db26f09843444527b55b3a0f5d5223d64045d378}:lsn"
    "0/59CE00F0"
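To make the logged replica query easier to reason about, here is a small sketch of what `pg_wal_lsn_diff(<primary LSN>::pg_lsn, lsn) <= 0` expresses, reimplemented in plain Go. It is for illustration only: the helper names `parseLSN` and `replicaCaughtUp` are hypothetical, and the actual check in this MR runs in PostgreSQL, not in application code. It relies on the textual `pg_lsn` format being two hexadecimal numbers separated by a slash.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSN converts a textual pg_lsn such as "0/59CE00F0" (two hexadecimal
// 32-bit halves separated by a slash) into a 64-bit WAL byte position.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q: missing '/'", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

// replicaCaughtUp mirrors the condition primary - replica <= 0: the replica
// is considered up to date once its replay position reaches the recorded
// primary position.
func replicaCaughtUp(primaryLSN, replicaLSN string) (bool, error) {
	p, err := parseLSN(primaryLSN)
	if err != nil {
		return false, err
	}
	r, err := parseLSN(replicaLSN)
	if err != nil {
		return false, err
	}
	return r >= p, nil
}

func main() {
	const recorded = "0/59CE00F0" // primary LSN from the log output above
	for _, replica := range []string{"0/59CE00F0", "0/59CE0000", "1/0"} {
		ok, err := replicaCaughtUp(recorded, replica)
		fmt.Println(replica, ok, err)
	}
}
```

With the recorded LSN from the logs, a replica at `0/59CE0000` is still behind, while one at `0/59CE00F0` or later qualifies.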
Author checklist
- Feature flags
  - Added feature flag:
  - This feature does not require a feature flag
- I added unit tests or they are not required
- I added documentation (or it's not required)
- I followed code review guidelines
- I followed Go Style guidelines
- For database changes including schema migrations:
  - Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
  - If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
  - I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
  - Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
- Ensured this change is safe to deploy to individual stages in the same environment (cny -> prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.
Reviewer checklist
- Ensure the commit and MR title are still accurate.
- If the change contains a breaking change, apply the breaking change label.
- If the change is considered high risk, apply the label high-risk-change.
- Identify if the change can be rolled back safely. (Note: all other reasons for not being able to roll back will be sufficiently captured by major version changes.)
If the MR introduces database schema migrations:
- Ensure the commit and MR title start with fix:, feat:, or perf: so that the change appears on the Changelog.
If the changes cannot be rolled back follow these steps:
- If not, apply the label cannot-rollback.
- Add a section to the MR description that includes the following details:
  - The reasoning behind why a release containing the presented MR can not be rolled back (e.g. schema migrations or changes to the FS structure)
  - Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.)
  - Ensure this MR does not add code that depends on these changes that cannot be rolled back.
- Related to #1306 (closed)