Log physical operations instead of logical
Our schema for a write-ahead log entry is complex and keeps growing as we add more functionality. We currently log each logical operation, which requires us to model every logical operation in the log entry. Each logical operation also needs separate support for applying its changes.
Applying a logical operation requires the applying node to understand how to apply it. This makes it more difficult to add new write types or change existing ones, as each change requires every node in the cluster to understand how to apply it.
Replicating logical operations also leaves more room for inconsistencies. For example, if Git was updated to add a new configuration key to the configuration file on repository creation, the replicas could diverge if they apply the logical repository creation while running different versions of Git.
We should move to physical replication instead of logical as it is simpler and brings benefits:
- It ensures the replicas have the exact same physical state, not just a logically equivalent state. This simplifies troubleshooting.
- Composing log entries from small fundamental operations allows us to support a significantly larger group of possible transactions without needing protocol changes. We can simply change the logic on the leader to produce different files and operations.
- This still doesn't do away with versioning requirements (#5759): we can't, for example, ship newer file formats to older replicas before they know how to read them.
Below is an example of a much simplified log entry schema. All of the existing write types can be represented with it. In addition, it can support a variety of transactions that we currently can't. For example:
- We currently only support transactions that write into a single repository. The log entry below could be used to write to an arbitrary number of repositories without requiring any changes to the schema. We'd only have to update the logic in the transaction manager to properly handle such transactions and produce the log entry. All older nodes would know how to apply the changes since they are just replicating file system operations.
- Migrations as described in #5758 may do arbitrary file system operations, for example to remove stale files from the repository. Their changes can also be replicated through this without having to add logging and replication support for each migration separately.
Some operations will still benefit from logical replication. These are generally operations that may produce large amounts of data from the existing state of the repository. In practice this means reference and object repacking, and possibly some migrations. If we repack objects into a single pack, all of the replicas already have those objects. It's unnecessary to replicate (and back up) the objects again as part of a log entry, so we'd rather log the repack command and replicate that.
Transaction verification and conflict checking are the job of the leader, so it's unnecessary to include anything related to them in the log entry. The leader can locally keep the state needed to conflict check transactions as they commit. The leader will verify that all of the logged operations will apply without conflicts. As all replicas have the same physical state, operations that apply on the leader will apply on every replica.
syntax = "proto3";

package gitaly;

option go_package = "gitlab.com/gitlab-org/gitaly/v16/proto/go/gitalypb";

// LogEntry is a single entry in a partition's write-ahead log.
message LogEntry {
  // Operation models a single operation to be performed.
  message Operation {
    // CreateHardLink creates a hard link.
    message CreateHardLink {
      // source_path is the path of the file the hard link should point to.
      string source_path = 1;
      // destination_path is the path where the hard link should be created.
      string destination_path = 2;
    }

    // RemoveDirectoryEntry removes a directory or a file
    // from the parent directory. When removing a directory,
    // it must be empty.
    message RemoveDirectoryEntry {
      // path is the path of the directory entry to remove.
      string path = 1;
    }

    // CreateDirectory creates a directory at the given path.
    message CreateDirectory {
      // path is the path where to create the directory.
      string path = 1;
      // permissions are permissions to set on the created directory.
      uint32 permissions = 2;
    }

    // Flush flushes the entry at the given path to disk.
    message Flush {
      // path is the path of the entry to flush.
      string path = 1;
    }

    oneof operation {
      CreateHardLink create_hard_link = 1;
      RemoveDirectoryEntry remove_directory_entry = 2;
      CreateDirectory create_directory = 3;
      Flush flush = 4;
    }
  }

  // operations is an ordered list of operations to run in order to apply
  // this log entry.
  repeated Operation operations = 1;
}