Geo: Multi-arch container images not replicating non-primary architectures to secondary Geo nodes; UI shows replication as successful
Summary
Multi-architecture images are shown as replicated in the UI, but their non-primary architectures are not available from the secondary node when inspecting or pulling the images.
Steps to reproduce
This behavior was reported by a US federal customer in federal ticket 1050 (GitLab internal, US citizenship required), and I have been able to reproduce it.
Initial setup
- Set up Geo instances with container registry replication enabled. I have two instances in GCP; the customer has more.
- Build and push a multi-architecture image to the primary node (I used `docker buildx` to build a BusyBox image for `amd64` and `arm64`).
- Wait for sync and verification to complete.
- Compare the UI output between the primary and secondary nodes. Both of my nodes report the image as synchronized, and the container size and hashes are shown as identical on both nodes.
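The build-and-push step of the reproduction can be scripted as below. This is a minimal sketch, not the exact command used for this report: it assumes a buildx builder that can produce multi-platform images (for example one created with `docker buildx create --use`, plus QEMU emulation for `arm64`) and a Dockerfile in the current directory. The helper names `buildx_command` and `build_and_push` are hypothetical.

```python
import shlex
import subprocess
from typing import Sequence

# Image reference from the reproduction above; the build context "." is an assumption.
IMAGE = "geo1.bradsevy.online:5050/root/busybox-multi:latest"
PLATFORMS = ("linux/amd64", "linux/arm64")


def buildx_command(image: str, platforms: Sequence[str], context: str = ".") -> list[str]:
    """Return the argv for a multi-arch build that pushes directly to the registry."""
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # one manifest-list entry per platform
        "--tag", image,
        "--push",  # push the manifest list and per-arch manifests to the primary
        context,
    ]


def build_and_push(image: str = IMAGE, platforms: Sequence[str] = PLATFORMS) -> None:
    """Print and run the build; raises CalledProcessError if docker exits non-zero."""
    cmd = buildx_command(image, platforms)
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `build_and_push()` on a workstation logged in to the primary registry prints the assembled command and runs it. Keeping the argv construction separate from execution makes it easy to check the exact flags before pushing.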
Comparison and troubleshooting
- Inspecting the remote images with `skopeo inspect --raw`, I observe that the primary node returns a manifest list identifying both architectures, while the secondary node returns a single image manifest that does not identify any architecture at all. The digests also differ:
brad@DebianRulez:~$ skopeo inspect --raw docker://geo1.bradsevy.online:5050/root/busybox-multi
{
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"schemaVersion": 2,
"manifests": [
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"digest": "sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5",
"size": 740,
"platform": {
"architecture": "amd64", <-----------------------------------------------------------
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"digest": "sha256:1e21fbd67772efeb971f0b97be99572219823216b6b1c47a1308fc27e5076335",
"size": 740,
"platform": {
"architecture": "arm64", <-----------------------------------------------------------
"os": "linux"
}
}
]
}
---
brad@DebianRulez:~$ skopeo inspect --raw docker://geo2.bradsevy.online:5050/root/busybox-multi
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"schemaVersion": 2,
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"digest": "sha256:27f909e5658cb519e5175bc681d5c605f01b613503ce8dcf3fe3c1847d37f8c7",
"size": 844
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"digest": "sha256:0bc3020d05f1e08b41f1c5d54650a157b1690cde7fedb1fafbc9cda70ee2ec5c",
"size": 50435617
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"digest": "sha256:f875f728594f35a040e0e4b122c67fe6b05592c71912a6ad4a3136a907fe3eaa",
"size": 14124231
}
]
}
- The on-disk sizes and checksums of the repository directory differ between nodes:
Primary:
root@brad-geo1:~# du -sh /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
148K /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
root@brad-geo1:~# find /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi -type f -exec md5sum {} \; | sort -k 2 | md5sum
ca6f7edcd95ade00548dc258e2b40af1 -
Secondary:
root@brad-geo2:~# du -sh /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
92K /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi
root@brad-geo2:~# find /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/repositories/root/busybox-multi -type f -exec md5sum {} \; | sort -k 2 | md5sum
e4d0a98fd71c3a6b7ec3f2b62e3bc64d -
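- Pulling a specific architecture with `--platform=arm64` and then inspecting the pulled image with `docker image inspect <id>` shows that the primary node serves the `arm64` image, while the secondary node still serves `amd64`. The digest reported by the secondary pull, `sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5`, matches the `amd64` entry of the primary's manifest list, so the tag on the secondary appears to point directly at the `amd64` image manifest: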
Primary node:
brad@DebianRulez:~$ docker pull geo1.bradsevy.online:5050/root/busybox-multi --platform=arm64
Using default tag: latest
latest: Pulling from root/busybox-multi
310b368da982: Pull complete
dc96c5f90a6f: Pull complete
Digest: sha256:eaf1fdf80669e7338ab1edfeabd8b96f2fac673eaa971f8480d4006e29ec7a72
Status: Downloaded newer image for geo1.bradsevy.online:5050/root/busybox-multi:latest
geo1.bradsevy.online:5050/root/busybox-multi:latest
brad@DebianRulez:~$ docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
geo1.bradsevy.online:5050/root/busybox-multi latest 798012f55906 12 days ago 126MB
brad@DebianRulez:~$ docker image inspect 798012f55906
[
{
"Id": "sha256:798012f55906d247c79ea2e9acfbc6d53593b7751c5d851bf1f41eaff4237f52",
"RepoTags": [
"geo1.bradsevy.online:5050/root/busybox-multi:latest"
],
"RepoDigests": [
"geo1.bradsevy.online:5050/root/busybox-multi@sha256:eaf1fdf80669e7338ab1edfeabd8b96f2fac673eaa971f8480d4006e29ec7a72"
],
"Parent": "",
"Comment": "buildkit.dockerfile.v0",
"Created": "2021-06-30T18:46:37.189300648Z",
"Container": "",
"ContainerConfig": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": null,
"Cmd": null,
"Image": "",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"DockerVersion": "",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"bash"
],
"Image": "",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"Architecture": "arm64", <------------------------------------------------------------------------------
"Os": "linux",
"Size": 126480503,
"VirtualSize": 126480503,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/bcfca43187e0079723d6fdc91b17d003bbd2789ebfc1838909a79f30b9aa99ef/diff",
"MergedDir": "/var/lib/docker/overlay2/fe35f24b09d6f2af2fb39807d7f11531b24f5e445edb9452370a5fcfd32e58de/merged",
"UpperDir": "/var/lib/docker/overlay2/fe35f24b09d6f2af2fb39807d7f11531b24f5e445edb9452370a5fcfd32e58de/diff",
"WorkDir": "/var/lib/docker/overlay2/fe35f24b09d6f2af2fb39807d7f11531b24f5e445edb9452370a5fcfd32e58de/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:bee1275ae7ac87065d84e2e06aec6254579ac19d9b84e325cbbe03d46e8730e7",
"sha256:f48735d31fdcbfb2125502fd4530a17b53d373e61bb8683cb6be9a1c8e1edea3"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
Secondary node:
brad@DebianRulez:~$ docker pull geo2.bradsevy.online:5050/root/busybox-multi --platform=arm64
Using default tag: latest
latest: Pulling from root/busybox-multi
0bc3020d05f1: Pull complete
f875f728594f: Pull complete
Digest: sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5
Status: Downloaded newer image for geo2.bradsevy.online:5050/root/busybox-multi:latest
geo2.bradsevy.online:5050/root/busybox-multi:latest
brad@DebianRulez:~$ docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
geo2.bradsevy.online:5050/root/busybox-multi latest 27f909e5658c 12 days ago 133MB
brad@DebianRulez:~$ docker image inspect 27f909e5658c
[
{
"Id": "sha256:27f909e5658cb519e5175bc681d5c605f01b613503ce8dcf3fe3c1847d37f8c7",
"RepoTags": [
"geo2.bradsevy.online:5050/root/busybox-multi:latest"
],
"RepoDigests": [
"geo2.bradsevy.online:5050/root/busybox-multi@sha256:ac3408ba45f5038129cefd401d3828bca2a32e54dc0bf6ff44056936457bf1c5"
],
"Parent": "",
"Comment": "buildkit.dockerfile.v0",
"Created": "2021-06-30T18:46:21.646255956Z",
"Container": "",
"ContainerConfig": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": null,
"Cmd": null,
"Image": "",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"DockerVersion": "",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"bash"
],
"Image": "",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"Architecture": "amd64", <----------------------------------------------------------------------------------
"Os": "linux",
"Size": 132681348,
"VirtualSize": 132681348,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/07930508f682b867663201ea759fc6e2d01ed9283ce0f07e3068397aff530388/diff",
"MergedDir": "/var/lib/docker/overlay2/5b0d1d456fd34ecfbee0096491eb81c0f01f67f4e5564bf23e2b1a5847c036fa/merged",
"UpperDir": "/var/lib/docker/overlay2/5b0d1d456fd34ecfbee0096491eb81c0f01f67f4e5564bf23e2b1a5847c036fa/diff",
"WorkDir": "/var/lib/docker/overlay2/5b0d1d456fd34ecfbee0096491eb81c0f01f67f4e5564bf23e2b1a5847c036fa/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:4e006334a6fdea37622f72b21eb75fe1484fc4f20ce8b8526187d6f7bd90a6fe",
"sha256:51ea4e37f486d3064055a010939db3384b70e33240ff478cb09cf4d3858ca709"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
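The `skopeo` and `--platform` checks above can be repeated without pulling any layers. The sketch below, with hypothetical helper names, classifies the raw output of `skopeo inspect --raw` and picks a digest for a requested architecture in roughly the way a client does. It is an approximation, not Docker's actual platform-matching code (which also considers variants), and the OCI index media type is included only as an assumption for OCI-built images.

```python
import json
import subprocess
from typing import Optional

# Manifest-list media types. The Docker type appears in the skopeo output above;
# the OCI image index type is an assumption for images pushed with OCI media types.
MANIFEST_LIST_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
}


def fetch_raw(ref: str) -> str:
    """Return the raw manifest for a reference such as docker://host:5050/root/busybox-multi."""
    result = subprocess.run(
        ["skopeo", "inspect", "--raw", ref],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def _entries(raw_manifest: str) -> list[dict]:
    """Return manifest-list entries, or [] when the document is a single image manifest."""
    doc = json.loads(raw_manifest)
    if doc.get("mediaType") not in MANIFEST_LIST_TYPES:
        return []
    return doc.get("manifests", [])


def platforms(raw_manifest: str) -> list[str]:
    """List 'os/architecture' for every entry of a manifest list."""
    out = []
    for entry in _entries(raw_manifest):
        p = entry.get("platform", {})
        out.append(f"{p.get('os', '?')}/{p.get('architecture', '?')}")
    return out


def resolve_platform(raw_manifest: str, arch: str, os_name: str = "linux") -> Optional[str]:
    """Return the digest of the entry matching arch/os_name, or None if there is none.

    A single image manifest has no entries to choose from, so no platform
    selection can happen (in the reproduction the client still received amd64).
    """
    for entry in _entries(raw_manifest):
        p = entry.get("platform", {})
        if p.get("architecture") == arch and p.get("os") == os_name:
            return entry["digest"]
    return None
```

With the outputs above, `platforms()` returns `['linux/amd64', 'linux/arm64']` for `geo1` and an empty list for `geo2`, and `resolve_platform(..., "arm64")` returns `None` for `geo2` because there is no manifest list to select from.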
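The aggregate `md5sum` comparison above shows that the two repository directories differ, but not where. In the Docker distribution storage layout, a repository directory holds small `link` files under `_manifests/` and `_layers/`, so a per-architecture manifest missing on the secondary should show up as a path present only on the primary. The sketch below (hypothetical helper names) is meant to be run on each node, with the resulting maps copied to one place and compared. MD5 is used only to match the check above; `usedforsecurity=False` (Python 3.9+) should let it run on FIPS-enabled hosts.

```python
import hashlib
import json
from pathlib import Path


def file_digests(root: str) -> dict[str, str]:
    """Map each regular file under root (relative POSIX path) to its MD5 hex digest.

    Symlinks are skipped to mirror `find -type f`.
    """
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file() and not path.is_symlink():
            digests[path.relative_to(base).as_posix()] = hashlib.md5(
                path.read_bytes(), usedforsecurity=False
            ).hexdigest()
    return digests


def compare_digests(primary: dict[str, str], secondary: dict[str, str]) -> dict[str, list[str]]:
    """Report paths missing on either side and paths whose contents differ."""
    return {
        "only_on_primary": sorted(primary.keys() - secondary.keys()),
        "only_on_secondary": sorted(secondary.keys() - primary.keys()),
        "different": sorted(
            k for k in primary.keys() & secondary.keys() if primary[k] != secondary[k]
        ),
    }


def dump(root: str, out_file: str) -> None:
    """Write file_digests(root) as JSON so it can be copied from one node to the other."""
    Path(out_file).write_text(json.dumps(file_digests(root), indent=2, sort_keys=True))
```

For example, run `dump()` on each node against the `repositories/root/busybox-multi` directory, load both JSON files on one machine, and pass them to `compare_digests()`.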
Internal discussions
I engaged the Registry and Geo teams in their respective internal Slack channels. The messages are available until approximately 10 October 2021; the relevant ones have been copied into the internal ticket for posterity.
Registry: https://gitlab.slack.com/archives/CRD4A8HG8/p1625686073105400
Geo: https://gitlab.slack.com/archives/CRD4A8HG8/p1625686073105400
Example Project
What is the current bug behavior?
Only `amd64` is available on the secondary node.
What is the expected correct behavior?
Every architecture in a multi-arch image, including non-default ones (`arm64` in this case), should be available on all secondary nodes.
Relevant logs and/or screenshots
`gitlab-ctl tail registry` on the primary node shows 404 errors every few seconds when the registry tries to deliver events to `/api/v4/container_registry_event/events`:
2021-07-12_21:03:13.63310 time="2021-07-12T21:03:13Z" level=warning msg="httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events} encountered too many errors, backing off"
2021-07-12_21:03:14.66027 time="2021-07-12T21:03:14Z" level=error msg="retryingsink: error writing events: httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events}: response status 404 Not Found unaccepted, retrying"
2021-07-12_21:03:14.66032 time="2021-07-12T21:03:14Z" level=warning msg="httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events} encountered too many errors, backing off"
2021-07-12_21:03:15.68155 time="2021-07-12T21:03:15Z" level=error msg="retryingsink: error writing events: httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events}: response status 404 Not Found unaccepted, retrying"
2021-07-12_21:03:15.68158 time="2021-07-12T21:03:15Z" level=warning msg="httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events} encountered too many errors, backing off"
2021-07-12_21:03:16.70599 time="2021-07-12T21:03:16Z" level=error msg="retryingsink: error writing events: httpSink{http://geo1.bradsevy.online/api/v4/container_registry_event/events}: response status 404 Not Found unaccepted, retrying"
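To confirm that these delivery failures are continuous rather than occasional, the errors can be counted per endpoint and status. The sketch below (hypothetical helper names) matches only the `retryingsink` error lines in the format shown above; reading from `/var/log/gitlab/registry/current` assumes the default Omnibus log location.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   ... level=error msg="retryingsink: error writing events: httpSink{<url>}: response status 404 Not Found unaccepted, retrying"
SINK_ERROR = re.compile(
    r'msg="retryingsink: error writing events: '
    r'httpSink\{(?P<url>[^}]+)\}: response status (?P<status>\d{3})'
)


def count_sink_errors(lines: Iterable[str]) -> Counter:
    """Count failed event deliveries per (endpoint URL, HTTP status code)."""
    counts: Counter = Counter()
    for line in lines:
        match = SINK_ERROR.search(line)
        if match:
            counts[(match["url"], int(match["status"]))] += 1
    return counts


def count_from_log(path: str = "/var/log/gitlab/registry/current") -> Counter:
    """Read a registry log file (default Omnibus location assumed) and count errors."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_sink_errors(handle)
```

For the six log lines above, `count_sink_errors` reports three 404s for the `geo1` events endpoint; the "backing off" warnings are deliberately not counted.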
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)