Errors Reproduced Under Poor Disk Write Speed
I have been stress-testing the runner and the Docker executor under conditions of poor disk performance.
Summary
The setup is essentially as follows:
- An n1-standard instance on GCP with Ubuntu 19.04 LTS
- The runner compiled locally at revision 41829fbc19d43f110e69e64a93de3e87d4fd4d5a
- A separate process that overwhelms the disk by writing to it constantly
- A loop over a "happy path" test that creates a Docker container
Setup
- Create a new n1-standard instance on GCP with Ubuntu 19.04 LTS.
- Check out the runner code base at 41829fbc19d43f110e69e64a93de3e87d4fd4d5a and install all dependencies needed to compile it locally.
- Create a new test that loops over the TestDockerCommandSuccessRun test:
diff --git a/executors/docker/docker_command_test.go b/executors/docker/docker_command_test.go
index c4fbaa2f0..14f594037 100644
--- a/executors/docker/docker_command_test.go
+++ b/executors/docker/docker_command_test.go
@@ -22,6 +22,13 @@ import (
 	"gitlab.com/gitlab-org/gitlab-runner/helpers/featureflags"
 )
 
+func Test100DockerCommands(t *testing.T) {
+	for n := 0; n < 150; n++ {
+		fmt.Println("Run: ", n)
+		TestDockerCommandSuccessRun(t)
+	}
+}
+
 func TestDockerCommandSuccessRun(t *testing.T) {
 	if helpers.SkipIntegrationTests(t, "docker", "info") {
 		return
- Run a script that constantly writes to the disk. I used:
#!/bin/bash
end=$((SECONDS+18000)) # 5 hours
while [ $SECONDS -lt $end ]; do
    dd if=/dev/zero of=testfile bs=1024 count=1024000
    rm testfile
done
- Run the test and the disk-writing script at the same time. For the test, make sure to raise the test timeout:
go test -timeout 300m gitlab.com/gitlab-org/gitlab-runner/executors/docker -run "^(Test100DockerCommands)$" -v
Results
Errors encountered so far
- ERROR: Job failed (system failure): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? (docker.go:787:120s)
  https://gitlab.com/gitlab-org/gitlab-runner/-/blob/41829fbc19d43f110e69e64a93de3e87d4fd4d5a/executors/docker/docker.go#L787
- ERROR: Job failed (system failure): Error response from daemon: Conflict. The container name "/runner--project-0-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8" is already in use by container "6d90c2f97c60483b43a365c25c35bad3862ca2f87b380de7df52b903e3f414b9". You have to remove (or rename) that container to be able to reuse that name. (cache_container.go:97:0s)
  https://gitlab.com/gitlab-org/gitlab-runner/-/blob/41829fbc19d43f110e69e64a93de3e87d4fd4d5a/executors/docker/internal/volumes/cache_container.go#L97
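The second error is a name collision. The cache container name in the message ends in 32 hex characters, which is consistent with an MD5 digest.

The sketch below assumes the name is built as a fixed prefix, then "-cache-", then the hex MD5 of the cache path. I have not checked this scheme against cache_container.go, so treat it as an assumption. The "/cache" path is also hypothetical.

If the name is deterministic like this, any cache container left behind by an earlier iteration would block the next create with a Conflict. That could happen, for example, if its removal did not finish while the disk was saturated.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// cacheContainerName builds a deterministic container name from a prefix and
// the cache path inside the container. Hypothetical scheme, inferred only from
// the shape of the name in the error message.
func cacheContainerName(prefix, containerPath string) string {
	// %x on the [16]byte digest prints 32 lowercase hex characters.
	return fmt.Sprintf("%s-cache-%x", prefix, md5.Sum([]byte(containerPath)))
}

func main() {
	prefix := "runner--project-0-concurrent-0"
	first := cacheContainerName(prefix, "/cache")
	second := cacheContainerName(prefix, "/cache")
	// Same inputs give the same name, so a container left over from a
	// previous run still owns it and a new create fails with a Conflict.
	fmt.Println(first == second) // true
	fmt.Println(first)
}
```

If this is what happens, a leftover container of that name should be visible in docker ps -a after a failure. Creating the cache container should then keep failing until the stale container is removed, for example with docker rm -f.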
Control
None of these errors occurred when I ran the same test on my laptop (with an SSD) without the disk-writing stress.