Fix race blocking goroutine in shell executor
What does this MR do?
Fixes a race in the shell executor that can leave a goroutine blocked forever.
Why was this MR needed?
A goroutine writes to waitCh, and a select block reads from that channel. The same select block also reads from ctx.Done(). If the ctx.Done() case is selected first, the function returns and nothing reads from waitCh anymore, so the goroutine's write to waitCh can block forever.
This is discussed in detail in https://songlh.github.io/paper/gcatch.pdf
1. Introduction:
A previously unknown concurrency bug in Docker is shown in Figure 1. Function Exec() creates a child goroutine at line 5 to duplicate the content of a.Reader. After the duplication, the child goroutine sends err to the parent goroutine through channel outDone to notify the parent about completion and any possible error (line 7). Since outDone is an unbuffered channel (line 3), the child blocks at line 7 until the parent receives from outDone. Meanwhile, the parent blocks at the select at line 9 until it either receives err from the child (line 10) or receives a message from ctx.Done() (line 13), indicating the entire task can be halted. If the message from ctx.Done() arrives earlier, or if the two messages arrive concurrently and Go’s runtime non-deterministically chooses the second case to execute, the parent will return from function Exec(). No other goroutine can pull messages from outDone, leaving the child goroutine permanently blocked at line 7.
What's the best way to test this MR?
N/A since this is mostly a race condition.