raft: Fix a flaky test in TestRequester_SyncWrite (!7107) · Merge requests · GitLab.org / gitaly

Quang-Minh Nguyen requested to merge qmnguyen0711/fix-raft-flaky-test into master Jul 18, 2024

This MR fixes a race in the TestRequester_SyncWrite/perform_an_operation_successfully_in_a_3-node_cluster test in internal/gitaly/storage/raft.

=== Failed
=== FAIL: internal/gitaly/storage/raft TestRequester_SyncWrite/perform_an_operation_successfully_in_a_3-node_cluster (4.24s)
    requester_test.go:80: 
        	Error Trace:	/builds/gitlab-org/gitaly/internal/gitaly/storage/raft/requester_test.go:80
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestRequester_SyncWrite/perform_an_operation_successfully_in_a_3-node_cluster
=== FAIL: internal/gitaly/storage/raft TestRequester_SyncWrite (0.04s)

One of the tests creates a test cluster with a mock updater. The test issues some requests to the cluster and asserts on the recorded entries generated by the updater. It uses a wait group inside the updater to ensure the updater has been triggered before asserting on the results.

Unfortunately, the wait group is not enough. After the updater exits, the cluster still does some other work before adding the recorded entries for the test. This leaves a race in which the entries are asserted before the cluster finishes the update operation.

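1. The test triggers requests.
2. The cluster handles the updates and calls the updater.
3. The updater finishes and releases the wait group.
4. The test asserts the captured entries. <- too early: the entry has not been recorded yet (expected 1, actual 0)
5. The cluster update handler exits.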
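
The interleaving above can be reproduced deterministically in a minimal Go sketch. Everything in it (`racyCluster`, `handleUpdate`, `gate`, `observeEarly`) is hypothetical and is not Gitaly's actual code. The `gate` channel is an artificial pause standing in for the work the cluster does between the updater returning and the entry being recorded.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCluster is a hypothetical stand-in for the test cluster.
type racyCluster struct {
	mu      sync.Mutex
	entries []string
	updater func(entry string)
	// gate exists only in this sketch, to force the bad interleaving.
	gate chan struct{}
}

// handleUpdate has the pre-fix shape: the mutex guards only the append.
func (c *racyCluster) handleUpdate(entry string) {
	c.updater(entry) // step 3: the updater releases the wait group
	<-c.gate         // stand-in for the cluster's remaining work
	c.mu.Lock()
	c.entries = append(c.entries, entry)
	c.mu.Unlock()
}

// recorded returns how many entries have been captured so far.
func (c *racyCluster) recorded() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.entries)
}

// observeEarly returns the entry count seen right after the wait group
// is released (step 4) and the count after the handler has exited (step 5).
func observeEarly() (early, late int) {
	var wg sync.WaitGroup
	wg.Add(1)
	c := &racyCluster{gate: make(chan struct{})}
	c.updater = func(string) { wg.Done() }

	done := make(chan struct{})
	go func() {
		defer close(done)
		c.handleUpdate("op-1")
	}()

	wg.Wait()            // the updater has run...
	early = c.recorded() // ...but the entry is not recorded yet
	close(c.gate)        // let the handler finish
	<-done
	late = c.recorded()
	return early, late
}

func main() {
	early, late := observeEarly()
	fmt.Println(early, late) // prints "0 1"
}
```

The early read matches the `expected: 1, actual: 0` failure in the log: the wait group only proves that the updater ran, not that the handler finished.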

The fix is simple. The cluster has a mutex to synchronize access to the captured entries, but it wraps only the operation that adds them. This commit moves it out so that it wraps the whole update handler. As the mutex is used for testing only, it does not affect the distributed nature of the cluster.
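
As a rough illustration of the fix, the sketch below holds the mutex for the whole handler. The names (`fixedCluster`, `handleUpdate`, `recorded`, `runOnce`) are hypothetical, and it assumes the test reads the captured entries under the same mutex; the actual Gitaly code may differ in detail.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fixedCluster is a hypothetical stand-in for the test cluster.
type fixedCluster struct {
	mu      sync.Mutex
	entries []string
	updater func(entry string)
}

// handleUpdate holds the mutex for the whole handler, not just the append.
func (c *fixedCluster) handleUpdate(entry string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.updater(entry)             // releases the test's wait group
	time.Sleep(time.Millisecond) // stand-in for the cluster's remaining work
	c.entries = append(c.entries, entry)
}

// recorded takes the same mutex, so it blocks until a running handler exits.
func (c *fixedCluster) recorded() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.entries...)
}

// runOnce issues one update and returns the number of entries the test sees
// immediately after the wait group is released.
func runOnce() int {
	var wg sync.WaitGroup
	wg.Add(1)
	c := &fixedCluster{}
	c.updater = func(string) { wg.Done() }

	go c.handleUpdate("op-1")

	wg.Wait() // the handler already holds the mutex at this point
	return len(c.recorded())
}

func main() {
	for i := 0; i < 100; i++ {
		if n := runOnce(); n != 1 {
			panic(fmt.Sprintf("iteration %d: got %d entries, want 1", i, n))
		}
	}
	fmt.Println("ok")
}
```

This works because the wait group is released only after the handler has taken the mutex, so any later read has to wait for the handler to unlock, by which time the entry has been appended.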

