
raft: Fix a flaky test in TestRequester_SyncWrite

Quang-Minh Nguyen requested to merge qmnguyen0711/fix-raft-flaky-test into master

For #6168 (closed)

This MR fixes a race in the TestRequester_SyncWrite/perform_an_operation_successfully_in_a_3-node_cluster test of the internal/gitaly/storage/raft package.

=== Failed
=== FAIL: internal/gitaly/storage/raft TestRequester_SyncWrite/perform_an_operation_successfully_in_a_3-node_cluster (4.24s)
    requester_test.go:80: 
        	Error Trace:	/builds/gitlab-org/gitaly/internal/gitaly/storage/raft/requester_test.go:80
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestRequester_SyncWrite/perform_an_operation_successfully_in_a_3-node_cluster
=== FAIL: internal/gitaly/storage/raft TestRequester_SyncWrite (0.04s)

One of the subtests creates a test cluster with a mock updater. The test issues requests to the cluster and asserts on the entries that the updater generates and the cluster records. A wait group inside the updater ensures that the updater has been triggered before the test asserts the results.

Unfortunately, the wait group is not enough. After the updater returns, the cluster does more work before it adds the recorded entries to the list the test inspects. This creates a race: the test can assert the entries before the cluster finishes the update operation, which produces the "expected: 1, actual: 0" failure above.

  • Test triggers requests.
  • Cluster handles the updates and calls the updater.
  • Updater finishes and releases the wait group.
  • Test asserts the captured entries. <- race: the entries are not recorded yet
  • Cluster update handler records the entries and exits.
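The interleaving above can be reproduced deterministically. The following Go sketch is an illustration only, not Gitaly's code: cluster, handleUpdate, afterUpdate, capturedLen, and reproduceRace are hypothetical names. The handler calls the updater (which releases the wait group) and then pauses on a channel before taking the mutex to append the entry, so the read that follows wg.Wait() sees no entries.

```go
package main

import "sync"

// cluster is a hypothetical stand-in for the test cluster. afterUpdate runs
// after the updater returns and before the entry is recorded; it stands in
// for the other work the real handler does at that point.
type cluster struct {
	mu          sync.Mutex
	captured    []string
	updater     func(entry string)
	afterUpdate func()
}

// handleUpdate mirrors the ordering before the fix: the mutex is held only
// around the append, so the entry is recorded after the updater returns.
func (c *cluster) handleUpdate(entry string) {
	c.updater(entry) // the updater releases the test's wait group here
	if c.afterUpdate != nil {
		c.afterUpdate()
	}
	c.mu.Lock()
	c.captured = append(c.captured, entry)
	c.mu.Unlock()
}

// capturedLen reads the captured entries under the mutex, as the test would.
func (c *cluster) capturedLen() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.captured)
}

// reproduceRace forces the bad interleaving: the handler pauses between the
// updater and the append until the test has already read the entries.
func reproduceRace() (observed, final int) {
	var wg sync.WaitGroup
	proceed := make(chan struct{})
	done := make(chan struct{})

	c := &cluster{
		updater:     func(string) { wg.Done() },
		afterUpdate: func() { <-proceed },
	}

	wg.Add(1)
	go func() {
		c.handleUpdate("entry-1")
		close(done)
	}()

	wg.Wait()                  // the updater has run...
	observed = c.capturedLen() // ...but the entry is not recorded yet
	close(proceed)             // let the handler finish
	<-done
	final = c.capturedLen()
	return observed, final
}
```

reproduceRace returns (0, 1): right after the wait group is released the test observes no entries, and the entry only appears once the handler resumes.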

The fix is simple. The cluster has a mutex that guards the captured entries, but it previously wrapped only the append operation. This commit moves the lock out so that it covers the whole update handler; a reader that takes the same mutex therefore cannot observe the entries until the handler has finished. Because the mutex is used only in tests, it does not affect the distributed behavior of the cluster.
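A matching sketch of the fix, under the same assumptions (hypothetical names, not the actual Gitaly implementation): the mutex is taken at the start of handleUpdate and released when it returns, and the test reads the captured entries under that same mutex. Because the updater releases the wait group while the mutex is still held, the read after wg.Wait() blocks until the handler has recorded the entry.

```go
package main

import (
	"sync"
	"time"
)

// cluster is a hypothetical stand-in for the test cluster after the fix:
// the test-only mutex is held for the whole update handler.
type cluster struct {
	mu          sync.Mutex
	captured    []string
	updater     func(entry string)
	afterUpdate func()
}

func (c *cluster) handleUpdate(entry string) {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.updater(entry) // releases the test's wait group while mu is still held
	if c.afterUpdate != nil {
		c.afterUpdate()
	}
	c.captured = append(c.captured, entry)
}

// capturedLen blocks until any in-flight handleUpdate has returned.
func (c *cluster) capturedLen() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.captured)
}

// observeAfterWait issues one update, waits on the wait group, and then
// reads the captured entries the way the test does.
func observeAfterWait() int {
	var wg sync.WaitGroup
	// afterUpdate simulates extra work between the updater and the append.
	c := &cluster{
		updater:     func(string) { wg.Done() },
		afterUpdate: func() { time.Sleep(time.Millisecond) },
	}

	wg.Add(1)
	go c.handleUpdate("entry-1")

	wg.Wait()              // the updater has run
	return c.capturedLen() // waits for the handler to release mu
}
```

observeAfterWait returns 1 on every run, regardless of how long the simulated extra work takes. One consequence of this design in the sketch is that the updater runs with the mutex held, so it must not call capturedLen itself or it would deadlock.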
