Skip to content

Gracefully shut down Praefect server during tests

James Liu requested to merge jliu/praefect-postgres-flake into master

There seems to be occasional flake in our Praefect tests where we exceed the allowed number of client connections to Postgres. The logs show that we're attempting to start a Praefect server (which makes a client connection to Postgres) and then failing due to the number of connections:

    logger.go:127: Recorded logs of "praefect":
        time="2025-01-07T05:17:45Z" level=info msg="{\"level\":\"info\",\"msg\":\"Starting Praefect\",\"pid\":95566,\"time\":\"2025-01-07T05:17:45.680Z\",\"version\":\"17.5.0-rc1\"}" component=praefect
        time="2025-01-07T05:17:45Z" level=info msg="{\"database_address\":\"postgres:5432\",\"level\":\"info\",\"msg\":\"establishing database connection\",\"pid\":95566,\"time\":\"2025-01-07T05:17:45.681Z\"}" component=praefect
        time="2025-01-07T05:17:45Z" level=info msg="{\"error\":\"send ping: failed to connect to `user=postgres database=praefect_4d1a6f94179b4c55ba25c0563675bd8e`: 172.18.0.2:5432 (postgres): server error: FATAL: sorry, too many clients already (SQLSTATE 53300)\",\"level\":\"error\",\"msg\":\"SQL connection open failed\",\"pid\":95566,\"time\":\"2025-01-07T05:17:45.693Z\"}" component=praefect
        time="2025-01-07T05:17:45Z" level=info msg="{\"error\":\"send ping: failed to connect to `user=postgres database=praefect_4d1a6f94179b4c55ba25c0563675bd8e`: 172.18.0.2:5432 (postgres): server error: FATAL: sorry, too many clients already (SQLSTATE 53300)\",\"level\":\"error\",\"msg\":\"Praefect shutdown\",\"pid\":95566,\"time\":\"2025-01-07T05:17:45.693Z\"}" component=praefect

The Shutdown() handler configured in the Praefect test server is called in a tb.Cleanup here. In theory, this should mean that we're cleaning up Praefect servers after each test. The implementation however is calling os.Process.Kill which terminates the process immediately. We should probably be sending a SIGTERM instead to allow Praefect to shut down gracefully and close the DB connection.

Since the test is flaky, it's hard to know if this is the solution, but there's also no harm in making this change.

Edited by James Liu

Merge request reports

Loading