Fix random number generator for ServiceDiscovery::Sampler
What does this MR do and why?
When rolling this out on production, we noticed that all Rails servers were getting the same selected addresses (gitlab-com/gl-infra/production#8036 (comment 1171858328)). We rolled this back on production for now until we fix it.
After much head scratching and driving myself insane testing this locally, I finally realized that rand always generates a float smaller than 1, and Random.new(seed) always produces the same values when seed is a float less than 1. This is because Random.new converts its seed to an integer, and every float in [0, 1) truncates to 0, so every process ended up with the same seed. As such, I've switched to generating the seed as an integer. I also decided to use Random.new_seed, as it seems to be the more explicit way of getting a seed value, even though it is just a very large integer.
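A minimal sketch of the root cause, runnable in plain Ruby with no GitLab code involved: Random.new converts its seed argument to an Integer, so every float that rand can return becomes a seed of 0. The variable names are illustrative only.

```ruby
# Kernel#rand with no argument returns a Float in [0, 1).
# Random.new converts its seed to an Integer, truncating that Float to 0.
broken_seeds = Array.new(3) { Random.new(rand).seed }
p broken_seeds # => [0, 0, 0]

# So every "randomly" seeded shuffle is the same shuffle, in every process.
broken = Array.new(3) { [1, 2, 3, 4].shuffle(random: Random.new(rand)) }
p broken.uniq.size # => 1

# Random.new_seed returns a large random Integer, so each generator differs.
fixed_seeds = Array.new(3) { Random.new(Random.new_seed).seed }
p fixed_seeds.uniq.size # 3, barring an astronomically unlikely seed collision
```

Random.new with no argument also seeds itself from Random.new_seed, so passing it explicitly mainly makes the intent visible at the call site.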
We can also see this behaviour on staging, where we also rolled this out last week:
This shows that we're disproportionately choosing some pgbouncers over others (some have 0 connections). Since we didn't roll back staging, we can merge this change, and once it's deployed to staging we expect to see these lines converge and become evenly distributed. Once we see that, it should be safe to try rolling it out to production again.
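The staging symptom can be reproduced with a small simulation. This is a sketch under assumptions: pick_address, simulate and the pgbouncer-N names are hypothetical, and it assumes each process shuffles the discovered addresses with its own seeded Random and keeps the first one; the real ServiceDiscovery::Sampler may select addresses differently.

```ruby
# Hypothetical discovered addresses (not real host names).
ADDRESSES = %w[pgbouncer-1 pgbouncer-2 pgbouncer-3 pgbouncer-4].freeze

# Hypothetical stand-in for the sampler: shuffle with a seeded generator
# and keep the first address.
def pick_address(addresses, seed)
  addresses.shuffle(random: Random.new(seed)).first
end

# Simulate many processes, each drawing its seed from the given block,
# and tally which address each one would connect to.
def simulate(processes)
  Array.new(processes) { pick_address(ADDRESSES, yield) }.tally
end

before = simulate(1_000) { rand }            # old seeding: Float in [0, 1)
after  = simulate(1_000) { Random.new_seed } # new seeding: large Integer

p before # a single address receives all 1,000 picks
p after  # picks spread roughly evenly, about 250 per address
```

With the old seeding every simulated process lands on the same address, which is consistent with some pgbouncers receiving no connections on staging; with Random.new_seed the picks spread across all of them.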
To be extra robust, I also verified that Random.new_seed gives a new number every time the Ruby process starts up, rather than starting from the same number each time, even in a fresh Docker container. I wanted to validate this because my first thought was that the problem had something to do with Kubernetes pods generating the same random number on every startup:
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 176004222810440936052995754997845339307
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 158076779882009195494252748041964262917
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 121861738313815470270957465324479091575
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 187922300241423054260652959166859157043
Screenshots or screen recordings
irb(main):001:0> rand()
=> 0.9877912783753516
irb(main):002:0> rand()
=> 0.24387352687578734
irb(main):005:0> [1,2,3,4].shuffle(random: Random.new(rand()))
=> [3, 4, 2, 1]
irb(main):006:0> [1,2,3,4].shuffle(random: Random.new(rand()))
=> [3, 4, 2, 1]
irb(main):007:0> [1,2,3,4].shuffle(random: Random.new(rand()))
=> [3, 4, 2, 1]
irb(main):008:0> Random.new_seed
=> 250491254891584330886895437192910398408
irb(main):009:0> Random.new_seed
=> 223472397698967265728569619421196790104
irb(main):010:0> [1,2,3,4].shuffle(random: Random.new(Random.new_seed))
=> [1, 4, 3, 2]
irb(main):011:0> [1,2,3,4].shuffle(random: Random.new(Random.new_seed))
=> [1, 2, 4, 3]
irb(main):012:0> [1,2,3,4].shuffle(random: Random.new(Random.new_seed))
=> [2, 1, 3, 4]
How to set up and validate locally
You can use the same instructions as in !101994 (merged).
Before
You can see that all the Rails processes (and GDK has a few) seem to choose ports 6432 and 6433. Using the pgbouncer console, you'll see a bunch of connections on 6432 and 6433 and none on the other pgbouncers, which only show the client from the psql command itself.
PgBouncer show clients
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6432 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50854 | 127.0.0.1 | 6432 | 2022-11-15 15:10:39 AEDT | 2022-11-15 15:10:39 AEDT | 0 | 0 | 0 | 0x134008210 | 0x144815410 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50865 | 127.0.0.1 | 6432 | 2022-11-15 15:10:40 AEDT | 2022-11-15 15:10:40 AEDT | 0 | 0 | 0 | 0x134008440 | 0x144815640 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50894 | 127.0.0.1 | 6432 | 2022-11-15 15:10:43 AEDT | 2022-11-15 15:10:44 AEDT | 0 | 0 | 0 | 0x1340088a0 | 0x144815aa0 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50910 | 127.0.0.1 | 6432 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT | 0 | 0 | 0 | 0x134008d00 | 0x144815cd0 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50915 | 127.0.0.1 | 6432 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT | 0 | 0 | 0 | 0x134009160 | 0x144816360 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50948 | 127.0.0.1 | 6432 | 2022-11-15 15:11:03 AEDT | 2022-11-15 15:11:04 AEDT | 0 | 0 | 0 | 0x1340095c0 | 0x1448167c0 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50866 | 127.0.0.1 | 6432 | 2022-11-15 15:10:40 AEDT | 2022-11-15 15:10:41 AEDT | 0 | 0 | 0 | 0x134008670 | 0x144815870 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50897 | 127.0.0.1 | 6432 | 2022-11-15 15:10:44 AEDT | 2022-11-15 15:10:44 AEDT | 0 | 0 | 0 | 0x134008ad0 | 0x144815f00 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50912 | 127.0.0.1 | 6432 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT | 0 | 0 | 0 | 0x134008f30 | 0x144816130 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50918 | 127.0.0.1 | 6432 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT | 0 | 0 | 0 | 0x134009390 | 0x144816590 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50949 | 127.0.0.1 | 6432 | 2022-11-15 15:11:03 AEDT | 2022-11-15 15:11:03 AEDT | 0 | 0 | 0 | 0x1340097f0 | 0x1448169f0 | 0 |
C | dylan | pgbouncer | active | 127.0.0.1 | 51012 | 127.0.0.1 | 6432 | 2022-11-15 15:11:32 AEDT | 2022-11-15 15:11:32 AEDT | 0 | 0 | 0 | 0x134009a20 | | 0 |
(12 rows)
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6433 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50896 | 127.0.0.1 | 6433 | 2022-11-15 15:10:44 AEDT | 2022-11-15 15:10:44 AEDT | 0 | 0 | 0 | 0x14780e040 | 0x147814c10 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50909 | 127.0.0.1 | 6433 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT | 0 | 0 | 0 | 0x14780e4a0 | 0x1478152a0 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50916 | 127.0.0.1 | 6433 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT | 0 | 0 | 0 | 0x14780e900 | 0x147815700 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 50951 | 127.0.0.1 | 6433 | 2022-11-15 15:11:03 AEDT | 2022-11-15 15:11:04 AEDT | 0 | 0 | 0 | 0x14780ed60 | 0x147815b60 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50855 | 127.0.0.1 | 6433 | 2022-11-15 15:10:39 AEDT | 2022-11-15 15:10:40 AEDT | 0 | 0 | 0 | 0x14780de10 | 0x147814e40 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50898 | 127.0.0.1 | 6433 | 2022-11-15 15:10:44 AEDT | 2022-11-15 15:10:44 AEDT | 0 | 0 | 0 | 0x14780e270 | 0x147815070 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50911 | 127.0.0.1 | 6433 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT | 0 | 0 | 0 | 0x14780e6d0 | 0x1478154d0 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50917 | 127.0.0.1 | 6433 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT | 0 | 0 | 0 | 0x14780eb30 | 0x147815930 | 0 |
C | dylan | pgbouncer | active | 127.0.0.1 | 51018 | 127.0.0.1 | 6433 | 2022-11-15 15:11:34 AEDT | 2022-11-15 15:11:34 AEDT | 0 | 0 | 0 | 0x14780ef90 | | 0 |
(9 rows)
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6434 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-----------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+------+------------+-----
C | dylan | pgbouncer | active | 127.0.0.1 | 51036 | 127.0.0.1 | 6434 | 2022-11-15 15:11:42 AEDT | 2022-11-15 15:11:42 AEDT | 0 | 0 | 0 | 0x138009810 | | 0 |
(1 row)
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6435 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-----------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+------+------------+-----
C | dylan | pgbouncer | active | 127.0.0.1 | 51049 | 127.0.0.1 | 6435 | 2022-11-15 15:11:46 AEDT | 2022-11-15 15:11:46 AEDT | 0 | 0 | 0 | 0x160808210 | | 0 |
After
Now the connections appear to be randomly spread across all pgbouncers:
PgBouncer show clients
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6432 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51535 | 127.0.0.1 | 6432 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT | 0 | 0 | 0 | 0x15d008440 | 0x15c011410 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51544 | 127.0.0.1 | 6432 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT | 0 | 0 | 0 | 0x15d008670 | 0x15c011870 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51564 | 127.0.0.1 | 6432 | 2022-11-15 15:15:13 AEDT | 2022-11-15 15:15:13 AEDT | 0 | 0 | 0 | 0x15d0088a0 | 0x15c011aa0 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51533 | 127.0.0.1 | 6432 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT | 0 | 0 | 0 | 0x15d008210 | 0x15c011640 | 0 |
C | dylan | pgbouncer | active | 127.0.0.1 | 51581 | 127.0.0.1 | 6432 | 2022-11-15 15:15:19 AEDT | 2022-11-15 15:15:19 AEDT | 0 | 0 | 0 | 0x15d008ad0 | | 0 |
(5 rows)
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6433 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51485 | 127.0.0.1 | 6433 | 2022-11-15 15:14:56 AEDT | 2022-11-15 15:14:56 AEDT | 0 | 0 | 0 | 0x12800b810 | 0x13900c810 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51498 | 127.0.0.1 | 6433 | 2022-11-15 15:14:57 AEDT | 2022-11-15 15:14:57 AEDT | 0 | 0 | 0 | 0x12800ba40 | 0x13900ca40 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51537 | 127.0.0.1 | 6433 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT | 0 | 0 | 0 | 0x12800bc70 | 0x13900cea0 | 0 |
C | dylan | pgbouncer | active | 127.0.0.1 | 51590 | 127.0.0.1 | 6433 | 2022-11-15 15:15:23 AEDT | 2022-11-15 15:15:23 AEDT | 0 | 0 | 0 | 0x12800bea0 | | 0 |
(4 rows)
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6434 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51531 | 127.0.0.1 | 6434 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT | 0 | 0 | 0 | 0x12d00bc70 | 0x13d80de10 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51536 | 127.0.0.1 | 6434 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT | 0 | 0 | 0 | 0x12d00c0d0 | 0x13d80e6d0 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51486 | 127.0.0.1 | 6434 | 2022-11-15 15:14:56 AEDT | 2022-11-15 15:14:56 AEDT | 0 | 0 | 0 | 0x12d00b810 | 0x13d80e040 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51499 | 127.0.0.1 | 6434 | 2022-11-15 15:14:57 AEDT | 2022-11-15 15:14:57 AEDT | 0 | 0 | 0 | 0x12d00ba40 | 0x13d80e270 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51534 | 127.0.0.1 | 6434 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT | 0 | 0 | 0 | 0x12d00bea0 | 0x13d80e4a0 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51546 | 127.0.0.1 | 6434 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT | 0 | 0 | 0 | 0x12d00c300 | 0x13d80eb30 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51565 | 127.0.0.1 | 6434 | 2022-11-15 15:15:13 AEDT | 2022-11-15 15:15:13 AEDT | 0 | 0 | 0 | 0x12d00c530 | 0x13d80ed60 | 0 |
C | dylan | pgbouncer | active | 127.0.0.1 | 51598 | 127.0.0.1 | 6434 | 2022-11-15 15:15:26 AEDT | 2022-11-15 15:15:26 AEDT | 0 | 0 | 0 | 0x12d00c760 | | 0 |
(8 rows)
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6435 -d pgbouncer -c 'show clients'
type | user | database | state | addr | port | local_addr | local_port | connect_time | request_time | wait | wait_us | close_needed | ptr | link | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51532 | 127.0.0.1 | 6435 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT | 0 | 0 | 0 | 0x12f00d010 | 0x11d00b810 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51545 | 127.0.0.1 | 6435 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT | 0 | 0 | 0 | 0x12f00d470 | 0x11d00bc70 | 0 |
C | dylan | gitlabhq_development | active | 127.0.0.1 | 51566 | 127.0.0.1 | 6435 | 2022-11-15 15:15:13 AEDT | 2022-11-15 15:15:14 AEDT | 0 | 0 | 0 | 0x12f00d8d0 | 0x11d00c0d0 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51538 | 127.0.0.1 | 6435 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT | 0 | 0 | 0 | 0x12f00d240 | 0x11d00ba40 | 0 |
C | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51547 | 127.0.0.1 | 6435 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT | 0 | 0 | 0 | 0x12f00d6a0 | 0x11d00bea0 | 0 |
C | dylan | pgbouncer | active | 127.0.0.1 | 51603 | 127.0.0.1 | 6435 | 2022-11-15 15:15:29 AEDT | 2022-11-15 15:15:29 AEDT | 0 | 0 | 0 | 0x12f00db00 | | 0 |
(6 rows)
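To compare the Before and After outputs without counting rows by hand, here is a small Ruby sketch that tallies application clients per database from show clients output. count_app_clients is a hypothetical helper; it assumes psql's default aligned, pipe-separated output, and the sample is a trimmed excerpt (columns cut short) of the port 6432 Before output.

```ruby
# Tally application clients in psql `show clients` output by database name.
# Rows for the `pgbouncer` admin database are skipped: that is the psql
# console session itself, not a Rails process.
def count_app_clients(show_clients_output)
  show_clients_output.each_line
                     .map { |line| line.split("|").map(&:strip) }
                     .select { |cols| cols[0] == "C" && cols[2] != "pgbouncer" }
                     .map { |cols| cols[2] }
                     .tally
end

# Trimmed excerpt of the "Before" output for port 6432.
sample = <<~OUTPUT
  type | user  | database                | state
  ------+-------+-------------------------+--------
   C    | dylan | gitlabhq_development    | active
   C    | dylan | gitlabhq_development_ci | active
   C    | dylan | pgbouncer               | active
OUTPUT

# One client each for gitlabhq_development and gitlabhq_development_ci;
# the pgbouncer console row is ignored.
p count_app_clients(sample)
```

Running it over the output for each port from 6432 to 6435 reproduces the comparison: in the Before output ports 6434 and 6435 come back empty, while in the After output every port has clients.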
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.