Fix percentage of time rollouts for routing tables switch
What does this MR do and why?
When building a query for the routing table switch, we check the feature flag multiple times. This does not work well with percentage of time rollouts, because each check can return a different value, which leads to queries with mixed table names:
SELECT "ci_builds_metadata".* FROM "ci_builds_metadata" WHERE "p_ci_builds_metadata"."build_id" = $1 LIMIT $2
-- or
SELECT "p_ci_builds_metadata".* FROM "p_ci_builds_metadata" WHERE "ci_builds_metadata"."build_id" = $1 LIMIT $2
SELECT "ci_builds_metadata".* FROM "ci_builds_metadata" WHERE "p_ci_builds_metadata"."build_id" IN ($1, $2)
-- or
SELECT "p_ci_builds_metadata".* FROM "p_ci_builds_metadata" WHERE "ci_builds_metadata"."build_id" IN ($1, $2)
This fix caches the value for the flag check for the duration of the request and returns the same value, ensuring that we use the same table name in the query.
100% enables were also not safe because the L1 process cache could expire during a request and the next check would return a different value.
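As a rough illustration of the idea (not the code in this MR), the sketch below memoizes a flag check for the lifetime of a request-like scope. `RequestFlagCache` and `metadata_query` are hypothetical names invented for this example, and the block passed to the cache stands in for the real flag check.

```ruby
# Hypothetical request-scoped memoizer; names are illustrative, not GitLab's API.
class RequestFlagCache
  def initialize(&flag_check)
    @flag_check = flag_check # stand-in for the real feature flag check
    @values = {}
  end

  # Evaluates the flag once per scope, then returns the stored answer,
  # including when that answer is false.
  def enabled?(flag_name)
    @values.fetch(flag_name) { @values[flag_name] = @flag_check.call(flag_name) }
  end

  # Call at the end of the request so the next request re-evaluates the flag.
  def clear
    @values.clear
  end
end

# Asks for the table name three times, the way a query builder might.
def metadata_query(cache)
  flag = :ci_partitioning_use_ci_builds_metadata_routing_table
  table = -> { cache.enabled?(flag) ? 'p_ci_builds_metadata' : 'ci_builds_metadata' }
  %(SELECT "#{table.call}".* FROM "#{table.call}" WHERE "#{table.call}"."build_id" = $1 LIMIT $2)
end

# A deliberately unstable check that flips on every evaluation,
# mimicking a percentage of time rollout.
evaluations = 0
cache = RequestFlagCache.new { |_name| (evaluations += 1).odd? }
puts metadata_query(cache)
```

Because the unstable check is evaluated only once per scope, all three table references in the generated query agree. Without the memoization, the same check would alternate and produce the mixed table names shown above.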
How to set up and validate locally
- Enable the flag for 10% of time:
Feature.enable_percentage_of_time :ci_partitioning_use_ci_builds_metadata_routing_table, 10
- Create a project with a bunch of jobs on each pipeline
test:
  image: busybox:latest
  variables:
    GIT_STRATEGY: none
  script:
    - echo "Do your test here"
  parallel: 25
- On master, some jobs fail with structural integrity errors when assigned to a runner, and for the jobs that do run, the log page sometimes returns 500 errors.
- On this branch it works as expected.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #377534 (closed)
Edited by Marius Bobin