Understanding P-values
I've read the description of the P-value in the README file, but I am not entirely certain how to interpret it. My confusion comes from the following excerpts:
The number is approximately the improbability of seeing the result from a true RNG
The hash value distribution is compared against a hypothetical true random number generator, so if the P-value represents the improbability of the hash being generated by such an RNG then the lower the improbability the better (as it means that the hash values are closer to the RNG output). Since the P-value represents a negative power of 2, the higher the P-value, the lower the improbability (so a higher incidence of larger P-values is "better").
For example, if a true RNG would be expected to produce the same or a worse result with a probability of 0.075, then that is about 2^-3.737. The exponent is then rounded towards zero and the sign is simply discarded (since probabilities are never greater than 1, the exponent is always negative), and finally reported as "^ 3".
Here the probability is described as the probability of the RNG producing the same or a worse result. However, from my understanding the RNG is kind of the goal, so we want to produce the same results or better, meaning that a higher probability of the RNG being worse would be better for us. This seems to lean towards interpreting a higher incidence of smaller P-values as "better".
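To make sure I am at least reading the arithmetic in that excerpt correctly, here is a minimal Python sketch of how I understand the reported value to be derived from a probability. The function name `reported_exponent` is my own and is not taken from the tool; this only mirrors the README's description (log2, round towards zero, drop the sign), not the tool's actual code.

```python
import math

def reported_exponent(p: float) -> int:
    """Return the integer shown after '^' for a probability p in (0, 1].

    Hypothetical helper: follows the README's wording, not the tool's source.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    exponent = math.log2(p)           # always <= 0 for p in (0, 1]
    truncated = math.trunc(exponent)  # round towards zero: -3.737 becomes -3
    return abs(truncated)             # discard the sign

print(round(math.log2(0.075), 3))  # -3.737
print(reported_exponent(0.075))    # 3
```

Under this reading, a reported "^ 0" covers probabilities above 0.5, "^ 1" covers probabilities above 0.25 and up to 0.5, and so on.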
Furthermore, the README also mentions that
This means that, in general, a true RNG would have about twice as many ^4 results as ^5 results, and twice as many ^3 results as ^4 results, and so on.
Which seems to imply that a true RNG would have a higher incidence of smaller P-values, thus making me want to interpret this as "smaller P-values are better". Thus my concrete "issue" is: if the P-values were introduced to address the fact that the metrics often did not take the test parameters into account, then I think some general guidelines on how to interpret these values (other than pointing to a Wikipedia article that itself might cause more confusion than clarification) would be helpful.
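The "twice as many" statement can be sketched with a small simulation. This assumes that a true RNG yields probabilities that are uniformly distributed on (0, 1], which is the usual textbook property for continuous test statistics; I do not know whether it holds exactly for every test in the tool. The names `expected_fraction` and `simulate_buckets` are illustrative only.

```python
import math
import random
from collections import Counter

def expected_fraction(k: int) -> float:
    """Fraction of uniform probabilities p with 2^-(k+1) < p <= 2^-k."""
    return 2.0 ** -(k + 1)

def simulate_buckets(n: int, seed: int = 0) -> Counter:
    """Draw n uniform probabilities and count them per reported exponent."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        p = 1.0 - rng.random()  # uniform in (0, 1], so log2(p) is defined
        counts[abs(math.trunc(math.log2(p)))] += 1
    return counts

counts = simulate_buckets(100_000)
for k in range(6):
    # observed count next to the count expected under the uniform assumption
    print(k, counts[k], round(expected_fraction(k) * 100_000))
```

Under that assumption each bucket is expected to hold about half as many results as the one before it, so small reported values dominate simply because they cover a wider range of probabilities.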
I've run the tool on a few different hash implementations to try to get a better grasp of how to interpret the results:
MD5 Results
-------------------------------------------------------------------------------
Log2(p-value) summary:
0     1     2     3     4     5     6     7     8     9     10
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
6066  1341  562   349   149   72    34    28    14    5     2
11    12    13    14    15    16    17    18    19    20    21+
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1     2     0     0     0     0     0     0     0     0     0
-------------------------------------------------------------------------------
SHA-1 Results
-------------------------------------------------------------------------------
Log2(p-value) summary:
0     1     2     3     4     5     6     7     8     9     10
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
6144  1291  623   265   143   73    44    19    9     7     1
11    12    13    14    15    16    17    18    19    20    21+
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2     4     0     0     0     0     0     0     0     0     0
-------------------------------------------------------------------------------
Composite (merge) Results
-------------------------------------------------------------------------------
Log2(p-value) summary:
0     1     2     3     4     5     6     7     8     9     10
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
6068  1327  654   295   141   67    31    14    12    7     6
11    12    13    14    15    16    17    18    19    20    21+
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
0     2     1     0     0     0     0     0     0     0     0
-------------------------------------------------------------------------------
Composite (append) Results
-------------------------------------------------------------------------------
Log2(p-value) summary:
0     1     2     3     4     5     6     7     8     9     10
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
11290 1349  566   329   146   69    27    27    11    4     5
11    12    13    14    15    16    17    18    19    20    21+
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2     2     2     2     0     0     0     0     0     0     0
-------------------------------------------------------------------------------
The "composite" algorithms are implementations I've added: composite merge XORs the 128-bit hashes produced by MD5 and SHA-1, whilst composite append appends the hash produced by SHA-1 to the hash produced by MD5, effectively generating a 256-bit hash. First off, you can see that the 256-bit hash has many more results with smaller P-values, and since the 256-bit hash should in theory be better than the plain 128-bit ones, this seems to support the idea that smaller P-values are "better". Looking at the 128-bit hash results, plain MD5 and Composite (merge) have pretty similar results, but if smaller P-values are better, it still seems that SHA-1 outperforms both.
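For clarity, the two composite constructions can be sketched in Python roughly as follows. This is only an illustration of the idea, not the code I ran inside the tool: the function names are made up for this example, and the sketch assumes that the 128-bit SHA-1 value is simply the first 16 bytes of the full 160-bit digest.

```python
import hashlib

def md5_128(data: bytes) -> bytes:
    """128-bit MD5 digest."""
    return hashlib.md5(data).digest()

def sha1_128(data: bytes) -> bytes:
    """SHA-1 digest cut down to 128 bits.

    Assumption: truncation keeps the first 16 of the 20 digest bytes.
    """
    return hashlib.sha1(data).digest()[:16]

def composite_merge(data: bytes) -> bytes:
    """XOR the two 128-bit hashes byte by byte (128-bit result)."""
    return bytes(a ^ b for a, b in zip(md5_128(data), sha1_128(data)))

def composite_append(data: bytes) -> bytes:
    """MD5 hash followed by the SHA-1 hash (256-bit result)."""
    return md5_128(data) + sha1_128(data)

sample = b"hello world"
print(len(composite_merge(sample)) * 8)   # 128
print(len(composite_append(sample)) * 8)  # 256
```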
Would these conclusions be in the ballpark of how the P-values are meant to be interpreted, or have I completely misunderstood the concept?