# Introduction

This page reviews Melissa O'Neill's PCG generators. It is structured as follows: first, we identify the generators with major statistical flaws. Once those are out of the picture, we examine some claims made by Melissa O'Neill about the remaining PCG generators, and provide simple counterexamples for each claim. In the end, we define a new, simple LCG-based generator that is faster than the only remaining 64-bit PCG generator and has the same statistical properties.

The conclusion is that there is no reason to use PCG generators: for those without statistical flaws, there is always a better LCG-based choice.

# Statistical flaws

## Generators with “ext” in the name

Generators with a large state space are useful because they generate many distinct and uncorrelated
sequences. In theory, all nonoverlapping subsequences should look
random, so if you take `R` and `R`' initialized in different states and
interleave their output (first output from `R`, then first output from `R`',
then second output from `R`, etc.) the resulting sequence should still look random.

But nothing is really random. So what you do is correlation testing: you
take `R`. Then you take an exact copy of `R`, `R`', and you flip exactly one bit
of the state. Then you interleave as above, and examine the output
(usually using some test suite, you can choose the one you prefer).

In theory the interleaved output should immediately look random, but that's very
difficult to obtain, because it takes time for the flipped bit to
influence the whole state. So you have to throw away some output from
`R`, `R`' before they *decorrelate*.
For example, `xoroshiro128+` decorrelates after
a couple of dozen iterations. The Mersenne Twister (19937 bits of state) needs millions of
iterations. CMWC4096, the “Mother-of-all” PRNG by Marsaglia, needs
more than 10,000,000 iterations. That is, you have to discard 10,000,000 values from
`R` and `R`' before you see a random interleaved stream.
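The procedure just described can be sketched in a few lines of C. This is only an illustration, not the program used for the results on this page: the names `interleave_flipped` and `mean_pair_distance` are hypothetical, the `xoroshiro128+` step uses the rotation and shift constants 24, 16, 37 as I recall them from the reference implementation (treat them as an assumption), and the Hamming-distance average is a crude sanity check that does not replace a real test suite.

```c
#include <stddef.h>
#include <stdint.h>

/* State of a xoroshiro128+ generator (two 64-bit words). */
typedef struct { uint64_t s[2]; } xoroshiro128p;

static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* One step of xoroshiro128+ (constants 24, 16, 37: an assumption). */
static uint64_t xoroshiro128p_next(xoroshiro128p *g) {
    const uint64_t s0 = g->s[0];
    uint64_t s1 = g->s[1];
    const uint64_t result = s0 + s1;
    s1 ^= s0;
    g->s[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
    g->s[1] = rotl64(s1, 37);
    return result;
}

/* Fills out[0 .. 2*pairs) with the interleaved output of r and of a copy
   of r with state bit `bit` (0..127) flipped, after discarding `discard`
   outputs from both generators. */
static void interleave_flipped(xoroshiro128p r, unsigned bit, uint64_t discard,
                               uint64_t *out, size_t pairs) {
    xoroshiro128p r2 = r;                          /* exact copy R' ...   */
    r2.s[bit / 64] ^= UINT64_C(1) << (bit % 64);   /* ... with one bit flipped */
    for (uint64_t i = 0; i < discard; i++) {
        xoroshiro128p_next(&r);
        xoroshiro128p_next(&r2);
    }
    for (size_t i = 0; i < pairs; i++) {
        out[2 * i] = xoroshiro128p_next(&r);
        out[2 * i + 1] = xoroshiro128p_next(&r2);
    }
}

/* Portable population count. */
static unsigned popcount64(uint64_t v) {
    unsigned n = 0;
    while (v) { v &= v - 1; n++; }
    return n;
}

/* Mean number of differing bits between the two outputs of each pair;
   about 32 is expected for unrelated 64-bit words. */
static double mean_pair_distance(const uint64_t *out, size_t pairs) {
    uint64_t total = 0;
    for (size_t i = 0; i < pairs; i++)
        total += popcount64(out[2 * i] ^ out[2 * i + 1]);
    return (double)total / (double)pairs;
}
```

In a real experiment the interleaved buffer would be written to standard output with `fwrite()` and piped into the test suite of your choice, varying `discard` to find out when the failures disappear.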

LCG generators can only be as large as the largest multiplications you can perform. Melissa O'Neill devised a new type of generator (the ones with “ext” in their name) which, mysteriously, overcomes this limit.

So let us try to see how long it takes for an “ext” PCG generator to decorrelate. I wrote a C++ program that does a correlation test for you: it creates two “ext” PCG generators, flips a bit of the “ext” state, and emits the output interleaved. It takes a parameter that specifies the number of iterations to perform before emitting the interleaved stream, so we can measure whether, for example, decorrelation is faster than with the Mersenne Twister.

This is the result with 1 million discarded outputs:

    rng=RNG_stdin64, seed=0x5e3432a3
    length= 256 megabytes (2^28 bytes), time= 2.2 seconds
      Test Name  Raw  Processed  Evaluation
      BCFN(2+0,13-2,T)  R=+5218351  p = 0  FAIL !!!!!!!!
      BCFN(2+1,13-2,T)  R=+3553737  p = 0  FAIL !!!!!!!!
      BCFN(2+2,13-3,T)  R= +4207  p = 3e-1991  FAIL !!!!!!!!
      BCFN(2+3,13-3,T)  R=+966.0  p = 4.9e-457  FAIL !!!!!!!
      BCFN(2+4,13-3,T)  R=+203.5  p = 4.2e-96  FAIL !!!!!
      BCFN(2+5,13-4,T)  R= +30.7  p = 2.2e-13  FAIL
      DC6-9x1Bytes-1  R=+313150  p = 0  FAIL !!!!!!!!
      Gap-16:A  R=+2924020  p = 0  FAIL !!!!!!!!
      Gap-16:B  R=+11092511  p = 0  FAIL !!!!!!!!
      FPF-14+6/16:(0,14-0)  R=+121.6  p = 3.6e-112  FAIL !!!!!
      FPF-14+6/16:(1,14-0)  R=+125.2  p = 1.8e-115  FAIL !!!!!
      FPF-14+6/16:(2,14-0)  R=+109.4  p = 7.2e-101  FAIL !!!!!
      FPF-14+6/16:(3,14-0)  R= +99.7  p = 7.5e-92  FAIL !!!!!
      FPF-14+6/16:(4,14-1)  R= +72.7  p = 3.4e-64  FAIL !!!!
      FPF-14+6/16:(5,14-2)  R= +55.4  p = 2.9e-48  FAIL !!!
      FPF-14+6/16:(6,14-2)  R= +48.4  p = 4.2e-42  FAIL !!!
      FPF-14+6/16:(7,14-3)  R= +37.3  p = 1.7e-32  FAIL !!!
      FPF-14+6/16:(8,14-4)  R= +22.5  p = 2.5e-18  FAIL !
      FPF-14+6/16:(9,14-5)  R= +22.2  p = 3.0e-18  FAIL !
      FPF-14+6/16:(10,14-5)  R= +18.2  p = 6.2e-15  FAIL
      FPF-14+6/16:(13,14-8)  R= +8.6  p = 3.0e-6  unusual
      FPF-14+6/16:(15,14-9)  R= +8.7  p = 8.9e-6  unusual
      FPF-14+6/16:all  R=+255.7  p = 1.7e-239  FAIL !!!!!!
      FPF-14+6/16:all2  R=+14532  p = 4e-5671  FAIL !!!!!!!!
      BRank(12):128(4)  R= +2501  p~= 3e-1331  FAIL !!!!!!!!
      BRank(12):256(4)  R= +5257  p~= 1e-2796  FAIL !!!!!!!!
      BRank(12):384(1)  R= +3963  p~= 4e-1194  FAIL !!!!!!!!
      BRank(12):512(2)  R= +7614  p~= 4e-2293  FAIL !!!!!!!!
      BRank(12):768(1)  R= +8096  p~= 2e-2438  FAIL !!!!!!!!
      BRank(12):1K(2)  R=+15408  p~= 3e-4639  FAIL !!!!!!!!
      BRank(12):1536(1)  R=+16298  p~= 2e-4907  FAIL !!!!!!!!
      BRank(12):2K(1)  R=+21895  p~= 3e-6592  FAIL !!!!!!!!
      [Low16/64]BCFN(2+0,13-3,T)  R=+16181  p = 2e-7658  FAIL !!!!!!!!
      [Low16/64]BCFN(2+1,13-3,T)  R= +3654  p = 3e-1729  FAIL !!!!!!!!
      [Low16/64]BCFN(2+2,13-4,T)  R=+903.0  p = 1.8e-394  FAIL !!!!!!!
      [Low16/64]BCFN(2+3,13-4,T)  R=+118.4  p = 1.1e-51  FAIL !!!!
      [Low16/64]BCFN(2+5,13-5,T)  R= +13.7  p = 1.5e-5  mildly suspicious
      [Low16/64]DC6-9x1Bytes-1  R=+965794  p = 0  FAIL !!!!!!!!
      [Low16/64]Gap-16:A  R=+838770  p = 0  FAIL !!!!!!!!
      [Low16/64]Gap-16:B  R=+3576303  p = 0  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(0,14-0)  R=+122.4  p = 7.1e-113  FAIL !!!!!
      [Low16/64]FPF-14+6/16:(1,14-0)  R=+127.5  p = 1.3e-117  FAIL !!!!!
      [Low16/64]FPF-14+6/16:(2,14-1)  R= +85.7  p = 1.0e-75  FAIL !!!!
      [Low16/64]FPF-14+6/16:(3,14-2)  R= +59.7  p = 5.4e-52  FAIL !!!!
      [Low16/64]FPF-14+6/16:(4,14-2)  R= +4010  p = 4e-3507  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(5,14-3)  R= +2834  p = 4e-2484  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(6,14-4)  R= +2010  p = 2e-1642  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(7,14-5)  R= +1428  p = 1e-1183  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(8,14-5)  R= +1714  p = 2e-1421  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(9,14-6)  R= +1216  p = 2.1e-930  FAIL !!!!!!!
      [Low16/64]FPF-14+6/16:(10,14-7)  R=+849.5  p = 6.3e-676  FAIL !!!!!!!
      [Low16/64]FPF-14+6/16:(11,14-8)  R=+584.7  p = 8.6e-421  FAIL !!!!!!!
      [Low16/64]FPF-14+6/16:(12,14-8)  R=+540.8  p = 3.4e-389  FAIL !!!!!!!
      [Low16/64]FPF-14+6/16:(13,14-9)  R=+384.8  p = 1.4e-242  FAIL !!!!!!
      [Low16/64]FPF-14+6/16:(14,14-10)  R=+269.5  p = 6.3e-144  FAIL !!!!!
      [Low16/64]FPF-14+6/16:(15,14-11)  R=+214.6  p = 2.2e-94  FAIL !!!!!
      [Low16/64]FPF-14+6/16:all  R= +2608  p = 1e-2447  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:all2  R=+7572279  p = 0  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:cross  R= +75.0  p = 2.8e-66  FAIL !!!!
      [Low16/64]BRank(12):128(4)  R= +2501  p~= 3e-1331  FAIL !!!!!!!!
      [Low16/64]BRank(12):256(2)  R= +3717  p~= 5e-1120  FAIL !!!!!!!!
      [Low16/64]BRank(12):384(1)  R= +3963  p~= 4e-1194  FAIL !!!!!!!!
      [Low16/64]BRank(12):512(2)  R= +7599  p~= 1e-2288  FAIL !!!!!!!!
      [Low16/64]BRank(12):768(1)  R= +8096  p~= 2e-2438  FAIL !!!!!!!!
      [Low16/64]BRank(12):1K(1)  R=+10873  p~= 3e-3274  FAIL !!!!!!!!
      [Low4/64]BCFN(2+0,13-5,T)  R= +4244  p = 1e-1661  FAIL !!!!!!!!
      [Low4/64]BCFN(2+1,13-5,T)  R=+597.8  p = 3.4e-234  FAIL !!!!!!
      [Low4/64]BCFN(2+2,13-5,T)  R= +13.7  p = 1.5e-5  suspicious
      [Low4/64]BCFN(2+3,13-5,T)  R= +61.2  p = 3.7e-24  FAIL !!
      [Low4/64]BCFN(2+4,13-6,T)  R= +41.7  p = 1.0e-14  FAIL
      [Low4/64]BCFN(2+5,13-6,T)  R= +14.0  p = 3.0e-5  unusual
      [Low4/64]DC6-9x1Bytes-1  R=+115116  p = 0  FAIL !!!!!!!!
      [Low4/64]Gap-16:A  R=+247785  p = 0  FAIL !!!!!!!!
      [Low4/64]Gap-16:B  R=+1522079  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(0,14-1)  R=+363333  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(1,14-2)  R=+256697  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(2,14-2)  R=+148798  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(3,14-3)  R=+105460  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(8,14-7)  R=+111527  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(9,14-8)  R=+78894  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(10,14-8)  R=+39682  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(11,14-9)  R=+28140  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:all  R=+502730  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:all2  R=+68561676162  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:cross  R=+525391  p = 0  FAIL !!!!!!!!
      [Low4/64]BRank(12):128(4)  R= +2501  p~= 3e-1331  FAIL !!!!!!!!
      [Low4/64]BRank(12):256(2)  R= +3717  p~= 5e-1120  FAIL !!!!!!!!
      [Low4/64]BRank(12):384(1)  R= +4028  p~= 1e-1213  FAIL !!!!!!!!
      [Low4/64]BRank(12):512(2)  R= +7599  p~= 1e-2288  FAIL !!!!!!!!
      [Low4/64]BRank(12):768(1)  R= +8096  p~= 2e-2438  FAIL !!!!!!!!
      [Low1/64]BCFN(2+0,13-6,T)  R= +76.9  p = 8.8e-27  FAIL !!
      [Low1/64]BCFN(2+1,13-6,T)  R=+375.0  p = 7.6e-129  FAIL !!!!!
      [Low1/64]BCFN(2+2,13-6,T)  R=+167.7  p = 7.4e-58  FAIL !!!!
      [Low1/64]BCFN(2+3,13-6,T)  R= +42.5  p = 5.5e-15  FAIL !
      [Low1/64]BCFN(2+4,13-7,T)  R= +16.2  p = 1.6e-5  mildly suspicious
      [Low1/64]DC6-9x1Bytes-1  R=+36425  p = 0  FAIL !!!!!!!!
      [Low1/64]Gap-16:A  R=+88655  p = 0  FAIL !!!!!!!!
      [Low1/64]Gap-16:B  R=+526412  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(0,14-2)  R=+123481  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(2,14-4)  R=+100738  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(4,14-5)  R=+71077  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(6,14-7)  R=+55336  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(8,14-8)  R=+27843  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(10,14-10)  R=+16804  p = 2e-8940  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:(11,14-11)  R= +2225  p = 2.0e-970  FAIL !!!!!!!
      [Low1/64]FPF-14+6/16:(12,14-11)  R=+14981  p = 2e-6530  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:all  R=+215351  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:all2  R=+10216812555  p = 0  FAIL !!!!!!!!
      [Low1/64]FPF-14+6/16:cross  R=+521082  p = 0  FAIL !!!!!!!!
      [Low1/64]BRank(12):128(2)  R= +1769  p~= 1.8e-533  FAIL !!!!!!!
      [Low1/64]BRank(12):256(2)  R= +3717  p~= 5e-1120  FAIL !!!!!!!!
      [Low1/64]BRank(12):384(1)  R= +4006  p~= 5e-1207  FAIL !!!!!!!!
      [Low1/64]BRank(12):512(1)  R= +5362  p~= 2e-1615  FAIL !!!!!!!!
      ...and 37 test result(s) without anomalies

This does not look good. And with 1 *billion* discarded outputs the result is the same (try it yourself). What's happening?

The problem is that “ext” PCG generators *never* decorrelate. *Ever*
(“never” here means “not before the thermodynamic
death of the universe”). There is no state mix.
Nobody has ever thought of designing a PRNG in such a flawed way.

The trick used by Melissa O'Neill to make people believe she could have high-quality, large-state generators
using LCGs consists, in practice, in xoring the output of a small (at most 128 bits of state) PCG generator
with a large array. The array in theory changes, and if you wait long enough (much beyond the thermodynamic death of the universe)
you will see it pass through all states, but in practice the array never changes. This is why we cannot decorrelate: the
output of `R` and `R`' *is essentially the same, no matter how much you look for uncorrelated sequences*.

In other words, the whole sequence of the generator is made of an enormous number of strongly correlated, very short sequences. And this makes the correlation tests fail.
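To make the mechanism concrete, here is a toy model in C. It is emphatically not the PCG “ext” implementation: the type `toy_ext`, the table size, the xorshift scrambling and the choice of indexing the table with the low bits of the state are all assumptions made for illustration. It only captures the idea described above, that is, a small LCG whose output is xored with an entry of an array that (in this sketch) never changes.

```c
#include <stdint.h>

#define TABLE_SIZE 16  /* toy table size (a power of two); an assumption */

/* Toy model: a small LCG plus a table that is never updated. */
typedef struct {
    uint64_t state;
    uint64_t table[TABLE_SIZE];
} toy_ext;

/* Fill the table with arbitrary fixed values and set the LCG state. */
static void toy_ext_init(toy_ext *g, uint64_t seed) {
    g->state = seed;
    for (int i = 0; i < TABLE_SIZE; i++)
        g->table[i] = (uint64_t)(i + 1) * UINT64_C(0x9e3779b97f4a7c15);
}

static uint64_t toy_ext_next(toy_ext *g) {
    const uint64_t s = g->state;
    /* 64-bit LCG step with illustrative constants. */
    g->state = s * UINT64_C(6364136223846793005) + UINT64_C(1442695040888963407);
    /* Placeholder scrambling of the old state (not PCG's output function). */
    const uint64_t base = s ^ (s >> 32);
    /* Xor with the table entry selected by the low bits of the state. */
    return base ^ g->table[s & (TABLE_SIZE - 1)];
}

/* Counts how many of the first n paired outputs of a and b differ. */
static uint64_t count_differing(toy_ext a, toy_ext b, uint64_t n) {
    uint64_t differing = 0;
    for (uint64_t i = 0; i < n; i++)
        if (toy_ext_next(&a) != toy_ext_next(&b)) differing++;
    return differing;
}
```

If you copy a generator and flip one bit of one table entry, the two copies disagree only when that entry is selected, which in this toy happens exactly once every `TABLE_SIZE` outputs, no matter how many outputs you discard first.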

You can see this easily by uncommenting the `printf()` calls in the code and commenting out the `fwrite()` calls.
After a billion iterations, we obtain

    d355d4a2d198b55d d355d4a2d198b55d
    943c19906dee85e4 943c19906dee85e4
    0d4a2d01a24d80ae 0d4a2d01a24d80ae
    12d5e10f7a626cbd 12d5e10f7a626cbd
    15aedbb162473964 15aedbb162473964
    a9700b058d3d2619 a9700b058d3d2619
    6a5ed46c771c73d2 6a5ed46c771c73d2
    426ca5c99a4980d1 426ca5c99a4980d1
    b793262f0f13b965 b793262f0f13b965
    be6724c0f4789316 be6724c0f4789316
    2ee214efcc33da12 2ee214efcc33da12
    38f221757282c60e 38f221757282c60e
    e03e6c696146fc81 e03e6c696146fc81
    16b1ec780e875744 8f9efedf6709a41e
    fe8dbdbbf39eddd3 fe8dbdbbf39eddd3
    047ff5f2784f6e08 047ff5f2784f6e08
    6d91e2bebd70954a 6d91e2bebd70954a

Almost all outputs are duplicates: the two generators are not decorrelating. If you look carefully, you'll find two values that are not duplicates. This is all the decorrelation we get after a billion iterations, and it will not improve (not significantly before the thermodynamic death of the universe).

Of course, the
same test on the very small number of bits of the base generator should work without problems (but read below):
nonetheless, the test fails for
the large majority of bits of state; and the larger the state space, the worse the percentage of failing bits. No
modern generator will fail this test on a majority of bits; in fact, good generators do not fail it on *any* bit.
You just might have to discard a small number of output values.

Wrap-up: **do not use PCG generators with “ext” in the name**.

## Generators with multiple sequences

Melissa O'Neill claims that a strong point of PCG generators is the
possibility of generating multiple independent streams by changing the
additive constant of the underlying LCG generator. This fact is stated
without any proof, and indeed it is entirely false. You can definitely
generate multiple streams, but they might end up being extremely
correlated. This is known at least since Knuth's description of LCGs in
TAoCP (like, half a century?) because it is possible to derive easily
the sequence for a constant given the sequence for *another* constant.
That is, the sequences are strongly correlated, and the minimal scrambling
performed by PCG generators is absolutely insufficient to hide this correlation.

To check that this actually happens, I put together a small C program which creates two PCG generators with seemingly random initial states

    0x7C112EEA363433CFB3AA1BA7C748A9B9
    0x83EED115C9CBCC304C55E45838B75647

and seemingly random increments

    0x3E0897751B1A19E7D9D50DD3E3A454DC
    0x41F7688AE4E5E618262AF22C1C5BAB23

Following Melissa O'Neill's claims, if we interleave the output of these two generators we should see a random stream. But if you pipe the output of the program into PractRand this is what you will see:

    rng=RNG_stdin64, seed=0x66b6c6c9
    length= 256 megabytes (2^28 bytes), time= 2.9 seconds
      Test Name  Raw  Processed  Evaluation
      BCFN(2+0,13-2,T)  R=+401986  p = 0  FAIL !!!!!!!!
      BCFN(2+1,13-2,T)  R=+188.0  p = 1.1e-95  FAIL !!!!!
      DC6-9x1Bytes-1  R= +7342  p = 1e-3850  FAIL !!!!!!!!
      [Low16/64]BCFN(2+0,13-3,T)  R= +3288  p = 6e-1556  FAIL !!!!!!!!
      [Low16/64]BCFN(2+1,13-3,T)  R=+841.7  p = 3.6e-398  FAIL !!!!!!!
      [Low16/64]BCFN(2+2,13-4,T)  R=+266.1  p = 3.2e-116  FAIL !!!!!
      [Low16/64]BCFN(2+3,13-4,T)  R= +61.8  p = 6.3e-27  FAIL !!
      [Low16/64]BCFN(2+4,13-5,T)  R= +14.4  p = 8.1e-6  mildly suspicious
      [Low16/64]DC6-9x1Bytes-1  R=+30611  p = 0  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:(4,14-2)  R=+162.5  p = 6.8e-142  FAIL !!!!!
      [Low16/64]FPF-14+6/16:(5,14-3)  R= +69.8  p = 6.7e-61  FAIL !!!!
      [Low16/64]FPF-14+6/16:(6,14-4)  R=+112.0  p = 1.9e-91  FAIL !!!!!
      [Low16/64]FPF-14+6/16:(7,14-5)  R= +43.2  p = 1.1e-35  FAIL !!!
      [Low16/64]FPF-14+6/16:(8,14-5)  R=+106.9  p = 1.7e-88  FAIL !!!!
      [Low16/64]FPF-14+6/16:(9,14-6)  R=+118.2  p = 1.6e-90  FAIL !!!!!
      [Low16/64]FPF-14+6/16:(10,14-7)  R= +78.3  p = 3.9e-62  FAIL !!!!
      [Low16/64]FPF-14+6/16:(11,14-8)  R= +82.3  p = 3.0e-59  FAIL !!!!
      [Low16/64]FPF-14+6/16:(12,14-8)  R= +81.6  p = 9.5e-59  FAIL !!!!
      [Low16/64]FPF-14+6/16:(13,14-9)  R= +57.3  p = 2.3e-36  FAIL !!!
      [Low16/64]FPF-14+6/16:(14,14-10)  R= +50.0  p = 3.6e-27  FAIL !!
      [Low16/64]FPF-14+6/16:(15,14-11)  R= +21.8  p = 2.4e-10  very suspicious
      [Low16/64]FPF-14+6/16:all  R=+106.3  p = 3.1e-99  FAIL !!!!!
      [Low16/64]FPF-14+6/16:all2  R=+17519  p = 3e-6345  FAIL !!!!!!!!
      [Low16/64]FPF-14+6/16:cross  R= +18.0  p = 3.9e-16  FAIL !
      [Low4/64]BCFN(2+0,13-5,T)  R=+130.0  p = 4.5e-51  FAIL !!!!
      [Low4/64]BCFN(2+1,13-5,T)  R= +40.9  p = 3.4e-16  FAIL !
      [Low4/64]BCFN(2+2,13-5,T)  R= +9.6  p = 6.3e-4  unusual
      [Low4/64]DC6-9x1Bytes-1  R= +3254  p = 2e-1883  FAIL !!!!!!!!
      [Low4/64]Gap-16:A  R=+249.7  p = 2.0e-196  FAIL !!!!!!
      [Low4/64]Gap-16:B  R= +1533  p = 1e-1385  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(0,14-1)  R= +3544  p = 1e-3140  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(1,14-2)  R= +2511  p = 1e-2195  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(2,14-2)  R= +1372  p = 1e-1199  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:(3,14-3)  R= +1058  p = 5.1e-927  FAIL !!!!!!!
      [Low4/64]FPF-14+6/16:(4,14-4)  R=+618.7  p = 1.8e-505  FAIL !!!!!!!
      [Low4/64]FPF-14+6/16:(5,14-5)  R=+241.6  p = 3.4e-200  FAIL !!!!!!
      [Low4/64]FPF-14+6/16:(6,14-5)  R=+179.2  p = 2.1e-148  FAIL !!!!!
      [Low4/64]FPF-14+6/16:(7,14-6)  R=+159.9  p = 1.9e-122  FAIL !!!!!
      [Low4/64]FPF-14+6/16:(8,14-7)  R=+159.7  p = 6.0e-127  FAIL !!!!!
      [Low4/64]FPF-14+6/16:(9,14-8)  R=+122.3  p = 5.0e-88  FAIL !!!!
      [Low4/64]FPF-14+6/16:(10,14-8)  R= +47.9  p = 1.6e-34  FAIL !!!
      [Low4/64]FPF-14+6/16:(11,14-9)  R= +34.8  p = 3.5e-22  FAIL !!
      [Low4/64]FPF-14+6/16:(12,14-10)  R= +26.3  p = 1.5e-14  FAIL
      [Low4/64]FPF-14+6/16:all  R= +4730  p = 6e-4439  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:all2  R=+5302901  p = 0  FAIL !!!!!!!!
      [Low4/64]FPF-14+6/16:cross  R= +5962  p = 8e-4680  FAIL !!!!!!!!
      ...and 113 test result(s) without anomalies

In fact, a very large number of different initialization parameters will lead to the same failures. This happens because
modulo an additive constant *there are just two sequences that can be produced by an LCG of the type used by PCG,
no matter which constant you are using*. The idea that different constants generate truly different sequences is
entirely false.

This is known at least since Durst's 1989 paper:
if you have an LCG of the form *x*_{n} = *a* *x*_{n – 1} + *c*, and you take any *r*, then the generator *y*_{n} = *a* *y*_{n – 1} + *c* + (*a* – 1)*r*
satisfies *y*_{n} = *x*_{n} – *r* for all *n*. It is easy to see by induction starting from *y*_{0} = *x*_{0} – *r*, as
*y*_{n} = *a* *y*_{n – 1} + *c* + (*a* – 1)*r* = *a* (*x*_{n – 1} – *r*) + *c* + (*a* – 1)*r* = *a* *x*_{n – 1} + *c* – *r* = *x*_{n} – *r*.

Because the multiplier used by PCG has *high potency* (a good property in general), for every pair of additive constants
*c* and *d* such that *c* – *d* is divisible by four you can find such an *r*. This divides the constants
into two equivalence classes, and the sequences in each class are basically the same: they differ just by an additive constant. This minimal
difference makes the sequences massively correlated, and this correlation passes without difficulty through the minimal scrambling of PCG generators; hence the
disaster above.
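The relation between two additive constants is easy to check numerically. The sketch below uses a 64-bit LCG with arithmetic modulo 2^{64} instead of the 128-bit one used by PCG, purely to stay within standard C; the function names and the constants in the usage note are illustrative assumptions. Given a multiplier `a`, an additive constant `c`, and any `r`, it runs one generator with constant `c` from state `x0` and one with constant `c + (a - 1) * r` from state `x0 - r`, and verifies that the two states always differ by exactly `r`.

```c
#include <stdint.h>

/* One step of the LCG x -> a*x + c modulo 2^64. */
static inline uint64_t lcg_step(uint64_t x, uint64_t a, uint64_t c) {
    return a * x + c;
}

/* Returns 1 if, for n steps, the generator with constant c started at x0
   and the generator with constant d = c + (a - 1) * r started at x0 - r
   keep a constant state difference x - y == r; returns 0 otherwise. */
static int shifted_streams_track(uint64_t a, uint64_t c, uint64_t x0,
                                 uint64_t r, uint64_t n) {
    const uint64_t d = c + (a - 1) * r;  /* the "other" additive constant */
    uint64_t x = x0;
    uint64_t y = x0 - r;                 /* associated initial state */
    for (uint64_t i = 0; i < n; i++) {
        x = lcg_step(x, a, c);
        y = lcg_step(y, a, d);
        if (x - y != r) return 0;
    }
    return 1;
}
```

For example, `shifted_streams_track(6364136223846793005, 1442695040888963407, 12345, 0xdeadbeefcafef00d, 100000)` returns 1: the second “stream” is the first one shifted by a constant.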

If you want to see this correlation directly, you can try a program that, given a state and two “independent stream” initializers, starts two PCG 128-bit generators using the provided initializers: the first generator starts from the provided state, the second generator from the associated state in the correspondence above. The difference between the states of the two generators will always be the same constant, and the program just prints it as the state of both generators advances:

    ./pcg128diff 0x2360ed051fc65da4 0x4385df649fccf645 0x5851f42d4c957f2d 0x14057b7ef767814f 0xbf58476d1ce4e5b9 0x94d049bb133111eb
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e
    0xf48e95bc9761ba1d9a9628d3d501146e

Two generators following the same states modulo an additive constant cannot generate “independent streams”. If you modify the program to write the output of the two generators interleaved and pipe the result into PractRand, you will see a cascade of statistical failures as shown above.

Wrap-up: **Do not use PCG generators with multiple sequences**.

## Subsequences within the same generator

It is instructive to look inside a single-sequence 128-bit PCG generator and consider the many possible sequences it can emit starting from different initial states, similarly to what we did with “ext” generators. We already mentioned that the sequence emitted by an LCG when using a certain additive constant can be obtained from the sequence for another additive constant, but in fact much more is true: the output for any additive constant and any initial state can be computed easily knowing the output from state 0 using the additive constant 1.

For prime moduli, nothing particularly bad happens. But if the modulus is not prime,
there are consequences, and the worst consequences are for a modulus which is the power of a prime, as in the PCG
case (which is why LCGs with moduli of the form
2^{n}, as the ones used in PCG generators, are
considered of low quality). Essentially, changing
the high bits of the state has no impact on the low bits—forever.
And, as in the previous case, this structural defect is passed on to PCG generators.
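The fact that high bits never influence low bits follows directly from arithmetic modulo a power of two, and can be checked with a few lines of C. The sketch below is an illustration on a 64-bit LCG (PCG uses 128 bits, but the argument is identical); the function name and the constants are assumptions chosen for the example.

```c
#include <stdint.h>

/* Runs two copies of the same LCG modulo 2^64 from states x and y.
   Returns 1 if the lowest k bits of the two states are equal at the
   start and after each of n steps, 0 otherwise. */
static int low_bits_stay_equal(uint64_t x, uint64_t y, unsigned k, uint64_t n) {
    const uint64_t a = UINT64_C(6364136223846793005);  /* illustrative multiplier */
    const uint64_t c = UINT64_C(1442695040888963407);  /* illustrative increment  */
    const uint64_t mask = (k >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << k) - 1);
    if ((x ^ y) & mask) return 0;      /* states must agree on the low k bits */
    for (uint64_t i = 0; i < n; i++) {
        x = a * x + c;
        y = a * y + c;
        if ((x ^ y) & mask) return 0;  /* never taken: x = y modulo 2^k is preserved */
    }
    return 1;
}
```

Starting from two states that differ in all of their 32 highest bits, the lowest 32 bits remain identical for as many steps as you care to run.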

I put together a small C program
which creates two 128-bit PCG generators. You provide the initial state of the first
generator, and the highest 64 bits of the state of the second generator; the lowest
64 bits will be identical to those of the first one. Then the program emits the output of
the two PCG generators, interleaved. Let us try with random arguments
`0x596d84dfefec2fc7`, `0x6b79f81ab9f3e37b` and `0x8d7deae980a64ab0`
(i.e., the first PCG generator will start from state `0x596d84dfefec2fc76b79f81ab9f3e37b`
and the second generator from state `0x8d7deae980a64ab06b79f81ab9f3e37b`) and pipe into PractRand:

    rng=RNG_stdin64, seed=unknown
    length= 16 gigabytes (2^34 bytes), time= 411 seconds
      Test Name  Raw  Processed  Evaluation
      BCFN(0+0,13-0,T)  R= +22.5  p = 1.3e-11  VERY SUSPICIOUS
      BCFN(0+1,13-0,T)  R=+374.9  p = 5.5e-200  FAIL !!!!!!
      BCFN(0+2,13-0,T)  R=+296.4  p = 5.2e-158  FAIL !!!!!
      DC6-5x4Bytes-1  R= +85.3  p = 1.2e-51  FAIL !!!!
      ...and 1637 test result(s) without anomalies

In spite of half the bits of the initial state being different, and in spite of having waited billions of outputs for the two sequences emitted by the PCG generator to decorrelate, decorrelation does not happen: for every sequence emitted by a PCG generator, there is a very large number of non-overlapping correlated sequences starting from different initial states.

If we change just the highest 32 bits instead of the highest 64 bits the results are so catastrophic as to be embarrassing (I removed hundreds of lines):

    rng=RNG_stdin64, seed=unknown
    length= 64 megabytes (2^26 bytes), time= 2.3 seconds
      Test Name  Raw  Processed  Evaluation
      BCFN(0+0,13-3,T)  R= +6430  p = 2e-3043  FAIL !!!!!!!!
      BCFN(0+1,13-3,T)  R=+15707  p = 3e-7434  FAIL !!!!!!!!
      BCFN(0+2,13-3,T)  R=+10498  p = 1e-4968  FAIL !!!!!!!!
      ...
      mod3n(0):(0,9-0)  R= +33.9  p = 1.3e-18  FAIL !
      DC6-9x1Bytes-1  R=+477.5  p = 6.1e-304  FAIL !!!!!!
      DC6-6x2Bytes-1  R= +1872  p = 2e-1233  FAIL !!!!!!!!
      DC6-5x4Bytes-1  R= +1618  p = 1e-1085  FAIL !!!!!!!!
      Gap-16:A  R=+483.3  p = 2.7e-422  FAIL !!!!!!!
      Gap-16:B  R= +1032  p = 1.9e-918  FAIL !!!!!!!
      [Low1/8]BCFN(0+0,13-5,T)  R=+352.1  p = 5.3e-138  FAIL !!!!!
      [Low1/8]BCFN(0+1,13-5,T)  R= +18.4  p = 2.2e-7  suspicious
      [Low1/8]DC6-9x1Bytes-1  R= +40.6  p = 2.3e-21  FAIL !!
      [Low1/8]DC6-6x2Bytes-1  R= +81.9  p = 9.3e-50  FAIL !!!!
      [Low1/8]DC6-5x4Bytes-1  R= +38.3  p = 3.3e-21  FAIL !!
      [Low1/8]FPF-14+6/64:(0,14-3)  R= +19.8  p = 3.7e-17  FAIL
      [Low1/8]FPF-14+6/64:(1,14-4)  R= +12.7  p = 2.7e-10  very suspicious
      [Low1/8]FPF-14+6/64:(2,14-5)  R= +9.4  p = 1.1e-7  mildly suspicious
      [Low1/8]FPF-14+6/64:(3,14-5)  R= +17.1  p = 4.5e-14  FAIL
      [Low1/8]FPF-14+6/64:(6,14-8)  R= +14.0  p = 3.8e-10  very suspicious
      [Low1/8]FPF-14+6/64:(7,14-8)  R= +12.1  p = 9.2e-9  suspicious
      ...
      [Low8/64]FPF-14+6/4:(9,14-7)  R= +12.8  p = 4.8e-10  very suspicious
      [Low8/64]FPF-14+6/4:all  R=+485.3  p = 5.9e-455  FAIL !!!!!!!
      [Low8/64]FPF-14+6/4:cross  R=+289.0  p = 8.7e-228  FAIL !!!!!!
      [Low8/64]Gap-16:A  R= +20.5  p = 5.5e-17  FAIL !
      [Low8/64]Gap-16:B  R=+140.6  p = 4.6e-114  FAIL !!!!!
      ...and 661 test result(s) without anomalies

So, what's happening here? By choosing the worst case (changing just the highest bit),
uncommenting the `printf()` calls in the code and commenting out the `fwrite()` calls
we obtain:

    78b4c0c8b39829c8 b4c0c8b29829c84f
    07ae88d0bbb18236 a707ae88c0bbb182
    78be926ebaf6613d 0678be926fbaf661
    96a5ef3c1b029f66 a5ef3c1f029f66ce
    79eec016d1e02600 9579eec006d1e026
    78f96c30b953ccb3 f96c30b153ccb319
    83bc05c7ee753ca1 3583bc05d7ee753c

The repeated bit patterns in each pair of consecutive outputs are evident even to the naked eye: it is not surprising that so many tests fail. Any change in the initial state leaving intact a sufficiently large number of low bits will reproduce the problem: the repeated patterns will never go away.

Instead, generators using a linear engine such as `xoroshiro128++` will decorrelate after a few outputs even if a single bit of
state has been flipped. This is what happens with two such generators initialized
with exactly the same seeds as the two PCG generators:

    5362d8a01671b995 d362d8a01670b995
    0a97246a7e80a888 8a98346a7e81a88a
    c47b9b0e4dc38c7b 44769b0c35c48b79
    4e442a6854dab254 cb883a5234d5f0d4
    d6a29eb093c23736 da229bd8bbdb072c
    79bb56117df4d1f9 76b2e38b66dfd211
    39028d687f5b1025 09737341864c79a8

Larger-state linear generators might require more time, but they will eventually decorrelate. In fact, any sensible pseudorandom generator (not necessarily linear) will quickly mix a state change (even a single-bit change) into the whole state and provide an uncorrelated subsequence.

Wrap-up: **PCG generators contain a large number of pairs of non-overlapping correlated subsequences**.

# False claims

We are now discussing just single-sequence generators with `w`-bit output and `w` or 2`w`
bits of state. The other options have the severe statistical flaws we discussed.

## It is challenging to predict a PCG generator

This claim originally appeared on Melissa O'Neill's website and in her manuscript. There is no evidence for this claim. To me, it has always been evident that PCG generators are very easy to predict. Finally, in 2020 a group at INRIA took up the challenge and showed how to predict a PCG generator using standard cryptanalytic techniques.

The paper takes a particularly strong variant and solves the problem for a generic increment constant. A couple
of years ago I had shown similar results for another variant, assuming a fixed constant: this program accepts as input the 64-bit state of a PCG
generator, generates three outputs, and recovers the original state from the output, making it possible to predict
all future outputs of the generator.
Just compile with `-O3` and run.

Writing the function that performs the
prediction, `recover()`, took maybe half an hour of effort. It's a couple of loops, a couple of if's and
a few logical operations. Less than 10 lines of code.

    > ./predpcg 0x333e2c3815b27604
    Provided generator state: 333e2c3815b27604
    First three outputs: cd9f107b, 8b817ffc, 7c12d316
    Recovered generator state (from output): 333e2c3815b27604

Analogously, this program accepts as input the 128-bit state of a PCG generator (same variant and assumptions),
generates a few outputs, and recovers the original state from the output, making it possible again to predict
all future outputs of the generator. To compile it, you will need Victor Shoup's amazing NTL library.
The program uses the same logic as the 64-bit case (and as the INRIA paper): guessing exhaustively a few bits, deriving a lot of other bits,
and solving a simple modular equation. However, in the 64-bit case the equation can be solved by trying all possible solutions,
whereas in this case we use a standard technique based on lattice reduction: as a result, discovering the
initial state usually takes *less* time than in the 64-bit case (in fact, the computation time can be
brought down to well below a second if you are willing to examine more outputs).

By the same token, it is easy to set the state of a PCG generator so that it outputs a string of
your choice. This program, for example, forces a PCG generator with 128 bits of state to output the string `> John D. Cook <`.
If you pass its output through `hexdump -C`, you'll see

    00000000  98 06 19 c7 65 5b ce 68  f2 41 47 84 50 cf ba fa  |....e[.h.AG.P...|
    00000010  a9 eb 2d 00 67 a3 34 af  5a e7 70 31 4b ae a3 38  |..-.g.4.Z.p1K..8|
    00000020  03 98 b2 b5 39 0d 05 e3  98 db 33 9f b7 d4 9d b7  |....9.....3.....|
    00000030  2c 29 12 34 52 66 ce b7  01 ca 96 3f f3 eb cf 7a  |,).4Rf.....?...z|
    00000040  d9 76 81 e9 36 e7 06 2b  c6 94 0c 66 d0 96 d6 82  |.v..6..+...f....|
    00000050  5f b1 c6 18 50 24 19 64  db 0a de 7b 27 28 ab 81  |_...P$.d...{'(..|
    00000060  0f 31 0b 5c 37 bd 10 ec  1e 04 da ae 18 ce 9d 4d  |.1.\7..........M|
    00000070  ff 5c fd 43 fd e6 24 70  23 94 8f 8b 41 0a 89 eb  |.\.C..$p#...A...|
    00000080  3e 20 4a 6f 68 6e 20 44  2e 20 43 6f 6f 6b 20 3c  |> John D. Cook <|
    00000090  03 f6 4e 49 4a 39 fa 15  e1 3c 9f e7 bc 78 a9 c0  |..NIJ9...<...x..|
    000000a0  ea ea e2 46 65 65 63 b5  81 1b 76 01 c9 28 8b 6d  |...Feec...v..(.m|
    000000b0  ec a2 0c a4 6b e1 33 d0  55 6f 8a db 49 73 a7 38  |....k.3.Uo..Is.8|
    000000c0  6d 33 8c 5b 9a 88 39 ff  70 90 ff 8f 5d 0a a6 75  |m3.[..9.p...]..u|
    000000d0  dd d6 2f 5c 44 bc a2 af  71 17 8a d2 f0 a0 cf da  |../\D...q.......|
    000000e0  f2 3b c5 7b 51 dc 75 50  0f 50 79 8a 5b 9b 7b c4  |.;.{Q.uP.Py.[.{.|

Wrap-up: **PCG generators are easy to predict**.

## PCG generators are fast

They're not. 128-bit operations are slow. On my hardware a 128-bit PCG generator takes 2.75ns to
emit an integer, against 0.95ns for `xoroshiro128++`. The PCG generator is almost *three times slower*.
And they both pass all statistical tests.

You might want to measure the speed on your hardware: just download the harness and the
benchmark, compile the latter with `gcc -O3 -fno-move-loop-invariants -fno-unroll-loops`
and execute with at least a billion repetitions
(the number of repetitions is the only parameter to the harness). You can also
download the benchmark for `xoroshiro128++` and compare. But the idea that manipulating
128-bit numbers can be faster than performing a few 64-bit shifts,
rotations, xors and sums is ridiculous. The PCG generators with the
same number of state and output bits are slightly faster (1.46ns), but still
slower than a `xoroshiro128++` generator.

That said, I will repeat again: *you have to measure the speed of your PRNG inside your application*. These
figures are just an indication. Even better, you might want to check the
results of the Intel® Architecture Code Analyzer, which reports 3.50 cycles for `xoroshiro128++`,
and a whopping 9.53 cycles for a 128-bit PCG generator.

Wrap-up: **while PCG generators are not terribly slow, they are very far from the sub-ns
performance of fast scrambled linear generators**.

# Conclusions

Wrap-up: **There is technically no sensible reason to use a PCG generator: those without flaws are not competitive**.

However, you might wish to use at all costs, for some reason, an LCG-based PRNG with 64 bits of output and 128 bits of state.

In that case, I suggest that you do the simplest thing: take
a good 128-bit LCG (the multiplier is taken from a paper I recently co-authored with Guy Steele) and then apply to
the high bits a good, standard mixing function, like the variant
of the MurmurHash3 finalization step used by
Java's SplittableRandom: we
could call this brand new generator `LCG128Mix`.

    #include <stdint.h>

    __uint128_t x;

    static inline uint64_t next(void) {
        // Put in z the top bits of state
        uint64_t z = x >> 64;
        // Update state (multiplier from https://arxiv.org/abs/2001.05304)
        x = x * (((__uint128_t)1 << 64) ^ 0xd605bbb58c8abbfd) + 0x14057b7ef767814f;
        // Compute mix
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
        z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
        return z ^ (z >> 31);
    }

This generator is faster than a 128-bit PCG generator: just 2.16ns per word (but benchmark on your hardware). You'll have a very strong 128-bit LCG-based generator, and you'll be standing on the shoulders of giants. Still, states differing just in the highest bit will generate correlated sequences; even the MurmurHash3 mixing function cannot hide that.
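The reason behind the last remark can be seen at the level of the state. The sketch below (hypothetical helper names, `__uint128_t` as provided by GCC and Clang) applies the same state update as `LCG128Mix` to two states that differ only in bit 127 and checks that the difference never spreads: since the multiplier is odd, the two states keep differing in exactly that bit forever. It says nothing by itself about how visible this is in the mixed output; for that you need a test suite.

```c
#include <stdint.h>

/* State update of LCG128Mix (same multiplier and increment as above). */
static inline __uint128_t lcg128_step(__uint128_t x) {
    return x * (((__uint128_t)1 << 64) ^ UINT64_C(0xd605bbb58c8abbfd))
           + UINT64_C(0x14057b7ef767814f);
}

/* Starts two generators from x and from x with the highest bit flipped;
   returns 1 if after each of n steps the two states still differ in
   exactly the highest bit, 0 otherwise. */
static int top_bit_difference_persists(__uint128_t x, uint64_t n) {
    const __uint128_t top = (__uint128_t)1 << 127;
    __uint128_t y = x ^ top;
    for (uint64_t i = 0; i < n; i++) {
        x = lcg128_step(x);
        y = lcg128_step(y);
        if ((x ^ y) != top) return 0;
    }
    return 1;
}
```

The function returns 1 for any initial state and any number of steps, which is why discarding outputs cannot help here.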

However, if what you want is a high-quality generator with 64 bits of output and 128 bits of state based on integer arithmetic, and 128-bit multiplications are available, you have a much better option: a generalized version of Marsaglia's Multiply-With-Carry generators defined by Goresky and Klapper, for which I computed good constants (note that the constants have been updated in January 2021):

    #include <stdint.h>

    uint64_t x = 0, c = 1; // Not all zeros

    static inline uint64_t next(void) {
        const __uint128_t t = 0xff8fa3db04bb588e * (__uint128_t)x + c;
        x = 0xd81fdde4eba3aae9 * (uint64_t)t;
        c = (t + 0xadca32a7 * (__uint128_t)x) >> 64;
        return x;
    }

This generator has period ≈2^{127}, and it is equivalent to a linear congruential generator with prime modulus 339698960761121441142761164663671108263 (0xff8fa3db04bb588e00000000adca32a7) and multiplier 2^{64},
but, thanks to the Multiply-With-Carry design, you do not really compute any integer remainder. There are two orbits of size ≈2^{127}, and after very few steps you will fall into
one, so any initial state with `x` and `c` not both zero will work. This generator is as fast as the previous one (just 2.16ns per word, but benchmark on your hardware),
it has all the excellent properties of a linear congruential generator with prime modulus and good spectral scores,
and moreover sequences will be uncorrelated except in very special cases, namely when the initial states of the associated linear congruential
generator differ by a small multiplicative constant; but these cases can be easily avoided, for example, by seeding the carry with a fixed constant. The
results of the Intel® Architecture Code Analyzer are also very good.
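As a usage illustration, the generator can be wrapped in a small struct so that several instances can coexist, seeding the carry with a fixed constant as suggested above. The names `gmwc128`, `gmwc128_seed` and `gmwc128_next_double` are hypothetical, and the conversion to a double in [0, 1) using the upper 53 bits is a common convention rather than part of the generator.

```c
#include <stdint.h>

/* State of the generalized MWC generator: one word plus the carry. */
typedef struct { uint64_t x, c; } gmwc128;

/* Same update as the generator above, on an explicit state. */
static inline uint64_t gmwc128_next(gmwc128 *g) {
    const __uint128_t t = UINT64_C(0xff8fa3db04bb588e) * (__uint128_t)g->x + g->c;
    g->x = UINT64_C(0xd81fdde4eba3aae9) * (uint64_t)t;
    g->c = (uint64_t)((t + UINT64_C(0xadca32a7) * (__uint128_t)g->x) >> 64);
    return g->x;
}

/* Seeds x from the user and fixes the carry to 1, so the state is never
   all zeros and the carry is the same for every instance. */
static inline gmwc128 gmwc128_seed(uint64_t seed) {
    gmwc128 g = { seed, 1 };
    return g;
}

/* Uniform double in [0, 1) built from the upper 53 bits of an output. */
static inline double gmwc128_next_double(gmwc128 *g) {
    return (double)(gmwc128_next(g) >> 11) * 0x1.0p-53;
}
```

Two instances created with the same seed produce the same sequence, which is convenient for reproducible experiments.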

If you need more state, the same design can be extended to a larger number of words:

    #include <stdint.h>

    uint64_t x, y, z, c = 1; // Not all zeros

    static inline uint64_t next(void) {
        const __uint128_t t = 0xff2a4b18846bbee2 * (__uint128_t)x + c;
        x = y;
        y = z;
        z = 0x94d34db4cd59d099 * (uint64_t)t;
        c = (t + 0x96e36616f07c57 * (__uint128_t)z) >> 64;
        return z;
    }

This generator has period ≈2^{255}, and it is equivalent to a linear congruential generator with prime modulus
115414502257413896964950864333521986835826833770103090537364309614338189065303 (0xff2a4b18846bbee2000000000000000000000000000000000096e36616f07c57) and multiplier 2^{64}.
It is only marginally slower than the previous one, as the two additional moves between variables have very little impact.