A long time ago I ported the Mersenne Twister pseudo-random number generator to pl/sql. That algorithm, and my port, are showing their age so I started looking at other generators. A newer approach is found in the xoshiro/xoroshiro family by David Blackman and Sebastiano Vigna found here.
I’ve chosen xoshiro256** as one of their “all-purpose, rock-solid
generators”. Like the Mersenne Twister, the original source is written in c and utilizes numerous bit manipulations to generate new values from the prior ones.
While PL/SQL has the BITAND function, it doesn’t have any others that directly apply to numeric types. It would have been possible to use RAW values and the UTL_RAW package, but I found it easier to keep true to the original implementation by creating my own bit-wise OR, XOR, shift, and rotation functions.
Implementing 64-bit integer subtype (almost)
Another complexity is that c has the uint64_t type. That is, a 64-bit unsigned integer. While PL/SQL NUMBER types can hold such values, their is no direct correlation. Ideally I would create an integer subtype constrained with a RANGE clause of 0..18446744073709551615 (2^64-1). However, there is no legal syntax to support that definition. As of 23ai, the RANGE clause only applies to subtypes of PLS_INTEGER which is, itself, limited to signed 32-bit values. So I settled for a 20-digit integer using:
SUBTYPE uint64_t IS NUMBER(20, 0)
Then, in my calculations I apply a 64-bit mask to ensure overflows are handled in a way that mimics the c uint64_t.
Bit-wise operators
As mentioned above, some of the c syntax has no corresponding pl/sql operator or function. In particular:
- | – OR
- ^ – XOR
- << – Shift left
- >> – Shift right
Fortunately, BITAND and arithmetic is sufficient to create OR and XOR. Multiplying or dividing by powers of 2 will replicate bit shifts left or right respectively.
FUNCTION bitor(p_a IN uint64_t, p_b IN uint64_t) RETURN uint64_t IS BEGIN RETURN BITAND(p_a + p_b - BITAND(p_a, p_b), c_64bitmask); END bitor; FUNCTION bitxor(p_a IN uint64_t, p_b IN uint64_t) RETURN uint64_t IS BEGIN RETURN BITAND((p_a + p_b) - (BITAND(p_a, p_b) * 2), c_64bitmask); END bitxor; FUNCTION shl64(p_n IN uint64_t, p_bits IN bits_t) RETURN uint64_t IS BEGIN -- Apply 64-bit mask to ensure results are within proper range RETURN CASE p_bits WHEN 0 THEN p_n ELSE BITAND(p_n * (2 ** p_bits), c_64bitmask) END; END shl64; FUNCTION shr64(p_n IN uint64_t, p_bits IN bits_t) RETURN uint64_t IS BEGIN -- Apply 64-bit mask to ensure results are within proper range RETURN CASE p_bits WHEN 0 THEN p_n ELSE BITAND(p_n / (2 ** p_bits), c_64bitmask) END; END shr64;
It wasn’t technically necessary to use a CASE for shifts of 0 bits, but it seemed more natural to capture that special case rather than force the extra math steps to produce a result we already knew. This is merely a preference on my part and both shl64 and shr64 could be simplified to let 0 run the same as any other shift value. Also of note, both of the shift functions apply the 64-bit mask to ensure there the results always fall in the intended 0-18446744073709551615 range.
The last special bit operation is rotation, the xoshiro code itself includes the rotl logic which can then be converted to PL/SQL using the functions above:
FUNCTION rotl(p_x uint64_t, p_k IN bits_t) RETURN uint64_t IS -- static inline uint64_t rotl(const uint64_t x, int k) { -- return (x << k) | (x >> (64 - k)); -- } BEGIN RETURN CASE WHEN p_k = 0 THEN p_x ELSE bitor(shl64(p_x, p_k), shr64(p_x, 64 - p_k)) END; END rotl;
Pulling it all together
With the bit operations set, implementing the functions was fairly straight forward. There are multiple arrays used within the original c code which I originally implemented as varrays, but it felt artificial to keep adjusting the varray indexes to be 1-based vs the 0-based arrays. So, instead, I decided to use an associative arrays with indexes limited to 0,1,2,3. The full package follows:
CREATE OR REPLACE PACKAGE xoshiro256 IS -- Based on xoshiro256** by by David Blackman and Sebastiano Vigna -- original https://prng.di.unimi.it/xoshiro256starstar.c -- .///. -- (0 o) ---------------0000--(_)--0000--------------- -- -- Sean D. Stuber -- sean.stuber@gmail.com -- -- oooO Oooo --------------( )-----( )--------------- -- \ ( ) / -- \_) (_/ -- It would be nice to define an explicit RANGE 0.. 18446744073709551615 for the subtype -- but as of 23ai, only subtypes of PLS_INTEGER may have a range. SUBTYPE uint64_t IS NUMBER(20, 0); TYPE seed_tab IS TABLE OF uint64_t; PROCEDURE set_state(p_seed IN seed_tab); FUNCTION get_next RETURN uint64_t; PROCEDURE jump; PROCEDURE long_jump; END;
CREATE OR REPLACE PACKAGE BODY sds.xoshiro256 IS -- TO_NUMBER('ffffffffffffffff', 'xxxxxxxxxxxxxxxx') = 18446744073709551615 = 2^64-1 c_64bitmask CONSTANT uint64_t := POWER(2, 64) - 1; -- 18446744073709551615 SUBTYPE bits_t IS PLS_INTEGER RANGE 0 .. 63; SUBTYPE array_index_t IS PLS_INTEGER RANGE 0 .. 3; TYPE state_array_t IS TABLE OF uint64_t INDEX BY array_index_t; g_state state_array_t; FUNCTION bitor(p_a IN uint64_t, p_b IN uint64_t) RETURN uint64_t IS BEGIN RETURN BITAND(p_a + p_b - BITAND(p_a, p_b), c_64bitmask); END bitor; FUNCTION bitxor(p_a IN uint64_t, p_b IN uint64_t) RETURN uint64_t IS BEGIN RETURN BITAND((p_a + p_b) - (BITAND(p_a, p_b) * 2), c_64bitmask); END bitxor; FUNCTION shl64(p_n IN uint64_t, p_bits IN bits_t) RETURN uint64_t IS BEGIN -- Apply 64-bit mask to ensure results are within proper range RETURN CASE p_bits WHEN 0 THEN p_n ELSE BITAND(p_n * (2 ** p_bits), c_64bitmask) END; END shl64; FUNCTION shr64(p_n IN uint64_t, p_bits IN bits_t) RETURN uint64_t IS BEGIN -- Apply 64-bit mask to ensure results are within proper range RETURN CASE p_bits WHEN 0 THEN p_n ELSE BITAND(p_n / (2 ** p_bits), c_64bitmask) END; END shr64; FUNCTION rotl(p_x uint64_t, p_k IN bits_t) RETURN uint64_t IS -- static inline uint64_t rotl(const uint64_t x, int k) { -- return (x << k) | (x >> (64 - k)); -- } BEGIN RETURN CASE WHEN p_k = 0 THEN p_x ELSE bitor(shl64(p_x, p_k), shr64(p_x, 64 - p_k)) END; END rotl; FUNCTION get_next RETURN uint64_t IS -- uint64_t next(void) { -- const uint64_t result = rotl(s[1] * 5, 7) * 9; -- -- const uint64_t t = s[1] << 17; -- -- s[2] ^= s[0]; -- s[3] ^= s[1]; -- s[1] ^= s[2]; -- s[0] ^= s[3]; -- -- s[2] ^= t; -- -- s[3] = rotl(s[3], 45); -- -- return result; -- } v_result uint64_t := BITAND(rotl(BITAND(g_state(1) * 5, c_64bitmask), 7) * 9, c_64bitmask); v_t uint64_t := shl64(g_state(1), 17); BEGIN g_state(2) := bitxor(g_state(2), g_state(0)); g_state(3) := bitxor(g_state(3), g_state(1)); g_state(1) := bitxor(g_state(1), g_state(2)); g_state(0) := bitxor(g_state(0), g_state(3)); g_state(2) := bitxor(g_state(2), v_t); g_state(3) := rotl(g_state(3), 45); RETURN v_result; END get_next; PROCEDURE jump IS -- /* This is the jump function for the generator. It is equivalent -- to 2^128 calls to next(); it can be used to generate 2^128 -- non-overlapping subsequences for parallel computations. */ -- -- void jump(void) { -- static const uint64_t JUMP[] = { 0x180ec6d33cfd0aba, 0xd5a61266f0c9392c, 0xa9582618e03fc9aa, 0x39abdc4529b1661c }; -- -- uint64_t s0 = 0; -- uint64_t s1 = 0; -- uint64_t s2 = 0; -- uint64_t s3 = 0; -- for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++) -- for(int b = 0; b < 64; b++) { -- if (JUMP[i] & UINT64_C(1) << b) { -- s0 ^= s[0]; -- s1 ^= s[1]; -- s2 ^= s[2]; -- s3 ^= s[3]; -- } -- next(); -- } -- -- s[0] = s0; -- s[1] = s1; -- s[2] = s2; -- s[3] = s3; -- } v_jump CONSTANT state_array_t := state_array_t(0 => TO_NUMBER('180ec6d33cfd0aba', 'xxxxxxxxxxxxxxxx'), 1 => TO_NUMBER('d5a61266f0c9392c', 'xxxxxxxxxxxxxxxx'), 2 => TO_NUMBER('a9582618e03fc9aa', 'xxxxxxxxxxxxxxxx'), 3 => TO_NUMBER('39abdc4529b1661c', 'xxxxxxxxxxxxxxxx')) ; v_s0 uint64_t := 0; v_s1 uint64_t := 0; v_s2 uint64_t := 0; v_s3 uint64_t := 0; v_dummy uint64_t; BEGIN FOR i IN 0 .. 3 LOOP FOR b IN 0 .. 63 LOOP IF BITAND(v_jump(i), shl64(1, b)) != 0 THEN v_s0 := bitxor(v_s0, g_state(0)); v_s1 := bitxor(v_s1, g_state(1)); v_s2 := bitxor(v_s2, g_state(2)); v_s3 := bitxor(v_s3, g_state(3)); END IF; v_dummy := get_next; END LOOP; END LOOP; g_state(0) := v_s0; g_state(1) := v_s1; g_state(2) := v_s2; g_state(3) := v_s3; END jump; PROCEDURE long_jump IS -- /* This is the long-jump function for the generator. It is equivalent to -- 2^192 calls to next(); it can be used to generate 2^64 starting points, -- from each of which jump() will generate 2^64 non-overlapping -- subsequences for parallel distributed computations. */ -- -- void long_jump(void) { -- static const uint64_t LONG_JUMP[] = { 0x76e15d3efefdcbbf, 0xc5004e441c522fb3, 0x77710069854ee241, 0x39109bb02acbe635 }; -- -- uint64_t s0 = 0; -- uint64_t s1 = 0; -- uint64_t s2 = 0; -- uint64_t s3 = 0; -- for(int i = 0; i < sizeof LONG_JUMP / sizeof *LONG_JUMP; i++) -- for(int b = 0; b < 64; b++) { -- if (LONG_JUMP[i] & UINT64_C(1) << b) { -- s0 ^= s[0]; -- s1 ^= s[1]; -- s2 ^= s[2]; -- s3 ^= s[3]; -- } -- next(); -- } -- -- s[0] = s0; -- s[1] = s1; -- s[2] = s2; -- s[3] = s3; -- } v_longjump CONSTANT state_array_t := state_array_t(0 => TO_NUMBER('76e15d3efefdcbbf', 'xxxxxxxxxxxxxxxx'), 1 => TO_NUMBER('c5004e441c522fb3', 'xxxxxxxxxxxxxxxx'), 2 => TO_NUMBER('77710069854ee241', 'xxxxxxxxxxxxxxxx'), 3 => TO_NUMBER('39109bb02acbe635', 'xxxxxxxxxxxxxxxx')) ; v_s0 uint64_t := 0; v_s1 uint64_t := 0; v_s2 uint64_t := 0; v_s3 uint64_t := 0; v_dummy uint64_t; BEGIN FOR i IN 0 .. 3 LOOP FOR b IN 0 .. 63 LOOP IF BITAND(v_longjump(i), shl64(1, b)) != 0 THEN v_s0 := bitxor(v_s0, g_state(0)); v_s1 := bitxor(v_s1, g_state(1)); v_s2 := bitxor(v_s2, g_state(2)); v_s3 := bitxor(v_s3, g_state(3)); END IF; v_dummy := get_next; END LOOP; END LOOP; g_state(0) := v_s0; g_state(1) := v_s1; g_state(2) := v_s2; g_state(3) := v_s3; END long_jump; PROCEDURE set_state(p_seed IN seed_tab) IS BEGIN g_state := state_array_t(); FOR i IN 0 .. 3 LOOP g_state(i) := p_seed(i + 1); END LOOP; END set_state; END;
Picking a set of good seed values help build better distributions in the generated sequence. For the purposes of illustration though, I’ll use a simple 12345 in each of the seed array elements and then generate 10 values with get_next, then jump and generate 10 more and finally do a long_jump and generate a final set of 10. This simple framework allowed me to verify my logic against the original doing the same steps.
DECLARE x xoshiro256.uint64_t; BEGIN xoshiro256.set_state(xoshiro256.seed_tab(12345, 12345, 12345, 12345)); FOR i IN 0 .. 9 LOOP x := xoshiro256.get_next; DBMS_OUTPUT.put_line(TO_CHAR(i, '9') || ' ' || LPAD(TO_CHAR(x), 20) || ' ' || TO_CHAR(x, 'xxxxxxxxxxxxxxxx')); END LOOP; DBMS_OUTPUT.put_line('--- jump ---'); xoshiro256.jump; FOR i IN 0 .. 9 LOOP x := xoshiro256.get_next; DBMS_OUTPUT.put_line(TO_CHAR(i, '9') || ' ' || LPAD(TO_CHAR(x), 20) || ' ' || TO_CHAR(x, 'xxxxxxxxxxxxxxxx')); END LOOP; DBMS_OUTPUT.put_line('--- long jump ---'); xoshiro256.long_jump; FOR i IN 0 .. 9 LOOP x := xoshiro256.get_next; DBMS_OUTPUT.put_line(TO_CHAR(i, '9') || ' ' || LPAD(TO_CHAR(x), 20) || ' ' || TO_CHAR(x, 'xxxxxxxxxxxxxxxx')); END LOOP; END;
This produced the following sequences…
0 71107200 43d0280 1 71107200 43d0280 2 9320162918400 87a05000000 3 9320234025600 87a093d0280 4 12773345438245847175 b1440a0000000087 5 12768617581213858983 b1333e0a010f3ca7 6 8945543092777141728 7c24f40c1fee45e0 7 470016407425146078 685d56eab2e5cde 8 3493524090943047400 307b7da6b6ee06e8 9 5886979323815290452 51b2c08124146654 --- jump --- 0 4581861990845984958 3f960b34763d30be 1 6555207914207083891 5af8c6ddf3247d73 2 1155739510168040853 100a02f60c733595 3 17756807916997691290 f66cdab88d7dc79a 4 18120637618276044033 fb796fe814ec8d01 5 3601654045701872973 31fba545ade89d4d 6 1200605674697995402 10a9688003b1348a 7 2838098080646629052 2762f32adea906bc 8 13449286687953663012 baa576fc3dde3424 9 5139522055796585030 47534014530a7246 --- long jump --- 0 308148744041885595 446c3a66a996f9b 1 1534996402541018053 154d671868f5bbc5 2 2575119349696104759 23bca967acdf0937 3 1204256562158620046 10b660f67de5cd8e 4 2485655289488877646 227ed250f40e284e 5 2705313921121906326 258b34ad8a63ee96 6 14047681829789752210 c2f364324653db92 7 8712759958851592880 78e9f1053898a2b0 8 16323705335360483007 e28973840c4f6ebf 9 2581999948191718881 23d51b45da0b5de1
This was a fun little exercise. Eventually I’d like to expand on it to implement other variations in the xoshiro/xoroshiro suite as well as include support for splitmix64 as a way to help start with better seed values.
Questions and comments, as always, are welcome.