I was working on a little simulation using random numbers and found I was getting odd results from my query. After spending a while dissecting it trying to find my error I found the problem was in the database itself. I was able to replicate the problem on Live SQL using a simple test case with both 19c and 23ai.
The example code is intended to roll 8 dice two times each. If the first die is below 4 then I will use the result of the second die. Otherwise, I will keep the first roll. The problem is that the CASE statement always returns the second die, regardless of the first die result. The two links above will let you run the example yourself, but I will include the code and output here.
First, I set a seed. The seed value itself isn’t important; but picking one allows everyone to get the same results from the pseudo-random number generator in DBMS_RANDOM.
Then, my query generates eight rows of 2 random values each. The rows are numbered in order by the result of the first die. The CASE statement should resolve the first vs second die results. However, you will see on line 7, the roll was a 5, which should be kept; but the CASE returned the reroll value instead. Then, on line 8, the first roll was a 6, and again, the CASE returned the reroll instead of using that.
SQL> BEGIN 2 DBMS_RANDOM.seed(0); 3 END; 4 / PL/SQL procedure successfully completed. SQL> SELECT ROW_NUMBER() OVER (ORDER BY first_roll) rn, 2 first_roll, 3 reroll, 4 CASE WHEN first_roll < 4 THEN reroll ELSE first_roll END final 5 FROM ( 6 SELECT FLOOR(DBMS_RANDOM.VALUE * 6) + 1 first_roll, FLOOR(DBMS_RANDOM.VALUE * 6) + 1 reroll 7 FROM DUAL 8 CONNECT BY LEVEL <= 8 9 ) 10 ORDER BY rn; RN FIRST_ROLL REROLL FINAL ---------- ---------- ---------- ---------- 1 1 5 5 2 2 2 2 3 3 1 1 4 3 6 6 5 3 5 5 6 3 2 2 7 5 3 3 8 6 5 5 8 rows selected.
I posted this to an Oracle ACE chat to see if anyone else had ever seen anything like this. The feedback I got was that it looked like the SQL transformation step was "over-zealous" in trying to simplify the expression.
Looking at the inner query, first_roll and reroll are identical.
They are both the expression FLOOR(DBMS_RANDOM.VALUE * 6) + 1.
So, if we call that expression X, the CASE could then be interpreted as
case when X < 4 then X else X end
Since the case returns X no matter what the condition is, then it makes sense to simplify to just return X. In this case the X is the reroll.
Without knowing exact the internals of how that transfomation works, I'm going to assume the reroll value is always used because it is first return value of the case statement. To test that theory, I will invert the logic. Instead of checking to use the reroll if the first_roll is less than 4, I will check to keep the first_roll if it is greater than or equal to 4. Functionally it should be the same if there is no bug; but in this case I'm trying to test the bug itself.
SQL> SELECT ROW_NUMBER() OVER (ORDER BY first_roll) rn, 2 first_roll, 3 reroll, 4 CASE WHEN first_roll >= 4 THEN first_roll else reroll END final 5 FROM ( 6 SELECT FLOOR(DBMS_RANDOM.VALUE * 6) + 1 first_roll, FLOOR(DBMS_RANDOM.VALUE * 6) + 1 reroll 7 FROM DUAL 8 CONNECT BY LEVEL <= 8 9 ) 10 ORDER BY rn; RN FIRST_ROLL REROLL FINAL ---------- ---------- ---------- ---------- 1 1 1 1 2 1 5 1 3 2 5 2 4 3 3 3 5 4 5 4 6 4 5 4 7 5 2 5 8 6 5 6 8 rows selected.
It looks like my assumption was correct. With the first roll being the returned value of the first WHEN condition - that ends up being used, regardless of whether the WHEN condition is true or not.
A bug report is being filed internally with Oracle to look into this; but until a fix is made available I rewrote the query like this:
SQL> SELECT rn, first_roll, reroll, NVL(reroll, first_roll) final 2 FROM (SELECT ROW_NUMBER() OVER (ORDER BY first_roll) rn, 3 first_roll, 4 CASE WHEN first_roll < 4 THEN FLOOR(DBMS_RANDOM.VALUE * 6) + 1 END reroll 5 FROM ( 6 SELECT FLOOR(DBMS_RANDOM.VALUE * 6) + 1 first_roll 7 FROM DUAL 8 CONNECT BY LEVEL <= 8 9 )) 10 ORDER BY rn; RN FIRST_ROLL REROLL FINAL ---------- ---------- ---------- ---------- 1 1 6 6 2 1 5 5 3 2 3 3 4 2 2 2 5 3 3 3 6 3 5 5 7 5 5 8 6 6 8 rows selected.
Now, instead of rolling both dice every time. I only reroll if the first roll warrants it. Thus I have NULL for a reroll if the first_roll is 4 or higher and a reroll value if the first_roll is 3 or lower.
Then in the outer-most query, I use NVL to resolve the difference. Now, there is never a point in the query where transformation logic will be looking at two values that are defined by the same expression.
I had some initial doubts of whether this was a bug or simply a misunderstanding. If my expression is deterministic, then it might make sense for the transformation to replace my case with a single value; because a deterministic expression will always return the same thing, it won't matter which value of two identical expressions is used, because they are guaranteed to return the same value.
But, with non-deterministic functions (which dbms_random.value certainly qualifies as) that transformation is no longer valid. The syntax of each expression is the same; but because first_roll and reroll are nondeterministic they must be treated as distinct entities. Thus the transformation's simplification is erroneous.
At first glance, failing a simple less-than check may seem like an egregious mistake with the parsing/transforming steps. Upon review though, I think it's a rare scenario where a query would be constructed as I did. That is, querying a case statement over a sub-query of identical, non-deterministic expressions. While it made sense for the purposes of my simulation, most business cases will likely be different.
I hope this helps if someone else runs into a similar transformation error. It was an interesting exercise to dig through and I'm glad I have a network of people to bounce ideas off. Thank you for reading. Questions and comments, as always, are welcome.