Close

Oracle optimizer doesn’t understand elementary math

If A < B and B < C then A < C

Right? We all know the transitive rule… all of us except the Oracle optimizer.
Here’s a seasonal example to illustrate how the Optimizer changes it’s methods based on seemingly trivial options.

Let’s say I bake 10 types of cookies and then I want to see how many distinct bundles of 3 types I can make.
Mathematically we’d represent that as C(10,3) = 120.

In SQL, we could solve for them like this:

CREATE TABLE cookies
AS
SELECT 'oatmeal' cookie FROM DUAL
UNION ALL
SELECT 'm&m' cookie FROM DUAL
UNION ALL
SELECT 'chocolate chip' cookie FROM DUAL
UNION ALL
SELECT 'molasses' cookie FROM DUAL
UNION ALL
SELECT 'gingerbread' cookie FROM DUAL
UNION ALL
SELECT 'macadamia nut' cookie FROM DUAL
UNION ALL
SELECT 'peanut butter' cookie FROM DUAL
UNION ALL
SELECT 'snickerdoodle' cookie FROM DUAL
UNION ALL
SELECT 'sugar cookie' cookie FROM DUAL
UNION ALL
SELECT 'shortbread' cookie FROM DUAL;

EXEC dbms_stats.gather_table_stats(ownname=>user,tabname=>'COOKIES');

Now run a couple of test queries to check our math.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie AND a.cookie < c.cookie;

Note, both queries return the expected 120 rows.
However, the optimizer came up with 2 different plans for these functionally identical queries.

With implied transitivity

--------------------------------------------------
| Id  | Operation             | Name    | E-Rows |
--------------------------------------------------
|   0 | SELECT STATEMENT      |         |    203 |
|   1 |  MERGE JOIN           |         |    203 |
|   2 |   SORT JOIN           |         |     45 |
|   3 |    MERGE JOIN         |         |     45 |
|   4 |     SORT JOIN         |         |     10 |
|   5 |      TABLE ACCESS FULL| COOKIES |     10 |
|*  6 |     SORT JOIN         |         |     10 |
|   7 |      TABLE ACCESS FULL| COOKIES |     10 |
|*  8 |   SORT JOIN           |         |     10 |
|   9 |    TABLE ACCESS FULL  | COOKIES |     10 |
--------------------------------------------------

With explicit transitivity

-------------------------------------------------
| Id  | Operation            | Name    | E-Rows |
-------------------------------------------------
|   0 | SELECT STATEMENT     |         |     91 |
|   1 |  MERGE JOIN          |         |     91 |
|   2 |   MERGE JOIN         |         |     45 |
|   3 |    SORT JOIN         |         |     10 |
|   4 |     TABLE ACCESS FULL| COOKIES |     10 |
|*  5 |    SORT JOIN         |         |     10 |
|   6 |     TABLE ACCESS FULL| COOKIES |     10 |
|*  7 |   FILTER             |         |        |
|*  8 |    SORT JOIN         |         |     10 |
|   9 |     TABLE ACCESS FULL| COOKIES |     10 |
-------------------------------------------------

We can take it a step futher, let’s say I want all of the cookies to be the same type.
This is obviously going to be just 10 different combinations and no joins are needed; but I’ll write it this way to further illustrate the point.

With implied transitivity

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie = b.cookie AND b.cookie = c.cookie;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |     10 |
|*  1 |  HASH JOIN          |         |     10 |
|*  2 |   HASH JOIN         |         |     10 |
|   3 |    TABLE ACCESS FULL| COOKIES |     10 |
|   4 |    TABLE ACCESS FULL| COOKIES |     10 |
|   5 |   TABLE ACCESS FULL | COOKIES |     10 |
------------------------------------------------

With explicit transitivity

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie = b.cookie AND b.cookie = c.cookie AND a.cookie = c.cookie;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |
|*  1 |  HASH JOIN          |         |      1 |
|*  2 |   HASH JOIN         |         |     10 |
|   3 |    TABLE ACCESS FULL| COOKIES |     10 |
|   4 |    TABLE ACCESS FULL| COOKIES |     10 |
|   5 |   TABLE ACCESS FULL | COOKIES |     10 |
------------------------------------------------

Again note the change in estimated rows for an evaluation that we might assume the optimizer could figure out on its own.

The optimizer doesn’t understand transitivity of columns; but what if we add variables or constants?
Going back to the original query and take a closer look:

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie;

The plan’s predicates include:

Predicate Information (identified by operation id):
---------------------------------------------------
 
   6 - access(INTERNAL_FUNCTION("A"."COOKIE")<INTERNAL_FUNCTION("B"."COOKIE"))
       filter(INTERNAL_FUNCTION("A"."COOKIE")<INTERNAL_FUNCTION("B"."COOKIE"))
   8 - access("B"."COOKIE"<"C"."COOKIE")
       filter("B"."COOKIE"<"C"."COOKIE")

But, if we add a variable, obviously this changes the results, but it’s also interesting to note how the SQL condition becomes interpreted as filter conditions within the optimizer’s plan predicates.

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie and a.cookie= :b1;

------------------------------------------------
| Id  | Operation           | Name    | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |
|   1 |  NESTED LOOPS       |         |      1 |
|   2 |   NESTED LOOPS      |         |      1 |
|*  3 |    TABLE ACCESS FULL| COOKIES |      1 |
|*  4 |    TABLE ACCESS FULL| COOKIES |      1 |
|*  5 |   TABLE ACCESS FULL | COOKIES |      1 |
------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("A"."COOKIE"=:B1)
   4 - filter("B"."COOKIE">:B1 AND "A"."COOKIE"<"B"."COOKIE")
   5 - filter("C"."COOKIE">:B1 AND "B"."COOKIE"<"C"."COOKIE")

The variable does get propagated through to the other conditions as we might expect.
Now, let’s try adding a literal value and we’ll see Oracle again is able to derive transitivity and propagates the constant literal through to all of the table predicates and radically alters the estimates and execution plan

SELECT *
FROM cookies a, cookies b, cookies c
WHERE a.cookie < b.cookie AND b.cookie < c.cookie and a.cookie= 'gingerbread';

--------------------------------------------------
| Id  | Operation             | Name    | E-Rows |
--------------------------------------------------
|   0 | SELECT STATEMENT      |         |     14 |
|   1 |  MERGE JOIN           |         |     14 |
|   2 |   MERGE JOIN CARTESIAN|         |      8 |
|*  3 |    TABLE ACCESS FULL  | COOKIES |      1 |
|   4 |    BUFFER SORT        |         |      8 |
|*  5 |     TABLE ACCESS FULL | COOKIES |      8 |
|*  6 |   FILTER              |         |        |
|*  7 |    SORT JOIN          |         |      8 |
|*  8 |     TABLE ACCESS FULL | COOKIES |      8 |
--------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("A"."COOKIE"='gingerbread')
   5 - filter("C"."COOKIE">'gingerbread')
   6 - filter("B"."COOKIE"<"C"."COOKIE")
   7 - access("A"."COOKIE"<"B"."COOKIE")
       filter("A"."COOKIE"<"B"."COOKIE")
   8 - filter("B"."COOKIE">'gingerbread')

But, even though it now seems to understands the transitivity, that understanding is still flawed, because if we put the implicit condition of a.cookie < c.cookie, which offers no functional change, we still get different plans with different joins and estimates.

SELECT *
  FROM cookies a, cookies b, cookies c
 WHERE a.cookie < b.cookie
   AND b.cookie < c.cookie
   AND a.cookie < c.cookie
   AND a.cookie = 'gingerbread';
   
-------------------------------------------------
| Id  | Operation            | Name    | E-Rows |
-------------------------------------------------
|   0 | SELECT STATEMENT     |         |      8 |
|   1 |  MERGE JOIN          |         |      8 |
|   2 |   NESTED LOOPS       |         |      4 |
|*  3 |    TABLE ACCESS FULL | COOKIES |      1 |
|*  4 |    TABLE ACCESS FULL | COOKIES |      4 |
|*  5 |   FILTER             |         |        |
|*  6 |    SORT JOIN         |         |      8 |
|*  7 |     TABLE ACCESS FULL| COOKIES |      8 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("A"."COOKIE"='gingerbread')
   4 - filter("A"."COOKIE"<"C"."COOKIE" AND "C"."COOKIE">'gingerbread')
   5 - filter("B"."COOKIE"<"C"."COOKIE")
   6 - access("A"."COOKIE"<"B"."COOKIE")
       filter("A"."COOKIE"<"B"."COOKIE")
   7 - filter("B"."COOKIE">'gingerbread')

So, what are the lessons here?

First, I really like oatmeal cookies.

Second, I’m reminded that I should bake more.

Third, if you’re stuck on a query where the optimizer seems like it’s ignoring some crucial information, it may be a hole in its logic where it doesn’t extend “obvious” rules throughout the plan. If so, it may be worth changing the where conditions to include, exclude or alter functionally identical conditions.

For example:
WHERE a=b and b=c
could be written
WHERE a=b and a=c
Or
WHERE a=b and a=c and b=c

All three conditions are functionally identical; but writing them in different ways could hide or expose information the optimizer can’t derive on its own (even if we think it should.)