Recently I was working on a SQL performance problem and, looking at the statement, I found an unusual clause of the form:
where NVL(date_column,sysdate) < date '2022-10-01'
Note: in the example I’ve used a date literal for the sake of illustration. The actual code used a variable, but the problem exists either way.
Checking with the users, I found the date criteria would always be for prior months. That is, this query ran in October 2022, looking for data from September and earlier. The next time this process would run would be in November, looking for October data and earlier.
Since the query always looks for data from prior months, any NULL values in date_column will be converted to the current date/time, which will always be greater than or equal to the date limit. Even if the report ran at exactly 2022-10-01 00:00:00, the substituted value would equal the cutoff rather than fall below it, so the NULLs would still be excluded from the report.
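To see that behavior in isolation, here is a minimal sketch run against DUAL (so no table names are assumed); executed on or after 2022-10-01, the substituted SYSDATE can never satisfy the predicate:

-- NVL substitutes SYSDATE for the NULL, and any run date on or after the
-- cutoff makes the comparison fail, so the row is reported as excluded.
select case
         when nvl(cast(null as date), sysdate) < date '2022-10-01'
         then 'included'
         else 'excluded'
       end as null_row_fate
from   dual;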
So, the condition, as written, does function as intended; but it’s not necessary. If the date_column is NULL for a row, the condition for that row will be evaluated like this:
where NULL < date '2022-10-01'
A comparison with NULL evaluates to neither TRUE nor FALSE, and a WHERE clause only keeps rows for which the condition is TRUE; so those rows are excluded without needing to apply the NVL conversion at all.
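You can see the same thing with a trivial query against DUAL; the comparison is neither TRUE nor FALSE, so the WHERE clause filters the row out and nothing is returned:

-- Returns no rows: the comparison with NULL never evaluates to TRUE.
select 'kept' as result
from   dual
where  null < date '2022-10-01';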
Furthermore, in this particular example, date_column was indexed, and applying the NVL function to the column values prevented that index from being used. One might be tempted to create a function-based index on the NVL(date_column, sysdate) results, but that won’t work because SYSDATE is not deterministic. You could, instead, use a static date impossibly far in the future so the condition will always evaluate to FALSE for the NULL rows. Something like this…
NVL(date_column, date '9999-12-31')
Then modify your queries to use that same expression in the predicate.
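As a sketch only, with a made-up table, column, and index name for illustration, that approach would look something like this:

-- Hypothetical names, shown only to illustrate the technique.
create index orders_order_date_fbi
  on orders ( nvl(order_date, date '9999-12-31') );

-- The query must use exactly the same expression for the optimizer to
-- consider the function-based index:
select order_id, order_date
from   orders
where  nvl(order_date, date '9999-12-31') < date '2022-10-01';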
Following those steps would work, and would likely produce better performance by letting the optimizer use the new function-based index; but doing that means every row in the table will be indexed whether date_column is NULL or not. And all of those NULL values are values you already know you will not want returned. So you would be creating a new index and filling it with values for the sole purpose of ignoring those values.
Yes, it will work, but when stated like that, you can, hopefully, see that it’s a design that doesn’t sound quite right.
Also, remember the original date_column was already indexed. So not only can the simpler where date_column < date '2022-10-01' use that index, the NULL values for the column are not even included in that index. Thus, the existing index implicitly excludes all of the rows with NULL that we know we don’t want in our report.
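For comparison, the existing index would have been created as a plain single-column index, something like the hypothetical definition below; Oracle stores no entry in a single-column B-tree index for a row whose indexed column is NULL, so a range scan on it can never return those rows:

-- Hypothetical definition of the existing plain index: rows where the
-- indexed column is NULL simply have no entry in it.
create index orders_order_date_ix
  on orders ( order_date );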
In summary, adding extra “smarts” to the condition to try to handle NULLs only added complexity and a performance hit while providing no functional benefit over simply letting NULLs be processed, and excluded, natively.
Making the small change to the where clause to the form of
where date_column < date '2022-10-01'
not only improved performance by letting the index be used; the plan as a whole changed, with a new driving table, because the index statistics now gave the optimizer useful information.
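If you want to confirm that kind of plan change yourself, one way, sketched here with the same hypothetical table, is to display the optimizer’s plan for the simplified predicate:

-- Table and column names are hypothetical; DBMS_XPLAN shows the chosen plan.
explain plan for
select order_id, order_date
from   orders
where  order_date < date '2022-10-01';

select * from table(dbms_xplan.display);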
The resulting query ran about 350 times faster, again simply by taking out the NVL() call and letting the NULLs do the work themselves.
While I enjoy providing helpful tips, I’m especially happy to do so in memory of Joel Kallman in honor of all he gave to the Oracle community.
Thank you for reading; I hope you find it helpful. Questions and comments, as always, are welcome.