Today I received a request that had me using the COLLECTION function to aggregate values for different categories. However, it didn’t take too long to find extra rows creeping into my results.
To my surprise I found using DISTINCT within a COLLECT works, unless you also try to order the results with an ORDER BY clause. If the ordering is included, the uniqueness criteria is negated.
I tested it on 11.2.0.4, 12.1.0.2, 12.2.0.1, and 18.3.0.0 databases. All reproduced the same problem. Here is the smallest example I could think of, simply collecting two identical values (the number 1) and the resulting collection is not distinct even though DISTINCT is specified in the function call.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters and Automatic Storage Management options SQL> select collect(distinct n order by n) x from (select 1 n from dual union all select 1 n from dual); X -------------------------------------- SYSTPgubs6QoXB17gUwowH60Wkw==(1, 1) 1 row selected.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> select collect(distinct n order by n) x from (select 1 n from dual union all select 1 n from dual); X -------------------------------------- SYSTPgubmYoi2AY7gUwpcDSE2+A==(1, 1) 1 row selected.
Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production SQL> select collect(distinct n order by n) x from (select 1 n from dual union all select 1 n from dual); X -------------------------------------- ST00001uZWKJPoArbgUwowAbgM2A=(1, 1) 1 row selected.
Connected to: Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production Version 18.3.0.0.0 SQL> select collect(distinct n order by n) x from (select 1 n from dual union all select 1 n from dual); X -------------------------------------- ST00001uadYnwXOo3gU0tqWArDUg=(1, 1) 1 row selected.
I also tested it on Oracle’s own livesql.oracle.com site. Live SQL doesn’t support returning an undefined collection type. So in the example I cast it to a defined public collection and then select from that using the TABLE function. My test script can be found here.
As a work around I can either use a subquery to find distinct values first, then wrap that in an outer query to collect the values into an ordered collection.
SELECT COLLECT(n ORDER BY n)
FROM (SELECT DISTINCT n
FROM (SELECT 1 n FROM DUAL
UNION ALL
SELECT 2 n FROM DUAL
UNION ALL
SELECT 2 n FROM DUAL
UNION ALL
SELECT 1 n FROM DUAL));
Another option, I can use collect twice, once with distinct and then again with order by. This is essentially the same as the first work around, but more complicated syntax.
SELECT COLLECT(COLUMN_VALUE ORDER BY COLUMN_VALUE)
FROM TABLE(SELECT COLLECT(DISTINCT n)
FROM (SELECT 1 n FROM DUAL
UNION ALL
SELECT 2 n FROM DUAL
UNION ALL
SELECT 2 n FROM DUAL
UNION ALL
SELECT 1 n FROM DUAL));
Using the secondary COLLECT can be useful if you are using other aggregates that apply at the lower level.
SELECT object_type,
count_all,
count_distinct,
(SELECT COLLECT(COLUMN_VALUE ORDER BY COLUMN_VALUE) FROM TABLE(owners))
FROM ( SELECT object_type,
COUNT(*) count_all,
COUNT(DISTINCT owner) count_distinct,
COLLECT(DISTINCT owner) owners
FROM all_objects
GROUP BY object_type);
I have opened an SR with Oracle Support to investigate the issue. I hope these workaround help until a fix is available.
Hello Sean,
Your post reminded me of this other post on Sayan’s lovely blog (see section ‘Fact#3’ where he also mentioned this behavior: ,”Bug 8912282: COLLECT+UNIQUE+ORDER DOES NOT REMOVE DUPLICATES”).
http://orasql.org/2012/04/28/a-funny-fact-about-collect/
Thank you for the link, I hadn’t found prior notices of the bug. Looks like it’s been around for a while and known.
In addition to the bugs noted in your link, my SR was turned into a new bug, Bug 29430277.
Hopefully Oracle will fix it this time.