A common occurrence in sql is to aggregate text values into a larger, delimited value. It can also occur that you have a delimited value and you want treat it as a collection to pull data out of it process each of the sub-values. Oracle has a built-in function LISTAGG to aggregate text values into a larger varchar2 value. Before LISTAGG was available a user-defined aggregate might have been used. Or possibly the native function COLLECT would be used then turn the collection into a string.
Here I’ll try to list several of the popular incarnations that translate the data in either direction.
Probably the first function that gained popularity was Tom Kyte’s STRAGG which was user-defined aggregate first available in 9i and his version looked like this:
CREATE OR REPLACE TYPE string_agg_type AS OBJECT ( total VARCHAR2(4000), STATIC FUNCTION odciaggregateinitialize(sctx IN OUT string_agg_type) RETURN NUMBER, MEMBER FUNCTION odciaggregateiterate(self IN OUT string_agg_type, VALUE IN VARCHAR2) RETURN NUMBER, MEMBER FUNCTION odciaggregateterminate(self IN string_agg_type, returnvalue OUT VARCHAR2, flags IN NUMBER) RETURN NUMBER, MEMBER FUNCTION odciaggregatemerge(self IN OUT string_agg_type, ctx2 IN string_agg_type) RETURN NUMBER ); / CREATE OR REPLACE TYPE BODY string_agg_type IS STATIC FUNCTION odciaggregateinitialize(sctx IN OUT string_agg_type) RETURN NUMBER IS BEGIN sctx := string_agg_type(NULL); RETURN odciconst.success; END; MEMBER FUNCTION odciaggregateiterate(self IN OUT string_agg_type, VALUE IN VARCHAR2) RETURN NUMBER IS BEGIN self.total := self.total || ',' || VALUE; RETURN odciconst.success; END; MEMBER FUNCTION odciaggregateterminate(self IN string_agg_type, returnvalue OUT VARCHAR2, flags IN NUMBER) RETURN NUMBER IS BEGIN returnvalue := LTRIM(self.total, ','); RETURN odciconst.success; END; MEMBER FUNCTION odciaggregatemerge(self IN OUT string_agg_type, ctx2 IN string_agg_type) RETURN NUMBER IS BEGIN self.total := self.total || ctx2.total; RETURN odciconst.success; END; END; / CREATE OR REPLACE FUNCTION stragg(input VARCHAR2) RETURN VARCHAR2 PARALLEL_ENABLE AGGREGATE USING string_agg_type; /
The original can be found here.
stragg worked great, and still does; but it does suffer from one significant limitation, and that is the varchar2 limit of 4000 characters. The native function LISTAGG has the same problem. It can be somewhat ameliorated in 12c with the LISTAGG overflow parameters; but those don’t let you exceed the limit. They only determine what to do when an overflow happens. If you really want all of the data then you need to aggregate into a different data type.
With a few simple changes though and we can have an aggregate that concatenates varchar2 values into a clob and the resulting text can be billions of characters long.
CREATE OR REPLACE TYPE vcagg_to_clob_type AS OBJECT ( v_result CLOB, STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_type) RETURN NUMBER, MEMBER FUNCTION odciaggregateiterate(self IN OUT vcagg_to_clob_type, p_string IN VARCHAR2) RETURN NUMBER, MEMBER FUNCTION odciaggregateterminate(self IN vcagg_to_clob_type, returnvalue OUT CLOB, flags IN NUMBER) RETURN NUMBER, MEMBER FUNCTION odciaggregatemerge(self IN OUT vcagg_to_clob_type, ctx2 IN vcagg_to_clob_type) RETURN NUMBER ); / CREATE OR REPLACE TYPE BODY vcagg_to_clob_type IS STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_type) RETURN NUMBER IS BEGIN sctx := vcagg_to_clob_type(NULL); RETURN odciconst.success; END; MEMBER FUNCTION odciaggregateiterate(self IN OUT vcagg_to_clob_type, p_string IN VARCHAR2) RETURN NUMBER IS BEGIN self.v_result := self.v_result || ',' || p_string; RETURN odciconst.success; END; MEMBER FUNCTION odciaggregateterminate(self IN vcagg_to_clob_type, returnvalue OUT CLOB, flags IN NUMBER) RETURN NUMBER IS BEGIN returnvalue := LTRIM(self.v_result, ','); RETURN odciconst.success; END; MEMBER FUNCTION odciaggregatemerge(self IN OUT vcagg_to_clob_type, ctx2 IN vcagg_to_clob_type) RETURN NUMBER IS BEGIN self.v_result := self.v_result || ctx2.v_result; RETURN odciconst.success; END; END; / CREATE OR REPLACE FUNCTION clobagg(p_string VARCHAR2) RETURN CLOB DETERMINISTIC PARALLEL_ENABLE AGGREGATE USING vcagg_to_clob_type; /
Another option, introduced into 10g is the COLLECT aggregate function. This will allow the user to aggregate multiple rows into a nested table collection type. This nested table can then be iterated through to create a varchar2 or clob type depending on the size.
First, we’ll create a nested table type to hold our collection.
CREATE TYPE vctab AS TABLE OF VARCHAR2(4000);
Then we can create functions to turn the collection into a varchar2.
CREATE OR REPLACE FUNCTION tbl2str(p_tbl IN vctab, p_delimiter IN VARCHAR2 DEFAULT ',') RETURN VARCHAR2 DETERMINISTIC IS v_str VARCHAR2(32767); BEGIN IF p_tbl.COUNT > 0 THEN v_str := p_tbl(1); FOR i IN 2 .. p_tbl.COUNT LOOP v_str := v_str || p_delimiter || p_tbl(i); END LOOP; END IF; RETURN v_str; END;
And we’ll create another one to return a clob for large result sets. I’ve added a little extra complexity for performance by using an intermediate varchar2.
CREATE OR REPLACE FUNCTION tbl2clob(p_tbl IN vctab, p_delimiter IN VARCHAR2 DEFAULT ',') RETURN CLOB DETERMINISTIC IS v_str VARCHAR2(32767); v_clob CLOB; v_delimiter_length PLS_INTEGER := NVL(LENGTH(p_delimiter), 0); BEGIN DBMS_LOB.createtemporary(v_clob, TRUE); IF p_tbl.COUNT > 0 THEN v_str := p_tbl(1); FOR i IN 2 .. p_tbl.COUNT LOOP -- Appending to clobs is slower than appending to varchar2 -- so use varchar2 until you can't anymore then append one big chunk IF NVL(LENGTH(v_str), 0) + v_delimiter_length + NVL(LENGTH(p_tbl(i)), 0) <= 32767 THEN v_str := v_str || p_delimiter || p_tbl(i); ELSE v_clob := v_clob || v_str || p_delimiter || p_tbl(i); v_str := NULL; END IF; END LOOP; END IF; IF v_str IS NOT NULL THEN v_clob := v_clob || v_str; END IF; RETURN v_clob; END;
So a simple example comparing all of the above might look like this…
SELECT owner, stragg(DISTINCT object_type) stragg_object_types, LISTAGG(DISTINCT object_type, ',') WITHIN GROUP (ORDER BY object_type) listagg_object_types, tbl2str(CAST(COLLECT(DISTINCT object_type) AS vctab)) collect_object_types, tbl2clob(CAST(COLLECT(object_name) AS vctab)) collect_object_names FROM all_objects GROUP BY owner;
Note, the LISTAGG function doesn’t support DISTINCT until 19c. To get around that we’ll have to use a subquery to force distinct values before querying.
SELECT owner, stragg(object_type) stragg_object_types, LISTAGG(object_type, ',') WITHIN GROUP (ORDER BY object_type) listagg_object_types, tbl2str(CAST(COLLECT(object_type ORDER BY object_type) AS vctab)) collect_object_types FROM (SELECT DISTINCT owner, object_type FROM all_objects) GROUP BY owner;
Also note, the LISTAGG and COLLECT functions allow for ordering, STRAGG does not, at least not as an aggregate. If you invoke STRAGG as an analytic function then it is possible to do so.
SELECT DISTINCT owner, stragg(object_type) OVER(PARTITION BY owner ORDER BY object_type ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) stragg_object_types FROM (SELECT DISTINCT owner, object_type FROM all_objects);
And last, what if you already have a delimited text field that you would like to turn into a collection?
Depending on the size of the text, we’ll either start with a varchar2 or a clob. The corresponding functions will then create an instance of a collection type.
I’ve posted a clob version earlier with the SPLIT_CLOB function. While it is perfectly legal syntax and functionally feasible to use that function as is on varchar2 values, you may want to use a varchar2 specific version for performance reasons.
CREATE OR REPLACE FUNCTION str2tbl(p_string IN VARCHAR2, p_delimiter IN VARCHAR2 := ',') RETURN vctab PIPELINED AS v_length NUMBER := LENGTH(p_string); v_start NUMBER := 1; v_index NUMBER; BEGIN WHILE (v_start <= v_length) LOOP v_index := INSTR(p_string, p_delimiter, v_start); IF v_index = 0 THEN PIPE ROW (SUBSTR(p_string, v_start)); v_start := v_length + LENGTH(p_delimiter); ELSE PIPE ROW (SUBSTR(p_string, v_start, v_index - v_start)); v_start := v_index + LENGTH(p_delimiter); END IF; END LOOP; RETURN; END str2tbl;
Another option would be to put them into a package with a single overloaded name.
A simple example usage:
select * from table(str2tbl('a|b|c|d','|'));
Using these functions individually or in combination has helped me immensely in parsing documents of various sources. I hope they help you as well.
Questions and comments, as always, are welcome.