Close

Using Object Types: converting Text to Collections and Collections to Text

A common occurrence in sql is to aggregate text values into a larger, delimited value. It can also occur that you have a delimited value and you want treat it as a collection to pull data out of it process each of the sub-values. Oracle has a built-in function LISTAGG to aggregate text values into a larger varchar2 value. Before LISTAGG was available a user-defined aggregate might have been used. Or possibly the native function COLLECT would be used then turn the collection into a string.

Here I’ll try to list several of the popular incarnations that translate the data in either direction.

Probably the first function that gained popularity was Tom Kyte’s STRAGG which was user-defined aggregate first available in 9i and his version looked like this:

CREATE OR REPLACE TYPE string_agg_type AS OBJECT
(
    total VARCHAR2(4000),
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT string_agg_type)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateiterate(self IN OUT string_agg_type, VALUE IN VARCHAR2)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateterminate(self IN string_agg_type, returnvalue OUT VARCHAR2, flags IN NUMBER)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregatemerge(self IN OUT string_agg_type, ctx2 IN string_agg_type)
        RETURN NUMBER
);
/

CREATE OR REPLACE TYPE BODY string_agg_type
IS
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT string_agg_type)
        RETURN NUMBER
    IS
    BEGIN
        sctx := string_agg_type(NULL);
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateiterate(self IN OUT string_agg_type, VALUE IN VARCHAR2)
        RETURN NUMBER
    IS
    BEGIN
        self.total := self.total || ',' || VALUE;
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateterminate(self IN string_agg_type, returnvalue OUT VARCHAR2, flags IN NUMBER)
        RETURN NUMBER
    IS
    BEGIN
        returnvalue := LTRIM(self.total, ',');
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregatemerge(self IN OUT string_agg_type, ctx2 IN string_agg_type)
        RETURN NUMBER
    IS
    BEGIN
        self.total := self.total || ctx2.total;
        RETURN odciconst.success;
    END;
END;
/

CREATE OR REPLACE FUNCTION stragg(input VARCHAR2)
    RETURN VARCHAR2
    PARALLEL_ENABLE
    AGGREGATE USING string_agg_type;
/

The original can be found here.

stragg worked great, and still does; but it does suffer from one significant limitation, and that is the varchar2 limit of 4000 characters. The native function LISTAGG has the same problem. It can be somewhat ameliorated in 12c with the LISTAGG overflow parameters; but those don’t let you exceed the limit. They only determine what to do when an overflow happens. If you really want all of the data then you need to aggregate into a different data type.

With a few simple changes though and we can have an aggregate that concatenates varchar2 values into a clob and the resulting text can be billions of characters long.

CREATE OR REPLACE TYPE vcagg_to_clob_type AS OBJECT
(
    v_result CLOB,
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_type)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateiterate(self IN OUT vcagg_to_clob_type, p_string IN VARCHAR2)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateterminate(self IN vcagg_to_clob_type, returnvalue OUT CLOB, flags IN NUMBER)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregatemerge(self IN OUT vcagg_to_clob_type, ctx2 IN vcagg_to_clob_type)
        RETURN NUMBER
);
/

CREATE OR REPLACE TYPE BODY vcagg_to_clob_type
IS
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_type)
        RETURN NUMBER
    IS
    BEGIN
        sctx := vcagg_to_clob_type(NULL);
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateiterate(self IN OUT vcagg_to_clob_type, p_string IN VARCHAR2)
        RETURN NUMBER
    IS
    BEGIN
        self.v_result := self.v_result || ',' || p_string;
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateterminate(self IN vcagg_to_clob_type, returnvalue OUT CLOB, flags IN NUMBER)
        RETURN NUMBER
    IS
    BEGIN
        returnvalue := LTRIM(self.v_result, ',');
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregatemerge(self IN OUT vcagg_to_clob_type, ctx2 IN vcagg_to_clob_type)
        RETURN NUMBER
    IS
    BEGIN
        self.v_result := self.v_result || ctx2.v_result;
        RETURN odciconst.success;
    END;
END;
/

CREATE OR REPLACE FUNCTION clobagg(p_string VARCHAR2)
    RETURN CLOB
    DETERMINISTIC
    PARALLEL_ENABLE
    AGGREGATE USING vcagg_to_clob_type;
/

Another option, introduced into 10g is the COLLECT aggregate function. This will allow the user to aggregate multiple rows into a nested table collection type. This nested table can then be iterated through to create a varchar2 or clob type depending on the size.

First, we’ll create a nested table type to hold our collection.

CREATE TYPE vctab AS TABLE OF VARCHAR2(4000);

Then we can create functions to turn the collection into a varchar2.

CREATE OR REPLACE FUNCTION tbl2str(p_tbl IN vctab, p_delimiter IN VARCHAR2 DEFAULT ',')
    RETURN VARCHAR2
    DETERMINISTIC
IS
    v_str   VARCHAR2(32767);
BEGIN
    IF p_tbl.COUNT > 0
    THEN
        v_str  := p_tbl(1);

        FOR i IN 2 .. p_tbl.COUNT
        LOOP
            v_str  := v_str || p_delimiter || p_tbl(i);
        END LOOP;
    END IF;

    RETURN v_str;
END;

And we’ll create another one to return a clob for large result sets. I’ve added a little extra complexity for performance by using an intermediate varchar2.

CREATE OR REPLACE FUNCTION tbl2clob(p_tbl IN vctab, p_delimiter IN VARCHAR2 DEFAULT ',')
    RETURN CLOB
    DETERMINISTIC
IS
    v_str                VARCHAR2(32767);
    v_clob               CLOB;
    v_delimiter_length   PLS_INTEGER := NVL(LENGTH(p_delimiter), 0);
BEGIN
    DBMS_LOB.createtemporary(v_clob, TRUE);

    IF p_tbl.COUNT > 0
    THEN
        v_str := p_tbl(1);

        FOR i IN 2 .. p_tbl.COUNT
        LOOP
            -- Appending to clobs is slower than appending to varchar2
            -- so use varchar2 until you can't anymore then append one big chunk
            IF NVL(LENGTH(v_str), 0) + v_delimiter_length + NVL(LENGTH(p_tbl(i)), 0) <= 32767
            THEN
                v_str := v_str || p_delimiter || p_tbl(i);
            ELSE
                v_clob := v_clob || v_str || p_delimiter || p_tbl(i);
                v_str := NULL;
            END IF;
        END LOOP;
    END IF;

    IF v_str IS NOT NULL
    THEN
        v_clob := v_clob || v_str;
    END IF;

    RETURN v_clob;
END;

So a simple example comparing all of the above might look like this…

  SELECT owner,
         stragg(DISTINCT object_type) stragg_object_types,
         LISTAGG(DISTINCT object_type, ',') WITHIN GROUP (ORDER BY object_type) listagg_object_types,
         tbl2str(CAST(COLLECT(DISTINCT object_type) AS vctab)) collect_object_types,
         tbl2clob(CAST(COLLECT(object_name) AS vctab)) collect_object_names
    FROM all_objects
GROUP BY owner;

Note, the LISTAGG function doesn’t support DISTINCT until 19c. To get around that we’ll have to use a subquery to force distinct values before querying.

  SELECT owner,
         stragg(object_type) stragg_object_types,
         LISTAGG(object_type, ',') WITHIN GROUP (ORDER BY object_type) listagg_object_types,
         tbl2str(CAST(COLLECT(object_type ORDER BY object_type) AS vctab)) collect_object_types
    FROM (SELECT DISTINCT owner, object_type
            FROM all_objects)
GROUP BY owner;

Also note, the LISTAGG and COLLECT functions allow for ordering, STRAGG does not, at least not as an aggregate. If you invoke STRAGG as an analytic function then it is possible to do so.

SELECT DISTINCT
       owner,
       stragg(object_type)
           OVER(PARTITION BY owner ORDER BY object_type ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
           stragg_object_types
  FROM (SELECT DISTINCT owner, object_type
          FROM all_objects);

And last, what if you already have a delimited text field that you would like to turn into a collection?

Depending on the size of the text, we’ll either start with a varchar2 or a clob. The corresponding functions will then create an instance of a collection type.

I’ve posted a clob version earlier with the SPLIT_CLOB function. While it is perfectly legal syntax and functionally feasible to use that function as is on varchar2 values, you may want to use a varchar2 specific version for performance reasons.

CREATE OR REPLACE FUNCTION str2tbl(p_string IN VARCHAR2, p_delimiter IN VARCHAR2 := ',')
    RETURN vctab
    PIPELINED
AS
    v_length   NUMBER := LENGTH(p_string);
    v_start    NUMBER := 1;
    v_index    NUMBER;
BEGIN
    WHILE (v_start <= v_length)
    LOOP
        v_index := INSTR(p_string, p_delimiter, v_start);

        IF v_index = 0
        THEN
            PIPE ROW (SUBSTR(p_string, v_start));

            v_start := v_length + LENGTH(p_delimiter);
        ELSE
            PIPE ROW (SUBSTR(p_string, v_start, v_index - v_start));

            v_start := v_index + LENGTH(p_delimiter);
        END IF;
    END LOOP;

    RETURN;
END str2tbl;

Another option would be to put them into a package with a single overloaded name.

A simple example usage:

select * from table(str2tbl('a|b|c|d','|'));

Using these functions individually or in combination has helped me immensely in parsing documents of various sources. I hope they help you as well.

Questions and comments, as always, are welcome.