Working with Table Functions in SQL

In a previous article I looked at creating and using collections within SQL. I mentioned, but did not explore functions returning Nested Table and Varying Array collection types in that article. Those fuctions, known as table functions, will be the subject here.

A table function can return either of the two collection types in their entirety or one element (row) at a time. As the name table function implies, you can query the output of such functions as if it were a table. In versions 9i, 10g, and 11g you needed the TABLE clause to query from them.

select * from TABLE(my_function)

Starting with 12c though you can query from the functions directly, but the TABLE clause is still supported.

select * from my_function

As mentioned above, the functions can return a nested table or a varray result, so we’ll define one of each for the examples below.

create type numtab is table of number;
create type numarr is varray(30) of number;

Then we’ll use the numtab type in function returning a nested table of integers. the function accepts a count (N) and returns a collection populated 1,2,3, …, N

CREATE OR REPLACE FUNCTION nt_function(n IN INTEGER)
    RETURN numtab
IS
    v_result   numtab := numtab();
BEGIN
    IF n > 0
    THEN
        v_result.EXTEND(n);

        FOR i IN 1 .. n
        LOOP
            v_result(i) := i;
        END LOOP;
    END IF;

    RETURN v_result;
END;


select * from nt_function(10);
COLUMN_VALUE
------------
           1
           2
           3
           4
           5
           6
           7
           8
           9
          10

And we can create a similar function returning a varray.

CREATE OR REPLACE FUNCTION va_function(n IN INTEGER)
    RETURN numarr 
IS
    v_result   numarr := numarr();
BEGIN
    IF n > 0
    THEN
        v_result.EXTEND(n);

        FOR i IN 1 .. n
        LOOP
            v_result(i) := i;
        END LOOP;
    END IF;

    RETURN v_result ;
END;
/

SQL> select * from va_function(10);

   COLUMN_VALUE
_______________
              1
              2
              3
              4
              5
              6
              7
              8
              9
             10

10 rows selected.

Note though, the returned varray type has a built in limit. If we attempt to return too many rows, we can exceed the varray definition.

SQL> select * from va_function(40);
Error starting at line : 1 in command -
select * from va_function(40)
Error report -
ORA-06532: Subscript outside of limit
ORA-06512: at "SDS.VA_FUNCTION", line 8

For this reason I usually use Nested Table types unless I’m positive my list of returned values can’t exeed my array limits.

For a short list, neither of these are too bad, but what if we wanted a list of a million rows? That’s a lot of memory consumed to build a collection holding all of those values. This is where pipelined functions come in. Most of the function syntax is the same but instead of building up the collection, you instead PIPE the results one row at a time back to the caller.

Another special feature of pipelined functions is the RETURN statement has no value of its own. Instead, the PIPE ROW statements generate the results and RETURN simply indicates when to end the function’s processing.

CREATE OR REPLACE FUNCTION nt_function_pipelined(n IN INTEGER)
    RETURN numtab
    PIPELINED
IS
BEGIN
    FOR i IN 1 .. n
    LOOP
        PIPE ROW (i);
    END LOOP;

    RETURN;
END;

SELECT * FROM nt_function_pipelined(10);

COLUMN_VALUE
------------
           1
           2
           3
           4
           5
           6
           7
           8
           9
          10

These simple numeric examples illustrate the basic syntax; but you can extend the functionality into more complex results by declaring object types of multiple values. In the next code examples I’ll build an object to hold information about an earthquake event and a collection type to hold the individual events.

CREATE OR REPLACE TYPE usgs_quake_event
AS OBJECT
(
    event_time TIMESTAMP WITH TIME ZONE,
    event_type VARCHAR2(20),
    magnitude NUMBER,
    location_name VARCHAR2(100),
    latitude NUMBER,
    longitude NUMBER,
    depth_in_km NUMBER
);

CREATE OR REPLACE TYPE usgs_quake_event_tab 
AS TABLE OF usgs_quake_event;

Now that I have an object and collection type I’ll use this function to pull from the U.S. Geological Survey’s public json data. Extract the fields to fill the objects above and pipe them out as a nested table collection.

CREATE OR REPLACE FUNCTION get_earthquakes(
    p_start_time   IN TIMESTAMP WITH TIME ZONE DEFAULT   SYSTIMESTAMP
                                                       - INTERVAL '30' DAY,
    p_end_time     IN TIMESTAMP WITH TIME ZONE DEFAULT SYSTIMESTAMP,
    p_time_zone    IN VARCHAR2 DEFAULT 'US/Eastern')
    RETURN usgs_quake_event_tab
    PIPELINED
IS
    v_request    UTL_HTTP.req;
    v_response   UTL_HTTP.resp;
    v_text   VARCHAR2(32767);
    v_clob   CLOB := EMPTY_CLOB();
BEGIN
    -- wallet is autologin, no password required
    UTL_HTTP.set_wallet('file:/home/oracle/mywallets/usgs');

    -- Read US Geological Survey records for the specified time period
    v_request :=
        UTL_HTTP.begin_request(
               'http://earthquake.usgs.gov/fdsnws/event/1/query.geojson?&starttime='
            || TO_CHAR(p_start_time AT TIME ZONE 'UTC',
                       'yyyy-mm-dd"%20"hh24:mi:ss')
            || '&endtime='
            || TO_CHAR(p_end_time AT TIME ZONE 'UTC',
                       'yyyy-mm-dd"%20"hh24:mi:ss')
            || '&minmagnitude=1');
    v_response := UTL_HTTP.get_response(v_request);

    BEGIN
        LOOP
            UTL_HTTP.read_text(v_response, v_text, 32000);
            v_clob := v_clob || v_text;
        END LOOP;
    EXCEPTION
        WHEN UTL_HTTP.end_of_body
        THEN
            -- We expect to hit this.  Cleanup and continue.
            UTL_HTTP.end_response(v_response);
        WHEN OTHERS
        THEN
            -- Some other error happened, cleanup and reraise it.
            UTL_HTTP.end_response(v_response);
            RAISE;
    END;

    -- Parse out the most interesting fields.
    --  Outline of json structure
    --  { "features" [ 
    --       { "properties" : {...}, "geometry" : "coordinates":[...]},
    --       { "properties" : {...}, "geometry" : "coordinates":[...]},
    --       { "properties" : {...}, "geometry" : "coordinates":[...]}
    --    ]
    --  }
    FOR x
        IN (SELECT FROM_TZ(
                         TO_TIMESTAMP('1970-01-01 00:00:00',
                                      'yyyy-mm-dd hh24:mi:ss')
                       + NUMTODSINTERVAL(event_time / 1000, 'second'),
                       'UTC')
                       AT TIME ZONE (p_time_zone) event_time,
                   event_type,
                   magnitude,
                   location_name,
                   latitude,
                   longitude,
                   depth_in_km
              FROM JSON_TABLE(
                       json(v_clob),
                       '$.features[*]'
                       COLUMNS(
                           event_time NUMBER PATH '$.properties.time',
                           event_type VARCHAR2(20) PATH '$.properties.type',
                           magnitude NUMBER PATH '$.properties.mag',
                           location_name
                               VARCHAR2(100)
                               PATH '$.properties.place',
                           longitude NUMBER PATH '$.geometry.coordinates[0]',
                           latitude NUMBER PATH '$.geometry.coordinates[1]',
                           depth_in_km NUMBER PATH '$.geometry.coordinates[2]')))
    LOOP
        -- Construct an object from the fields and pipe it out
        PIPE ROW (usgs_quake_event(x.event_time,
                                   x.event_type,
                                   x.magnitude,
                                   x.location_name,
                                   x.latitude,
                                   x.longitude,
                                   x.depth_in_km));
    END LOOP;

    RETURN;
END;

select * from get_earthquakes()
order by magnitude desc,event_time;

EVENT_TIME EVENT_TYPE     MAGNITUDE LOCATION_NAME                        LATITUDE  LONGITUDE DEPTH_IN_KM
---------- ------------- ---------- ---------------------------------- ---------- ---------- -----------
2023-06-15 earthquake           7.2 274 km SW of Houma, Tonga             -22.976    -177.13     175.089
2023-07-02 earthquake           6.9                                      -17.8785  -174.9429         229
2023-06-18 earthquake           6.4 108 km ESE of La Rivera, Mexico       23.2012  -108.6126          10
2023-06-11 earthquake           6.2 20 km WSW of Biratori, Japan          42.5268   141.9005         121
2023-06-14 earthquake           6.2 11 km SSE of Hukay, Philippines       13.7495   120.7447         112

Here I’m only showing the first 5 rows returned, i.e. the largest events within the past 30 days. At the time I ran this there were several thousand events of magnitude 1 or higher. Note the blank location isn’t an error, it’s in an unnamed area in the middle of the ocean between Fiji, Tonga, and Samoa.

This example helps illustrate some of the thought processes in building a table function. While the pipelined function pipes rows out one-at-a-time, the source json data is first retrieved in its entirety. If you intend to read all of the results every time then this may be fine; but if the invoker might process only a subset of the results it might make sense to see if the REST call can accept other parameters. In this case the USGS service does permit additional parameters to limit the result set and order the results by time or magnitude.

With this in mind we might want to alter the function to only return the top 10 results, or instead only return events that meet or exceed a certain magnitude. There are few ways we could implement these requirements. We could query everything and then filter within our function before returning, or, as mentioned above, we can request the service to only send what we need, which will be more efficient. Also “top 10”, does that mean the largest? Or does that mean the most recent? Or does that mean something else? For my purposes, it will mean the largest magnitude earthquakes. Again, that could be determined within my function, but doing so would require reading all the data, sorting it, and then throwing away those I don’t want. Certainly possible, but slower and less efficient since the USGS service will do some of that sorting and filtering for us before sending the data out over the internet. If our source service didn’t provide these functions then we’d have to do it locally; but in in this case I’ll take advantage of what the USGS provides.
The new function might look something like this…

CREATE OR REPLACE FUNCTION get_earthquakes(
    p_start_time      IN TIMESTAMP WITH TIME ZONE DEFAULT   SYSTIMESTAMP
                                                          - INTERVAL '30' DAY,
    p_end_time        IN TIMESTAMP WITH TIME ZONE DEFAULT SYSTIMESTAMP,
    p_time_zone       IN VARCHAR2 DEFAULT 'US/Eastern',
    p_event_limit     IN INTEGER DEFAULT NULL,
    p_min_magnitude   IN NUMBER DEFAULT 1)
    RETURN usgs_quake_event_tab
    PIPELINED
IS
    v_request         UTL_HTTP.req;
    v_response        UTL_HTTP.resp;
    v_text            VARCHAR2(32767);
    v_clob            CLOB := EMPTY_CLOB();
    v_min_magnitude   NUMBER := GREATEST(1, NVL(p_min_magnitude, 1));
BEGIN
    -- wallet is autologin, no password required
    UTL_HTTP.set_wallet('file:/home/oracle/mywallets/usgs');

    -- Read US Geological Survey records for the specified time period
    v_request :=
        UTL_HTTP.begin_request(
               'http://earthquake.usgs.gov/fdsnws/event/1/query.geojson?&starttime='
            || TO_CHAR(p_start_time AT TIME ZONE 'UTC',
                       'yyyy-mm-dd"%20"hh24:mi:ss')
            || '&endtime='
            || TO_CHAR(p_end_time AT TIME ZONE 'UTC',
                       'yyyy-mm-dd"%20"hh24:mi:ss')
            || '&minmagnitude='
            || TO_CHAR(v_min_magnitude, 'fm99.9')
            || '&orderby=magnitude'
            || CASE
                   WHEN p_event_limit IS NOT NULL
                   THEN
                       '&limit=' || TO_CHAR(p_event_limit, 'fm99999')
               END);
    v_response := UTL_HTTP.get_response(v_request);

    BEGIN
        LOOP
            UTL_HTTP.read_text(v_response, v_text, 32000);
            v_clob := v_clob || v_text;
        END LOOP;
    EXCEPTION
        WHEN UTL_HTTP.end_of_body
        THEN
            -- We expect to hit this.  Cleanup and continue.
            UTL_HTTP.end_response(v_response);
        WHEN OTHERS
        THEN
            -- Some other error happened, cleanup and reraise it.
            UTL_HTTP.end_response(v_response);
            RAISE;
    END;

    -- Parse out the most interesting fields.
    --  Outline of json structure
    --  { "features" [
    --       { "properties" : {...}, "geometry" : "coordinates":[...]},
    --       { "properties" : {...}, "geometry" : "coordinates":[...]},
    --       { "properties" : {...}, "geometry" : "coordinates":[...]}
    --    ]
    --  }
    FOR x
        IN (SELECT FROM_TZ(
                         TO_TIMESTAMP('1970-01-01 00:00:00',
                                      'yyyy-mm-dd hh24:mi:ss')
                       + NUMTODSINTERVAL(event_time / 1000, 'second'),
                       'UTC')
                       AT TIME ZONE (p_time_zone) event_time,
                   event_type,
                   magnitude,
                   location_name,
                   latitude,
                   longitude,
                   depth_in_km
              FROM JSON_TABLE(
                       json(v_clob),
                       '$.features[*]'
                       COLUMNS(
                           event_time NUMBER PATH '$.properties.time',
                           event_type VARCHAR2(20) PATH '$.properties.type',
                           magnitude NUMBER PATH '$.properties.mag',
                           location_name
                               VARCHAR2(100)
                               PATH '$.properties.place',
                           longitude NUMBER PATH '$.geometry.coordinates[0]',
                           latitude NUMBER PATH '$.geometry.coordinates[1]',
                           depth_in_km NUMBER PATH '$.geometry.coordinates[2]')))
    LOOP
        -- Construct an object from the fields and pipe it out
        PIPE ROW (usgs_quake_event(x.event_time,
                                   x.event_type,
                                   x.magnitude,
                                   x.location_name,
                                   x.latitude,
                                   x.longitude,
                                   x.depth_in_km));
    END LOOP;

    RETURN;
END;

Then using it to return the top 10 event with a magnitude of 5 or higher…

SQL>   SELECT *
  2      FROM get_earthquakes(p_event_limit => 10,
  3                           p_min_magnitude => 5)
  4  ORDER BY magnitude DESC, event_time;

                                   EVENT_TIME    EVENT_TYPE    MAGNITUDE                             LOCATION_NAME    LATITUDE    LONGITUDE    DEPTH_IN_KM
_____________________________________________ _____________ ____________ _________________________________________ ___________ ____________ ______________
15-JUN-23 02.06.28.174000000 PM US/EASTERN    earthquake             7.2 274 km SW of Houma, Tonga                     -22.976      -177.13        175.089
02-JUL-23 06.27.43.750000000 AM US/EASTERN    earthquake             6.9                                              -17.8785    -174.9429            229
18-JUN-23 04.30.22.468000000 PM US/EASTERN    earthquake             6.4 108 km ESE of La Rivera, Mexico               23.2012    -108.6126             10
11-JUN-23 05.54.44.886000000 AM US/EASTERN    earthquake             6.2 20 km WSW of Biratori, Japan                  42.5268     141.9005            121
14-JUN-23 10.19.23.281000000 PM US/EASTERN    earthquake             6.2 11 km SSE of Hukay, Philippines               13.7495     120.7447            112
16-JUN-23 03.10.50.520000000 PM US/EASTERN    earthquake             6.2 243 km SSW of Γ??Ohonua, Tonga                -23.478    -175.5092             16
19-JUN-23 07.18.11.960000000 AM US/EASTERN    earthquake             6.2 96 km ESE of Angoram, Papua New Guinea        -4.4729     144.8345         23.844
17-JUN-23 07.26.21.874000000 AM US/EASTERN    earthquake               6 237 km SSW of Γ??Ohonua, Tonga               -23.4419    -175.4278             35
18-JUN-23 05.59.17.405000000 PM US/EASTERN    earthquake               6 south of Africa                              -48.6704      31.1865             10
25-JUN-23 03.16.59.490000000 AM US/EASTERN    earthquake               6 south of Tonga                               -24.0318    -175.6262          7.087

10 rows selected.

We could extend this functionality further, accepting Oracle Spatial types to describe rectangular or circular areas and filter the results based on events within the described boundaries. Alternately, provide input parameters for the corners of a rectangle, or the center and radius of a circle. Other options include putting the function in a package with overloaded input parameters to support any or all of these combinations.
Those permutations are beyond the intended scope of this article as they don’t really expand on the table functions and pipeline functionality.

As mentioned above, I usually use Nested Table types as my return values for table functions that might return an unknown number of values. However, it is possible to use varray return type but if you pipeline the results, the upper limit can be ignored. Here I’ll recreate the function above but return the NUMARR varray type which is defined to have an upper limit of 30. The function is pipelined though and thus doesn’t get the ORA-6532 error seen before.

CREATE OR REPLACE FUNCTION va_function_pipelined(n IN INTEGER)
    RETURN numarr pipelined
IS
BEGIN
    FOR i IN 1 .. n
    LOOP
        PIPE ROW (i);
    END LOOP;

    RETURN;
END;
/

SQL> select * from va_function_pipelined(40);

   COLUMN_VALUE
_______________
              1
              2
              3
              4
              5
              6
              7
              8
              9
             10
             11
             12
             13
             14
             15
             16
             17
             18
             19
             20
             21
             22
             23
             24
             25
             26
             27
             28
             29
             30
             31
             32
             33
             34
             35
             36
             37
             38
             39
             40

40 rows selected.

While this usage is legal syntax, I don’t recommend it since the return type implies a constrained result set that the function doesn’t obey. So, again, if you need to return an indeterminate number of rows, I suggest using Nested Table types instead of varrays.

Hopefully this introduction to table functions, with and without pipeline is helpful. Questions and comments, as always, are welcome.