
Using a Java function to iterate large JSON with many keys.

In my previous article I provided a few PL/SQL options for returning a collection of keys from a JSON document. They are simple to use and work well enough for most JSON sources I’ve encountered.

SQL> select * from json_keys(json('{"a":"test 1", "b": "test 2", "c" : "test 3"}'));

COLUMN_VALUE
--------------------------------------------------------------------------------
a
b
c

Iudith Mentzel pointed out a pure-SQL option as well.

SQL> SELECT SUBSTR(key_path, 3)
  2    FROM JSON_TABLE((SELECT json_dataguide('{"a":"test 1", "b": "test 2", "c" : "test 3"}') FROM DUAL),
  3                    '$[*]'
  4                    COLUMNS(key_path VARCHAR2(30) PATH '$."o:path"'))
  5   WHERE key_path != '$';

SUBSTR(KEY_PATH,3)
--------------------------------------------------------------------------------
a
b
c

While these work, they both suffer from a limit on the number of keys they can return. My PL/SQL solutions rely on the json_key_list type, which is a varray and therefore has an upper limit of 32767 elements. Even the nested table forms of the function are subject to this limit, because the get_keys method of json_object_t must be called first to generate the json_key_list, capping the total number of keys that can be processed. If your JSON object has more keys than will fit in the varray, an exception is raised.

The json_dataguide function has a documented limit of 5000 children for any node. Rather than raising an exception, it simply ignores any nodes beyond that limit.

To demonstrate, I’ll use a function that can generate a JSON document with an arbitrary number of keys and try both of the methods shown above. The function generates a CLOB holding a simple JSON object of key/value pairs. The keys are hexadecimal strings and the values are the corresponding numbers: {"1":1, "2":2, …, "a":10, "b":11, …}

CREATE OR REPLACE FUNCTION n_key_json(n IN NUMBER)
    RETURN CLOB
IS
    v_clob    CLOB := EMPTY_CLOB;
    v_chunk   VARCHAR2(32767);
BEGIN
    v_clob := '{';

    IF n > 0
    THEN
        v_chunk := '"1":1';

        FOR i IN 2 .. n
        LOOP
            v_chunk := v_chunk || ',"' || TO_CHAR(i, 'fmxxxxxxxxxxxxxxx') || '":' || TO_CHAR(i, 'fm999999999999999');

            IF LENGTH(v_chunk) > 32000
            THEN
                v_clob := v_clob || v_chunk;
                v_chunk := NULL;
            END IF;
        END LOOP;

        v_clob := v_clob || v_chunk;
    END IF;

    v_clob := v_clob || '}';

    RETURN v_clob;
END;
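To sanity-check the generator’s output shape outside the database, the same scheme — lowercase hexadecimal keys, decimal values — can be sketched in a few lines of Python. This is an illustration only, not part of the PL/SQL solution:

```python
def n_key_json(n):
    """Build {"1":1, "2":2, ..., "a":10, ...}: hex keys, matching decimal values."""
    pairs = ('"%x":%d' % (i, i) for i in range(1, n + 1))  # %x gives lowercase hex
    return "{" + ",".join(pairs) + "}"

print(n_key_json(3))  # {"1":1,"2":2,"3":3}
```

Like the PL/SQL version, n=0 yields an empty object {} and key 10 becomes "a".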

Using the n_key_json function, we’ll create a JSON document of 35000 key/value pairs. Any of the json_keys functions above will fail on it when they try to generate the json_key_list varray.

SQL> select * from json_keys(json(n_key_json(35000)));
select * from json_keys(json(n_key_json(35000)))
              *
ERROR at line 1:
ORA-40684: maximum number of key names exceeded
ORA-06512: at "SYS.JDOM_T", line 212
ORA-06512: at "SYS.JSON_OBJECT_T", line 492
ORA-06512: at "SDS.JSON_KEYS", line 7

Using json_dataguide, we can see that it does begin parsing the values but stops after 5001 elements (one for the parent plus 5000 children).

SQL> SELECT SUBSTR(key_path, 3)
  2    FROM JSON_TABLE((SELECT json_dataguide(n_key_json(35000)) FROM DUAL),
  3                    '$[*]'
  4                    COLUMNS(key_path VARCHAR2(30) PATH '$."o:path"'));

SUBSTR(KEY_PATH,3)
--------------------------------------------------------------------------------

"1"
"2"
"3"
"4"
"5"
"6"
"7"
"8"
"9"
a
b
c
d
e
f
"10"
"11"
"12"
"13"
"14"
"15"
...
"1382"
"1383"
"1384"
"1385"
"1386"
"1387"
"1388"

5001 rows selected.

While the keys of this constructed document are not iterable, it is still valid JSON. All of the keys are defined and accessible within the json type, but there is no SQL or PL/SQL API for listing all of them. In the example below I show that I can pull the first value, the last, and a few from the middle.

SQL> SELECT json_value(big_json, '$."1".number()') a,
  2         json_value(big_json, '$."aaa".number()') b,
  3         json_value(big_json, '$."fff".number()') c,
  4         json_value(big_json, '$."88b8".number()') d
  5    FROM (SELECT json(n_key_json(35000)) big_json FROM DUAL);

         A          B          C          D
---------- ---------- ---------- ----------
         1       2730       4095      35000

So, the CLOB JSON is constructed correctly and the resulting structure is valid, but we can’t use SQL or PL/SQL to read the keys. What are the options, then?

Java does not have the limits above; it can process any number of elements, subject to the memory needed to hold a large document. Below I define a Java-based function to parse the text and return a nested table collection.

CREATE OR REPLACE AND RESOLVE JAVA SOURCE NAMED "JSONparser" AS
import java.io.*;
import java.sql.*;
import java.util.*;
import oracle.sql.*;
import oracle.jdbc.driver.*;
import javax.json.Json;
import javax.json.JsonReader;
import javax.json.JsonObject;

public class JSONparser {
    public static oracle.sql.ARRAY getKeys(java.sql.Clob p_json) throws Exception {
        final StringBuilder sb = new StringBuilder();
        final Reader         reader = p_json.getCharacterStream();
        final BufferedReader br     = new BufferedReader(reader);

        int b;
        while(-1 != (b = br.read()))
        {
            sb.append((char)b);
        }

        br.close();

        JsonReader jsonReader = Json.createReader(new StringReader(sb.toString()));
        JsonObject jsonObj = jsonReader.readObject();
        sb.setLength(0);
        jsonReader.close();

        ArrayList<String> keyList = new ArrayList<String>();

        for (String jsonkey : jsonObj.keySet()) {
            keyList.add(jsonkey);
        }

        Connection conn = new OracleDriver().defaultConnection();
        ArrayDescriptor descriptor =
            ArrayDescriptor.createDescriptor( "VCTAB", conn );

        return new ARRAY( descriptor, conn, keyList.toArray() );
    }
}

The VCTAB collection is one I’ve used in other articles:

CREATE OR REPLACE TYPE vctab AS TABLE OF VARCHAR2(4000);

To expose the Java for use within SQL, we need to build a PL/SQL interface to it. To differentiate it from the previous json_keys function, I’ll prefix this one with “java” to make its internal mechanism clear.

CREATE OR REPLACE FUNCTION java_json_keys(p_json IN clob)
        RETURN vctab
    AS
        LANGUAGE JAVA
        NAME 'JSONparser.getKeys( java.sql.Clob ) return oracle.sql.ARRAY' ;
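For readers outside Oracle, the core of what getKeys does — parse the entire document, then collect its keys in order — can be sketched in Python (a stand-in illustration, not the article’s Java; Python dicts preserve insertion order since 3.7):

```python
import json

def java_json_keys(p_json):
    """Parse the whole document (as getKeys does) and return its keys in order."""
    obj = json.loads(p_json)   # the full document is held in memory here
    return list(obj.keys())

# Same shape of document as n_key_json(35000): hex keys, decimal values.
doc = "{" + ",".join('"%x":%d' % (i, i) for i in range(1, 35001)) + "}"
keys = java_json_keys(doc)
print(len(keys))   # 35000 — no varray-style cap
```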

The usage of this new function is similar to the pure PL/SQL implementations seen above.

SQL> select * from java_json_keys(n_key_json(10));

COLUMN_VALUE
--------------------------------------------------------------------------------
1
2
3
4
5
6
7
8
9
a

10 rows selected.

And now we can use this function to return keys from very large JSON documents. Here I’m using the function to parse out keys from a JSON document of 35000 elements and another of 1 million elements. Rather than list the results here, I’ll simply count them to show the function parses out the expected number of keys and is able to retrieve all of them.

SQL> select count(*) from java_json_keys(n_key_json(35000));

  COUNT(*)
----------
     35000

SQL> select count(*) from java_json_keys(n_key_json(1000000));

  COUNT(*)
----------
   1000000

It might seem unlikely you’d need to process so many keys, and I’ll admit I haven’t needed to often. But I started down this path after running into a few JSON documents with hundreds of thousands of keyed sub-objects. It’s arguable that they would have been better constructed as arrays, but I didn’t create them; I was merely the recipient needing to ingest them.
One problem with this Java implementation is that it builds the entire table in memory, so if your incoming JSON is large enough, you may run out. Another option would be to rewrite the Java with the Oracle Data Cartridge Interface (ODCI) and return the collection pipelined, so you don’t need to hold the entire table at once. If I run into a need to process many more large documents I may pursue that, and if I do, I’ll certainly write it up in a future article.
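As a rough sketch of that streaming idea — yield each key as it is found instead of materializing the whole collection — here is a Python generator as a stand-in (an ODCI version would need its own article). It assumes a flat object like the ones used here and no escaped quotes inside strings:

```python
def iter_keys(text):
    """Yield top-level keys of a flat JSON object without building a dict.
       Sketch only: assumes a flat object and no escaped quotes in strings."""
    depth, i, n = 0, 0, len(text)
    while i < n:
        c = text[i]
        if c in "{[":
            depth += 1
        elif c in "}]":
            depth -= 1
        elif c == '"':
            j = text.index('"', i + 1)          # end of the string literal
            k = j + 1
            while k < n and text[k].isspace():
                k += 1
            if depth == 1 and k < n and text[k] == ":":
                yield text[i + 1:j]             # a string followed by ':' is a key
            i = j                               # skip past the string
        i += 1

doc = '{"a":1, "b":"x", "c":[1,2]}'
print(list(iter_keys(doc)))   # ['a', 'b', 'c']
```

Because it never builds the full key list, memory use stays flat no matter how many keys the document has — the same benefit a pipelined ODCI function would give.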

As before, I still hope that a key iterator will be made available as a standard function within SQL syntax, and, if it comes to pass, hopefully it will be able to handle large documents with many keys. Until then though, I hope you find the option above helpful.

Thank you for reading. Questions and comments, as always, are welcome.

2 thoughts on “Using a Java function to iterate large JSON with many keys.”

  1. Hello Sean,
    Your mastery of all this Java “witchery” is simply amazing 🙂
    This is a knowledge section that I find it extremely difficult to digest, because the interfacing of the data types between Java and Oracle / pl/sql is so awkward, in spite of the relatively simple logic …

    I would be extremely glad to see your planned example using the ODCI interface,
    because there are so, so few such examples in the commonly available sources …

    Regarding the size limitation issue:
    As I understand from the JSON documentation, the 5000 children for a parent node
    is a general data guide limitation, and not just for the “on the fly” data guide generator function json_dataguide.
    It would be nice if you could bother to test this, for example by storing your big json in a table, create a json search index with a dataguide, and then try to see what result do you get in the data dictionary view USER_JSON_DATAGUIDE_FIELDS.

    If you remember, many, many years ago, in 2011, we had a long discussion on PL/SQL Challenge on the issue of whether a nested table stored in the database does have the same size limit as in PL/SQL, and there was an example from Valentin Nikotin that proved that this is NOT the case.
    In fact, he created a pipelined function returning a nested table of 3 billion numbers !

    Considering that the data guide is always stored in a CLOB, I don’t exactly understand
    why does it have such a “low limit” of 5000.
    Also, the size limit of the JSON_KEY_LIST varray could have been much higher even in PL/SQL, subject only to memory usage limitations.

    Cheers & Best Regards,
    Iudith Mentzel

    1. Thank you, but I wouldn’t put my java skills in the “mastery” category yet.
      In many cases it’s likely my syntax usage is poor or at least unconventional because I intentionally try to write it somewhat like pl/sql, at least where the syntax permits,
      which means I’m not taking advantage of some modern java constructs.

      On the other hand, all of my use cases have narrowly defined requirements so I’ve not needed to embrace the full functionality (and added complexity) of these extra features.

      I’m sure the json_dataguide “could” be implemented to return more than 5000 child nodes but it’s a documented hard-limit.
      It’s not something we can code around. It might be possible to build an index that holds all of the keyed values, but that’s outside the scope of these two articles.
      To borrow your phrase, these functions are for “on the fly” querying of data, where it might not be feasible, or even possible to load a table and query a view over the index.
      That is an interesting exercise though, I think I’ll give it a try.

      Similarly, I’m sure the json_key_list varray could be implemented to have a larger limit than 32767, but, as of 23c anyway, that’s still what it is.
      So, until json_key_list is changed, 32767 is not a negotiable limit; and because it’s a varray, even if it is declared larger it will always have a limit.
      To truly support an arbitrary number of keys there will need to be a method that can return a table collection.

Going with a nested table as the final return result is probably the best route, and pipelining with ODCI should help the memory constraint considerably because only the initial json itself will need to be held. The java implementation in this article almost doubles the memory needed because it constructs the table in its entirety before returning it.
