Close

Choosing OpenAI model with Oracle 23ai SELECT

When using the new SELECT AI syntax of 23ai, you can specify the model you’d like to use with OpenAI’s ChatGPT service.

Before using the functionality, you must first create an ACE (access control entry) for the OpenAI host with your user as the principal and grant execute to the cloud packages DBMS_CLOUD and DBMS_CLOUD_AI. These need to be done with an ADMIN account in your cloud database.

BEGIN
    DBMS_NETWORK_ACL_ADMIN.append_host_ace(
        HOST   => 'api.openai.com',
        ace    =>
            xs$ace_type(privilege_list   => xs$name_list('http'),
                        principal_name   => 'SDS',
                        principal_type   => xs_acl.ptype_db));
END;

GRANT EXECUTE ON dbms_cloud_ai TO sds;
GRANT EXECUTE ON dbms_cloud TO sds; 

Once you have the privileges needed, you create a credential with the API key you received from OpenAI.

BEGIN
    dbms_cloud.create_credential('OPENAI_CRED', 'OPENAI', 'sk-test-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
END;

Then, using that credential, you can create a profile to use that credential.
If you want to use ChatGPT to try answer questions about your data, you can choose the objects you’d like to expose. The data within the tables is not uploaded, but the metadata for the specified objects will be (including the table names, column names, comments, etc.) Below I’m using the sample SH schema.

This is an extension of the examples found in the Oracle documentation. Note in that documentation the model is not specified. So you will pick up a default.
But, if you want to specify which OpenAI model to interact with, you can choose by including it in the JSON attributes parameter. Here I specify to use the “GPT 4” model.

GPT 4

BEGIN
    --dbms_cloud_ai.drop_profile('OPENAI');
    dbms_cloud_ai.create_profile('OPENAI', '{
        "provider": "openai",
        "model": "gpt-4",
        "credential_name": "OPENAI_CRED",
        "object_list": [{"owner": "SH", "name": "customers"},
                        {"owner": "SH", "name": "countries"},
                        {"owner": "SH", "name": "supplementary_demographics"},
                        {"owner": "SH", "name": "profits"},
                        {"owner": "SH", "name": "promotions"},
                        {"owner": "SH", "name": "products"}],
        "conversation": "true"
       }');
END;

After creating the profile, you use DBMS_CLOUD_AI.SET_PROFILE to make it active for your session.

BEGIN
   DBMS_CLOUD_AI.SET_PROFILE('OPENAI');
END;

And now you can use SELECT AI to answer questions about your data. Again, the data itself was uploaded in order to find the answer of 55500. Rather, ChatGPT generated SQL corresponding to the metadata provided and then executed that sql on your behalf, thus generating the SELECT COUNT(*) answer.

SQL> select ai how many customers exist;

Total Customers
---------------
          55500

Note though – even though I chose to expose some tables from the SH schema for my queries, I am not limited to using just those tables. I can query other information for example, which Olympian has won the the most medals prior to the 2024 summer games? My SH objects provides no means of answering that question. ChatGPT recognizes that fact and returns an answer saying it can’t create a SELECT statement but using its own internet search was able to find an answer to the question anyway.

SQL> select ai who has the most olympic medals going in to this year''s olympics;

RESPONSE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorry, unfortunately a valid SELECT statement could not be generated for your natural language prompt. Here is some more information to help you further:

As of the 2021 Tokyo Olympics, the athlete with the most Olympic medals is American swimmer Michael Phelps, with a total of 28 medals.

GPT 3.5 turbo

In the next example, I’ll change the profile to use the GPT 3.5 Turbo model. I’ll drop the previous profile and recreate it with the same credential and objects, but I’ll specify a different model in the JSON attributes.

Here we can see it was still able to answer our simple SQL count question. But when asked about the Olympics, it made the incorrect decision to try to answer the question using the COUNTRIES table above. While it is wrong, it’s not an unreasonable attempt. “Who” could mean national teams, not just individual competitors; and, even though the SH countries table doesn’t relate to Olympic medals, it is the only list of countries available, so it did what it could and listed them all. So, it’s something that could be refined, possibly by rephrasing the question – or by including Olympic medal metadata in the exposed object list.

SQL> BEGIN
  2      dbms_cloud_ai.drop_profile('OPENAI');
  3      dbms_cloud_ai.create_profile('OPENAI', '{
  4          "provider": "openai",
  5          "model": "gpt-3.5-turbo",
  6          "credential_name": "OPENAI_CRED",
  7          "object_list": [{"owner": "SH", "name": "customers"},
  8                          {"owner": "SH", "name": "countries"},
  9                          {"owner": "SH", "name": "supplementary_demographics"},
 10                          {"owner": "SH", "name": "profits"},
 11                          {"owner": "SH", "name": "promotions"},
 12                          {"owner": "SH", "name": "products"}],
 13          "conversation": "true"
 14         }');
 15  END;
 16  /

PL/SQL procedure successfully completed.

SQL> select ai how many customers exist;

CUSTOMER_COUNT
--------------
         55500

SQL> select ai who has the most olympic medals going in to this year''s olympics;

COUNTRY_NAME
----------------------------------------
China
India
Japan
Malaysia
Singapore
South Africa
Australia
New Zealand
Saudi Arabia
Poland
Germany

COUNTRY_NAME
----------------------------------------
Denmark
Spain
France
United Kingdom
Ireland
Italy
The Netherlands
Turkey
Canada
United States of America
Argentina

COUNTRY_NAME
----------------------------------------
Brazil

23 rows selected.

GPT 4o

Next, I’ll drop and recreate the profile again using the GPT 4o model. It again, is able to answer my customer question given the object list presented. The Olympics question though, like the first GPT 4 model previously, the 4o model recognizes that it can’t find the answer using the object list provided. But, this time, instead of searching the internet to try to find the answer, the 4o model generated a potential SQL query that might answer the question, if a table of Olympic athlete data were to exist.

While it was nice that GPT 4 model was able to find Michael Phelps – there is not citation of source, so while it is the correct answer as of this writing, there isn’t any particular reason to believe it. The 4o model, while not answering the question directly, does provide a method that can be explored and expanded on to provide an answer definitively – provided you have the data. For this simple example, it’s probably more helpful to just have the Phelps answer, but in a business context it might be better to have a methodology presented that can be validated independently rather than just blindly trusting a random internet search.

SQL> BEGIN
  2      dbms_cloud_ai.drop_profile('OPENAI');
  3      dbms_cloud_ai.create_profile('OPENAI', '{
  4          "provider": "openai",
  5          "model": "gpt-4o",
  6          "credential_name": "OPENAI_CRED",
  7          "object_list": [{"owner": "SH", "name": "customers"},
  8                          {"owner": "SH", "name": "countries"},
  9                          {"owner": "SH", "name": "supplementary_demographics"},
 10                          {"owner": "SH", "name": "profits"},
 11                          {"owner": "SH", "name": "promotions"},
 12                          {"owner": "SH", "name": "products"}],
 13          "conversation": "true"
 14         }');
 15  END;
 16  /

PL/SQL procedure successfully completed.

SQL> select ai how many customers exist;

Customer_Count
--------------
         55500

SQL> select ai who has the most olympic medals going in to this year''s olympics;

RESPONSE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorry, unfortunately a valid SELECT statement could not be generated for your natural language prompt. Here is some more information to help you further:

```sql
SELECT "A"."Athlete_Name", MAX("A"."Medal_Count") AS "Max_Medals"
FROM "Olympics"."Athletes" "A"
GROUP BY "A"."Athlete_Name"
ORDER BY "Max_Medals" DESC
FETCH FIRST 1 ROWS ONLY

GPT 4o mini

Last, I’ll try out the 4o mini model that was just released a couple days ago (as of this writing.)

Once again, it was able to correctly answer my customer question using the provided object list. However, when asked about the Olympics, it appears to have reused the previous query rather than generate something new. Unlike the 3.5 turbo model, which at least tried to find a list of countries, the 4o mini ran a completely unrelated sql.

SQL> BEGIN
  2      dbms_cloud_ai.drop_profile('OPENAI');
  3      dbms_cloud_ai.create_profile('OPENAI', '{
  4            "provider": "openai",
  5            "model": "gpt-4o-mini",
  6            "credential_name": "OPENAI_CRED",
  7            "object_list": [{"owner": "SH", "name": "customers"},
  8                            {"owner": "SH", "name": "countries"},
  9                            {"owner": "SH", "name": "supplementary_demographics"},
 10                            {"owner": "SH", "name": "profits"},
 11                            {"owner": "SH", "name": "promotions"},
 12                            {"owner": "SH", "name": "products"}],
 13            "conversation": "true"
 14           }');
 15  END;
 16  /

PL/SQL procedure successfully completed.

SQL> BEGIN
  2     DBMS_CLOUD_AI.SET_PROFILE('OPENAI');
  3  END;
  4  /

PL/SQL procedure successfully completed.

SQL> select ai how many customers exist;

Total_Customers
---------------
          55500

SQL> select ai who has the most olympic medals going in to this year''s olympics;

Total_Customers
---------------
          55500

Based on this last result, one might be tempted to simply throw out the 4o mini model as horribly erroneous; but it is new and will likely get better as time goes on. I think it’s safe to say the results for this particular example are clearly worse; but it’s still an interesting tool to explore and I look forward to seeing what else may become available.

It will also be interesting to see of model selection will be an option for the other Generative AI options, like Cohere. Azure Open AI has a similar feature. You don’t specify the model in the profile of your Oracle session, but you do define the Azure deployment, which then has a specific Open AI model associated with it.

With the rapid advancements in Open AI, we can likely expect many more models to become available. Being able to choose which to use with our profiles will be an important feature in using the SELECT AI functionality.

I’d like to thank Martin Bach for initially pointing out this feature to me.

Thank you for reading. Questions and comments, as always, are welcome.

Leave a Reply