Introduction to ASH Data, part 4

Up to this point in my ASH series all of the output was simply rows and columns of data, sometimes augmented with an ascii, text based chart. This time I’ll explore a few different tools to construct better visualizations.

One of the first things I ever used “R” for was to create ASH charts as an alternate to those found in Oracle’s Enterprise Manager “Top Activity” and “ASH Analytics” pages. I had multiple reasons to pursue the exercise. The first being the Top Activity page didn’t have a mechanism to specify an arbitrary time range. The ASH Analytics page does allow inputs but on refreshes it often loses them. Furthermore, both pages generated their graphs from averages over periods, which can distort the data, causing peaks and valleys to disappear. I wrote about the missing peaks a couple years ago.

So, the first visualization I created was a simple stacked bar chart with the ggplot2 package, accessing the database through the ROracle package. This script plots a half-hour range seen in previous articles but every sample time is represented. While this is still subject to the sampling issues of all ASH queries, this chart eliminates missing peaks caused by rounded averages over periods.

library("DBI")
library("ROracle")
library("ggplot2")

drv <- dbDriver("Oracle")

### Connect via credentials stored in wallet
con <- dbConnect(drv, dbname ="testdb")

######################################################################################################
#### Function: get_top_activity
######################################################################################################
get_top_activity <- function(p_con, p_start, p_end) {
    v_sql <- "
        SELECT sample_time, activity, COUNT(*) cnt
          FROM (SELECT sample_time,
                       CASE
                       WHEN session_state = 'ON CPU' THEN
                          'ON CPU'
                       WHEN wait_class IN ('Concurrency',
                                           'System I/O',
                                           'User I/O',
                                           'Configuration',
                                           'Application',
                                           'Commit',
                                           'Queueing',
                                           'Network',
                                           'Administrative',
                                           'Cluster',
                                           'Scheduler') THEN
                          wait_class
                       ELSE
                          'Other'
                       END activity
                  FROM dba_hist_active_sess_history
                 WHERE sample_time >= TO_TIMESTAMP(:start_time,'yyyy-mm-dd hh24:mi:ss')
                   AND sample_time < TO_TIMESTAMP(:end_time,'yyyy-mm-dd hh24:mi:ss'))
          GROUP BY sample_time, activity"

    ### Need to tell R the timezone of the incoming data
    ### Alternately, could use FROM_TZ function to give the timestamp values a timezone in the query.
    Sys.setenv(ORA_SDTZ = "US/Eastern")
    
    return(dbGetQuery(
              p_con,
              v_sql,
              data.frame(
                  start_time = p_start,
                  end_time = p_end
              )
           )
    )
}
######################################################################################################


##############################################################################################
###  Get ASH Data
##############################################################################################
df_activity <- get_top_activity(con, '2018-09-15 13:00:00', '2018-09-15 13:30:00')

### We don't need the db connection anymore
dbDisconnect(con)
dbUnloadDriver(drv)

### Stack the wait classes in this order (CPU on bottom, Other on top)
wait_class_order <- c(
  "Other",
  "Cluster",
  "Queueing",
  "Network",
  "Administrative",
  "Configuration",
  "Commit",
  "Application",
  "Concurrency",
  "System I/O",
  "User I/O",
  "Scheduler",
  "ON CPU"
)
df_activity$ACTIVITY <- factor(df_activity$ACTIVITY, levels = wait_class_order)


## Color hex codes for each activity type shown in the chart
wait_class_colors <- c(
  "ON CPU"         = "#04CE04",
  "Scheduler"      = "#CCFECC",
  "User I/O"       = "#084AD4",
  "System I/O"     = "#0499F4",
  "Concurrency"    = "#8E1B04",
  "Application"    = "#C72D04",
  "Commit"         = "#E46A04",
  "Configuration"  = "#5B460D",
  "Administrative" = "#707251",
  "Network"        = "#9C9376",
  "Queueing"       = "#C5B79A",
  "Cluster"        = "#CCC3AE",
  "Other"          = "#F56CA9"
)

### Build chart, fill in missing values, add titles and labels
plot_activity <- ggplot(df_activity, aes(x = SAMPLE_TIME, y = CNT, fill = ACTIVITY)) +
                        geom_bar(stat = "identity") +
                        scale_fill_manual(values = wait_class_colors) +
                        ggtitle("Active Sessions")  +
                        labs(x = "Sample Time", y =  "Active Sessions")

### Show the chart
print(plot_activity)

Running this script produced the chart below from my test system.

Shortly after I started exploring R, I was asked to help with a project using Googlecharts. I hadn’t done a lot with the Googlecharts api before; but having just completed some simple R charting, I started by replicating the ASH charts as a learning exercise. With R, you build a data frame to pass into the ggplot api. With Googlecharts, you build an array of arrays, each sub-array has a place holder value for every possible activity type, whereas the R plot function could fill in missing values as needed. Rows in the array will look something like these:

 [new Date(2018,09,15,13,06,00),6,0,0,0,0,0,0,0,0,0,0,0,0]
,[new Date(2018,09,15,13,29,33),0,0,3,0,0,0,0,0,0,0,0,0,0]
,[new Date(2018,09,15,13,19,42),0,0,3,0,0,0,0,0,0,0,0,0,0]
,[new Date(2018,09,15,13,15,51),6,0,0,0,0,0,0,0,0,0,0,0,0]

Using the same query and date range as in the R example above, adding a pivot clause, and a little pl/sql we can spool out an html file containing all of the necessary data and invoke a browser to load it and generate the chart.

set echo off
set pages 1000
set lines 1000
set feedback off
set define off
set serveroutput on
spool ash.html
DECLARE
      p_start_time TIMESTAMP := TO_TIMESTAMP('2018-09-15 13:00:00','yyyy-mm-dd hh24:mi:ss');
      p_end_time   TIMESTAMP := TO_TIMESTAMP('2018-09-15 13:30:00','yyyy-mm-dd hh24:mi:ss');
BEGIN
    DBMS_OUTPUT.put_line(
        q'[<html>
  <head>
    <script type="text/javascript" src="https://www.gstatic.com/charts/loader.js"></script>
    <script type="text/javascript">
       google.charts.load('current', {'packages':['corechart']});
       google.charts.setOnLoadCallback(drawChart);
       function drawChart() {
           var data = google.visualization.arrayToDataTable([
            ['Activity',
            'ON CPU','Scheduler','User IO', 'System IO', 'Concurrency','Application','Commit',
            'Configuration',    'Administrative',    'Network',  'Queueing', 'Cluster',   'Other'],]'
       );

    FOR x
        IN (
            SELECT    CASE WHEN ROWNUM = 1 THEN ' ' ELSE ',' END
                   || '['
                   || TO_CHAR(sample_time, '"new Date("yyyy,mm,dd,hh24,mi,ss")"')
                   || ','
                   || NVL("ON CPU", 0)
                   || ','
                   || NVL("Scheduler", 0)
                   || ','
                   || NVL("User I/O", 0)
                   || ','
                   || NVL("System I/O", 0)
                   || ','
                   || NVL("Concurrency", 0)
                   || ','
                   || NVL("Application", 0)
                   || ','
                   || NVL("Commit", 0)
                   || ','
                   || NVL("Configuration", 0)
                   || ','
                   || NVL("Administrative", 0)
                   || ','
                   || NVL("Network", 0)
                   || ','
                   || NVL("Queueing", 0)
                   || ','
                   || NVL("Cluster", 0)
                   || ','
                   || NVL("Other", 0)
                   || ']' arrayrow
              FROM (SELECT sample_time,
                           CASE
                               WHEN session_state = 'ON CPU'
                               THEN
                                   'ON CPU'
                               WHEN wait_class IN ('Concurrency',
                                                   'System I/O',
                                                   'User I/O',
                                                   'Configuration',
                                                   'Application',
                                                   'Commit',
                                                   'Queueing',
                                                   'Network',
                                                   'Administrative',
                                                   'Cluster',
                                                   'Scheduler')
                               THEN
                                   wait_class
                               ELSE
                                   'Other'
                           END activity
                      FROM dba_hist_active_sess_history
                     WHERE sample_time >= p_start_time 
                       AND sample_time < p_end_time)
                   PIVOT (COUNT(*)
                         FOR activity
                         IN ('Other' AS "Other",
                            'Cluster' AS "Cluster",
                            'Queueing' AS "Queueing",
                            'Network' AS "Network",
                            'Administrative' AS "Administrative",
                            'Configuration' AS "Configuration",
                            'Commit' AS "Commit",
                            'Application' AS "Application",
                            'Concurrency' AS "Concurrency",
                            'System I/O' AS "System I/O",
                            'User I/O' AS "User I/O",
                            'Scheduler' AS "Scheduler",
                            'ON CPU' AS "ON CPU"))
    )
    LOOP
        DBMS_OUTPUT.put_line(x.arrayrow);
    END LOOP;

    DBMS_OUTPUT.put_line(
           q'[        ]);           

      var options = {
        width: 1200,
        height: 700,
        legend: { position: 'right', maxLines: 3 },
        bar: { groupWidth: '75%' },
        isStacked: true,
        colors: [
          '#04CE04',
          '#CCFECC',
          '#084AD4',
          '#0499F4',
          '#8E1B04',
          '#C72D04',
          '#E46A04',
          '#5B460D',
          '#707251',
          '#9C9376',
          '#C5B79A',
          '#CCC3AE',
          '#F56CA9']
      };
        // Instantiate and draw our chart, passing in some options.
        var chart = new google.visualization.ColumnChart(document.getElementById('chart_div'));
        chart.draw(data, options);
      }
    </script>
  </head>

  <body><div id="chart_div"></div></body></html>]'
    );
END;
/
spool off
host "C:\Program Files\Mozilla Firefox\firefox.exe" .\ash.html
exit

Using the script above the googlechart version of the ASH data looks like below.

The last visualization tool I’ll explore is perhaps the most obvious one – Application Express (APEX.) To create a stacked bar chart similar to the ones above…

Create a new Region
  • Under Identification
    • Set Title – Active Sessions
    • Set Type – Chart
  • Under Attributes
    • Under Chart, Set Type – Bar
    • Under Appearance, Set Stacked – Yes
    • Under Settings, Set Time Axis Type – Enabled
  • Under Series
    • Under Source, Set Type – SQL Query (query is below)
    • Under Column Mapping
      • Set Series Name – ACTIVITY
      • Set Label – SAMPLE_TIME
      • Set Value – CNT
    • Under Appearance, Set Color – &COLOR.

In addition to the settings above, we’ll also need to add the SQL query in the Series Source. Below is the same query used in the R example, except I’ve also added a color attribute.

  SELECT sample_time,
         activity,
         color,
         COUNT(*) cnt
    FROM (SELECT sample_time,
                 CASE
                     WHEN session_state = 'ON CPU'
                     THEN
                         'ON CPU'
                     WHEN wait_class IN ('Concurrency',
                                         'System I/O',
                                         'User I/O',
                                         'Configuration',
                                         'Application',
                                         'Commit',
                                         'Queueing',
                                         'Network',
                                         'Administrative',
                                         'Cluster',
                                         'Scheduler')
                     THEN
                         wait_class
                     ELSE
                         'Other'
                 END activity,
                 CASE
                     WHEN session_state = 'ON CPU' THEN '#04CE04'
                     WHEN wait_class = 'Concurrency' THEN '#8E1B04'
                     WHEN wait_class = 'System I/O' THEN '#0499F4'
                     WHEN wait_class = 'User I/O' THEN '#084AD4'
                     WHEN wait_class = 'Configuration' THEN '#5B460D'
                     WHEN wait_class = 'Application' THEN '#C72D04'
                     WHEN wait_class = 'Commit' THEN '#E46A04'
                     WHEN wait_class = 'Queueing' THEN '#C5B79A'
                     WHEN wait_class = 'Network' THEN '#9C9376'
                     WHEN wait_class = 'Administrative' THEN '#707251'
                     WHEN wait_class = 'Cluster' THEN '#CCC3AE'
                     WHEN wait_class = 'Scheduler' THEN '#CCFECC'
                     ELSE '#F56CA9'
                 END color
            FROM dba_hist_active_sess_history
           WHERE sample_time >= TIMESTAMP '2018-09-15 13:00:00'
             AND sample_time < TIMESTAMP '2018-09-15 13:30:00')
GROUP BY sample_time, activity, color

Running the page with the chart region defined as above produced the following chart. Unfortunately, the options for defining the stacking order are limited to sorting by value or label. Thus, it’s not possible (using just the APEX native options) to guarantee the APEX chart will stack the same way as the R or Googlechart variations. It might be possible with some custom java script to manipulate the underlying JET attributes; but I have not explored that route. The overall effect though, even with the current limitations is still quite similar to the other tools.

Hopefully these examples will help you in digging into your own ASH data and maybe give a little head start into exploring some new tools. I welcome any suggestions, corrections, or additions to any of the methods above.

While all of these examples have hard-coded date ranges, I hope the code arrangement provides a solid framework to parameterize within your own environment and tools. As always, questions and comments are welcome.


How to pass a vector or data.frame from R to Oracle as a bind variable in a select statement using ROracle.

Recently I was asked how to bind an R variable to a SQL statement executed with ROracle. In this case it wasn’t a simple “select * from table where x = :b1 and y = :b2.”

You can do those quite easily with a data.frame with only one row in it using

dbGetQuery(connection, sqlstatement, data.frame)

It will also work for some bulk binding, such as a multi-row insert from a data.frame or a vector.

Here though, the developer wanted to pass in an R set as an In-List for a where clause. That is something like “select * from table where (x,y) in (:b1)” where the bind variable was a data.frame consisting of a few hundred observations of 2 variables each. Of course, the syntax shown is only pseudo-code, which further complicated the requirements. How to create the query with valid syntax and how to pass the data.frame to the query as a bind variable?

In the context of a select statement neither a data.frame nor a vector is legal as an input parameter if they have more than one row. Unfortunately that means we’ll have to take the extra step of converting the data into a legal data type to pass it through the dbGetQuery interface and then do something with that converted data within the SQL statement.

A simple and, more importantly, legal option is to convert the data into a delimited chr variable. On the Oracle side this will be interpreted as a VARCHAR2 or a CLOB depending on the size, either of which is easily parsed back into the original rows and columns as needed.

First, let’s use a simple vector with just a few elements. We’ll look up a few employees from the sample SCOTT.EMP table. We can’t use “in (:b1)” as shown above, but if we can generate a collection we can use “member of :b1.”

Since we’re going to pass a delimited chr value from R, we need to convert that text into a collection. Previously I posted how to split a clob into a nested table of varchar2 values. We’ll use that here too. For this first example the clob handling will be a bit of overkill since our text string will be small; but it will still illustrate the process.

So, first, we’ll construct a vector, then make a chr variable from the vector. Finally, pass the chr into a query using the split_clob function to create a nested table collection for use with the member of condition.

> employees_vector <- c("SCOTT","KING","ADAMS")
> employees_vector
[1] "SCOTT" "KING" "ADAMS"

> employees_chr <- paste(employees_vector,collapse=",")
> employees_chr
[1] "SCOTT,KING,ADAMS"

> employees_df <- dbGetQuery(con,"select * from emp where ename member of split_clob(:b1,',')",employees_chr)
> employees_df
  EMPNO ENAME       JOB   MGR            HIREDATE  SAL COMM DEPTNO
1  7788 SCOTT   ANALYST  7566 1987-04-19 00:00:00 3000   NA     20
2  7839 KING  PRESIDENT    NA 1981-11-16 23:00:00 5000   NA     10
3  7876 ADAMS     CLERK  7788 1987-05-23 00:00:00 1100   NA     20

It is possible to use an IN clause in the query, but you must create a subquery for the condition. So, using the same chr variable we still use split_clob, but we then use the TABLE function to use the resulting collection as a data source for the subquery.

> employees_df2 <- dbGetQuery(con,"select * from emp where ename in (select * from table(split_clob(:b1,',')))",employees_chr)
> employees_df2
  EMPNO ENAME       JOB  MGR            HIREDATE  SAL COMM DEPTNO
1  7788 SCOTT   ANALYST 7566 1987-04-19 00:00:00 3000   NA     20
2  7839 KING  PRESIDENT   NA 1981-11-16 23:00:00 5000   NA     10
3  7876 ADAMS     CLERK 7788 1987-05-23 00:00:00 1100   NA     20

It is also possible to use the text field directly with a simple INSTR (where instr(‘KING,SCOTT,ADAMS’,ename) > 0,) but doing so reliably is more difficult. Also, by leaving the table column untouched then indexes on the table can be used more reliably. Due to these limitations I’m not providing examples. Again, it is possible to do so, but not recommended.

The examples above are for a sets where each row only contains a single value (a vector, or a data.frame of a single column.) But what if you need multi-column checks? I.e. Something of the form “select * from table1 where (a,b) in (select x,y from table2).” Where “table2” is somehow based on our data.frame contents.

While the basic idea is the same, the use of multiple columns in the condition creates an additional challenge because we need to somehow encode the rows and columns into a chr field such that the fields are distinct but still grouped by row. Also the split_clob function will only generate one value for each row instead of reconstructing all of the individual fields.

First, on the R side, we’ll use paste function again, but twice, once with the separation delimiter and then again with the collapse delimiter.

> input_df <- data.frame(jobs=c("CLERK","CLERK","ANALYST","PRESIDENT"),depts=c("SALES","ACCOUNTING","RESEARCH","ACCOUNTING"))
> input_df
       jobs      depts
1     CLERK      SALES
2     CLERK ACCOUNTING
3   ANALYST   RESEARCH
4 PRESIDENT ACCOUNTING

> input_chr <- paste(paste(input_df$jobs,input_df$depts,sep="|"),collapse=",")
> input_chr
[1] "CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING"

Now we have a single chr field, where each row is delimited with commas and the fields within the row are delimited with pipes. Using split_clob we can separate the string into 4 fields and then parse each of those into a row of 2 fields.

Removing R for a moment, we can test the splitting and parsing to see what the subquery will return.

SELECT SUBSTR(COLUMN_VALUE, 1, INSTR(COLUMN_VALUE, '|') - 1) jobs,
 SUBSTR(COLUMN_VALUE, INSTR(COLUMN_VALUE, '|') + 1) depts
 FROM TABLE(split_clob('CLERK|SALES,CLERK|ACCOUNTING,ANALYST|RESEARCH,PRESIDENT|ACCOUNTING', ','));

JOBS         DEPTS 
------------ --------------
CLERK        SALES 
CLERK        ACCOUNTING 
ANALYST      RESEARCH 
PRESIDENT    ACCOUNTING

4 rows selected.

Now we put it all together and pull some information from the emp and dept tables about our job/department pairs.

> emp_dept_df <- dbGetQuery(con,"SELECT emp.ename, emp.job, emp.sal, dept.dname, dept.loc
+                                  FROM emp INNER JOIN dept ON emp.deptno = dept.deptno
+                                 WHERE (job, dname) IN
+                                      (SELECT SUBSTR(COLUMN_VALUE, 1, INSTR(COLUMN_VALUE, '|') - 1) jobs,
+                                              SUBSTR(COLUMN_VALUE, INSTR(COLUMN_VALUE, '|') + 1) depts
+                                         FROM TABLE(split_clob( :b1, ',')))
+                                ORDER BY dname, ename"
+                          ,input_chr)
> emp_dept_df
   ENAME       JOB  SAL      DNAME      LOC
1   KING PRESIDENT 5000 ACCOUNTING NEW YORK
2 MILLER     CLERK 1300 ACCOUNTING NEW YORK
3   FORD   ANALYST 3000   RESEARCH   DALLAS
4  SCOTT   ANALYST 3000   RESEARCH   DALLAS
5  JAMES     CLERK  950      SALES  CHICAGO

Obviously as the number of fields in a data.frame expands the parsing will get longer and more complicated. If you’ll be performing similar queries often, you may want to build a dedicated function within the database that combines the split_clob functionality with the parsing of the individual fields and returns a collection of user-defined types. Also, these examples used text fields and columns but could include dates or numeric values. In that case you would need to ensure consistent formatting in the string construction so the parsing can be accomplished correctly and reliably. The overall technique remains the same though.

As mentioned above, these steps do entail extra processing on both the R side as well as the Oracle side of the interface; but for most uses cases the extra resource consumption will hopefully be minor and this method provides a work around to a limitation of syntax in the ROracle/DBI functionality suite.

I hope you find it useful.