This past week I was looking through the Oracle documentation on a task of pedantry; looking up the exact description of Oracle’s outer join syntax. While digging through the syntax diagrams and explanations of the SQL Reference I came across the phrase “partitioned outer join”. It’s an old feature now, having been around since 2003 with the introduction of 10gR1. However, I don’t recall ever having used it before. I probably read about it back then, but having not practiced it, it had fallen out of memory.
I asked around and found many others had never used it either. That sounded like a good topic for a blog article.
Much like an analytic function windowing clause, the partitioning applies the same value to each row in the partition and then applies the outer join condition to those rows.
Compare this to a non-partitioned outer join (a normal outer join) where each unmatched row of the driving table will have one row in the result set with nulls for the missing outer joined table.
With a partitioned outer join, the null-rows for each unmatched row of the driving table will be repeated for each partition expression.
To illustrate, I’ll use the standard Order Entry (OE) sample schema.
If I outer join all orders by month for the year 2007 I might use a query like this:
WITH months AS ( SELECT ADD_MONTHS(DATE '2007-01-01', LEVEL - 1) month FROM DUAL CONNECT BY LEVEL <= 12), orders_by_month AS (SELECT customer_id, TRUNC(order_date, 'mm') order_month, order_id FROM oe.orders WHERE order_date >= DATE '2007-01-01' AND order_date < DATE '2008-01-01') SELECT customer_id, m.month, order_id FROM months m LEFT JOIN orders_by_month o ON o.order_month = m.month ORDER BY customer_id, month
Which will produce 70 rows, 69 rows of results for customers that have orders for some months and one null-row for January where no customer placed an order.
CUSTOMER_ID MONTH ORDER_ID ----------- ---------- ---------- 101 2007-08-01 2458 101 2007-10-01 2430 102 2007-03-01 2414 102 2007-09-01 2432 102 2007-11-01 2397 103 2007-09-01 2433 103 2007-10-01 2454 104 2007-03-01 2416 104 2007-09-01 2438 105 2007-03-01 2417 105 2007-08-01 2439 107 2007-03-01 2419 107 2007-08-01 2440 107 2007-11-01 2360 108 2007-03-01 2420 108 2007-11-01 2361 109 2007-03-01 2421 109 2007-07-01 2444 109 2007-11-01 2362 116 2007-06-01 2369 116 2007-09-01 2436 116 2007-10-01 2453 116 2007-11-01 2428 117 2007-07-01 2446 117 2007-11-01 2429 118 2007-05-01 2371 118 2007-10-01 2457 119 2007-02-01 2372 122 2007-02-01 2375 123 2007-06-01 2376 141 2007-06-01 2377 142 2007-05-01 2378 143 2007-05-01 2380 144 2007-09-01 2435 144 2007-10-01 2363 144 2007-12-01 2422 145 2007-06-01 2448 145 2007-08-01 2364 145 2007-09-01 2455 145 2007-11-01 2423 146 2007-05-01 2379 146 2007-06-01 2449 146 2007-08-01 2365 146 2007-11-01 2424 147 2007-04-01 2450 147 2007-08-01 2366 147 2007-12-01 2385 148 2007-06-01 2406 148 2007-12-01 2451 148 2007-12-01 2386 149 2007-03-01 2387 149 2007-09-01 2434 149 2007-10-01 2452 149 2007-11-01 2427 150 2007-06-01 2388 152 2007-11-01 2390 154 2007-07-01 2392 157 2007-11-01 2398 158 2007-11-01 2399 159 2007-07-01 2400 160 2007-07-01 2401 161 2007-07-01 2402 162 2007-07-01 2403 163 2007-07-01 2404 164 2007-07-01 2405 165 2007-06-01 2407 166 2007-06-01 2408 167 2007-06-01 2409 169 2007-05-01 2411 2007-01-01
Now let’s add the PARTITION BY clause, so we can see the calendar for each customer. That is, we don’t just want to see January for the entire result set as nulls, we want to see which months for each customer were lacking an order.
WITH months AS ( SELECT ADD_MONTHS(DATE '2007-01-01', LEVEL - 1) month FROM DUAL CONNECT BY LEVEL <= 12), orders_by_month AS (SELECT customer_id, TRUNC(order_date, 'mm') order_month, order_id FROM oe.orders WHERE order_date >= DATE '2007-01-01' AND order_date < DATE '2008-01-01') SELECT customer_id, m.month, order_id FROM months m LEFT JOIN orders_by_month o PARTITION BY (customer_id) ON o.order_month = m.month ORDER BY customer_id, month
And now we get the 12 months repeated for each customer and if there are no orders for a month, we get a null row for that customer for that month. The query produces 457 rows, I have abbreviated the output below to just the first 4 customers.
CUSTOMER_ID MONTH ORDER_ID ----------- ---------- ---------- 101 2007-01-01 101 2007-02-01 101 2007-03-01 101 2007-04-01 101 2007-05-01 101 2007-06-01 101 2007-07-01 101 2007-08-01 2458 101 2007-09-01 101 2007-10-01 2430 101 2007-11-01 101 2007-12-01 102 2007-01-01 102 2007-02-01 102 2007-03-01 2414 102 2007-04-01 102 2007-05-01 102 2007-06-01 102 2007-07-01 102 2007-08-01 102 2007-09-01 2432 102 2007-10-01 102 2007-11-01 2397 102 2007-12-01 103 2007-01-01 103 2007-02-01 103 2007-03-01 103 2007-04-01 103 2007-05-01 103 2007-06-01 103 2007-07-01 103 2007-08-01 103 2007-09-01 2433 103 2007-10-01 2454 103 2007-11-01 103 2007-12-01 104 2007-01-01 104 2007-02-01 104 2007-03-01 2416 104 2007-04-01 104 2007-05-01 104 2007-06-01 104 2007-07-01 104 2007-08-01 104 2007-09-01 2438 104 2007-10-01 104 2007-11-01 104 2007-12-01 ...
While partitioned outer joins aren’t something I expect to use often; I will try to keep my eye out for opportunities to use them. I’m sure sometime in the past 15 years I’ve probably written a query using multiple self-joins and/or joins to additional tables to produce the same functionality… another tool for my toolbox.
Questions and comments, as always, are welcome.