This past week I was looking through the Oracle documentation on a task of pedantry; looking up the exact description of Oracle’s outer join syntax. While digging through the syntax diagrams and explanations of the SQL Reference I came across the phrase “partitioned outer join”. It’s an old feature now, having been around since 2003 with the introduction of 10gR1. However, I don’t recall ever having used it before. I probably read about it back then, but having not practiced it, it had fallen out of memory.
I asked around and found many others had never used it either. That sounded like a good topic for a blog article.
Much like an analytic function windowing clause, the partitioning applies the same value to each row in the partition and then applies the outer join condition to those rows.
Compare this to a non-partitioned outer join (a normal outer join) where each unmatched row of the driving table will have one row in the result set with nulls for the missing outer joined table.
With a partitioned outer join, the null-rows for each unmatched row of the driving table will be repeated for each partition expression.
To illustrate, I’ll use the standard Order Entry (OE) sample schema.
If I outer join all orders by month for the year 2007 I might use a query like this:
WITH
months
AS
( SELECT ADD_MONTHS(DATE '2007-01-01', LEVEL - 1) month
FROM DUAL
CONNECT BY LEVEL <= 12),
orders_by_month
AS
(SELECT customer_id, TRUNC(order_date, 'mm') order_month, order_id
FROM oe.orders
WHERE order_date >= DATE '2007-01-01' AND order_date < DATE '2008-01-01')
SELECT customer_id, m.month, order_id
FROM months m LEFT JOIN orders_by_month o ON o.order_month = m.month
ORDER BY customer_id, month
Which will produce 70 rows, 69 rows of results for customers that have orders for some months and one null-row for January where no customer placed an order.
CUSTOMER_ID MONTH ORDER_ID
----------- ---------- ----------
101 2007-08-01 2458
101 2007-10-01 2430
102 2007-03-01 2414
102 2007-09-01 2432
102 2007-11-01 2397
103 2007-09-01 2433
103 2007-10-01 2454
104 2007-03-01 2416
104 2007-09-01 2438
105 2007-03-01 2417
105 2007-08-01 2439
107 2007-03-01 2419
107 2007-08-01 2440
107 2007-11-01 2360
108 2007-03-01 2420
108 2007-11-01 2361
109 2007-03-01 2421
109 2007-07-01 2444
109 2007-11-01 2362
116 2007-06-01 2369
116 2007-09-01 2436
116 2007-10-01 2453
116 2007-11-01 2428
117 2007-07-01 2446
117 2007-11-01 2429
118 2007-05-01 2371
118 2007-10-01 2457
119 2007-02-01 2372
122 2007-02-01 2375
123 2007-06-01 2376
141 2007-06-01 2377
142 2007-05-01 2378
143 2007-05-01 2380
144 2007-09-01 2435
144 2007-10-01 2363
144 2007-12-01 2422
145 2007-06-01 2448
145 2007-08-01 2364
145 2007-09-01 2455
145 2007-11-01 2423
146 2007-05-01 2379
146 2007-06-01 2449
146 2007-08-01 2365
146 2007-11-01 2424
147 2007-04-01 2450
147 2007-08-01 2366
147 2007-12-01 2385
148 2007-06-01 2406
148 2007-12-01 2451
148 2007-12-01 2386
149 2007-03-01 2387
149 2007-09-01 2434
149 2007-10-01 2452
149 2007-11-01 2427
150 2007-06-01 2388
152 2007-11-01 2390
154 2007-07-01 2392
157 2007-11-01 2398
158 2007-11-01 2399
159 2007-07-01 2400
160 2007-07-01 2401
161 2007-07-01 2402
162 2007-07-01 2403
163 2007-07-01 2404
164 2007-07-01 2405
165 2007-06-01 2407
166 2007-06-01 2408
167 2007-06-01 2409
169 2007-05-01 2411
2007-01-01
Now let’s add the PARTITION BY clause, so we can see the calendar for each customer. That is, we don’t just want to see January for the entire result set as nulls, we want to see which months for each customer were lacking an order.
WITH
months
AS
( SELECT ADD_MONTHS(DATE '2007-01-01', LEVEL - 1) month
FROM DUAL
CONNECT BY LEVEL <= 12),
orders_by_month
AS
(SELECT customer_id, TRUNC(order_date, 'mm') order_month, order_id
FROM oe.orders
WHERE order_date >= DATE '2007-01-01' AND order_date < DATE '2008-01-01')
SELECT customer_id, m.month, order_id
FROM months m
LEFT JOIN orders_by_month o PARTITION BY (customer_id)
ON o.order_month = m.month
ORDER BY customer_id, month
And now we get the 12 months repeated for each customer and if there are no orders for a month, we get a null row for that customer for that month. The query produces 457 rows, I have abbreviated the output below to just the first 4 customers.
CUSTOMER_ID MONTH ORDER_ID
----------- ---------- ----------
101 2007-01-01
101 2007-02-01
101 2007-03-01
101 2007-04-01
101 2007-05-01
101 2007-06-01
101 2007-07-01
101 2007-08-01 2458
101 2007-09-01
101 2007-10-01 2430
101 2007-11-01
101 2007-12-01
102 2007-01-01
102 2007-02-01
102 2007-03-01 2414
102 2007-04-01
102 2007-05-01
102 2007-06-01
102 2007-07-01
102 2007-08-01
102 2007-09-01 2432
102 2007-10-01
102 2007-11-01 2397
102 2007-12-01
103 2007-01-01
103 2007-02-01
103 2007-03-01
103 2007-04-01
103 2007-05-01
103 2007-06-01
103 2007-07-01
103 2007-08-01
103 2007-09-01 2433
103 2007-10-01 2454
103 2007-11-01
103 2007-12-01
104 2007-01-01
104 2007-02-01
104 2007-03-01 2416
104 2007-04-01
104 2007-05-01
104 2007-06-01
104 2007-07-01
104 2007-08-01
104 2007-09-01 2438
104 2007-10-01
104 2007-11-01
104 2007-12-01
...
While partitioned outer joins aren’t something I expect to use often; I will try to keep my eye out for opportunities to use them. I’m sure sometime in the past 15 years I’ve probably written a query using multiple self-joins and/or joins to additional tables to produce the same functionality… another tool for my toolbox.
Questions and comments, as always, are welcome.