Yesterday, Connor McDonald posted an interesting article about a math puzzle he received asking him to calculate the integral of a function over a range. He posted two variations using the Trapezoidal Rule. In his post he mentions the Riemann sums method and I thought it would be interesting to explore that option as well.
I’ve refactored his queries to both have the same shape and made them functionally equivalent (i.e. they produce the same results given the same inputs.) In both of my versions I calculate the f values first and in the outer query I sum the results. The difference between the two, as in Connor’s examples, is the use of CONNECT BY vs a Recursive-WITH clause.
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN 3 * x * x + 2 * x; END; boundaries AS (SELECT 1 x_start, 5 x_end, 0.00001 delta FROM DUAL), subresults AS ( SELECT LEVEL seq, f(x_start + (LEVEL - 1) * delta) fnc, x_start + (LEVEL - 1) * delta inp FROM boundaries CONNECT BY LEVEL <= (x_end - x_start) / delta + 1) SELECT SUM(fnc) * delta reimann_sum, SUM(CASE WHEN seq IN (1, (x_end - x_start) / delta + 1) THEN 1 ELSE 2 END * fnc) * delta / 2 trapezoidal_rule FROM subresults CROSS JOIN boundaries;
Or the same thing but using Recursive-WITH…
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN 3 * x * x + 2 * x; END; boundaries AS(SELECT 1 x_start, 5 x_end, 0.00001 delta FROM DUAL), subresults (seq, fnc, inp) AS (SELECT 1 seq, f(x_start) fnc, x_start inp FROM boundaries UNION ALL SELECT seq + 1, f(inp + delta), inp + delta FROM subresults, boundaries WHERE inp + delta <= x_end) SELECT SUM(fnc) * delta reimann_sum, SUM(CASE WHEN seq IN (1, (x_end - x_start) / delta + 1) THEN 1 ELSE 2 END * fnc) * delta / 2 trapezoidal_rule FROM subresults CROSS JOIN boundaries;
As Connor noted, the trapezoidal rule did produce slightly more accurate results.
A step further…
I liked this topic and wanted to see if I could do more with it. Could I do the inverse? That is, could I calculate the derivative of a function?
From introductory calculus we know the function f(x) is differentiable at x if the limit exists. That limit will be the slope of the tangent to curve of f(x), that is, the derivative at x.
So, perhaps the most obvious approach to try to simulate the limit directly. We’ll calculate the slope at x for h in a series where h gets progressively smaller and we’ll loop until the precision gets so small that the resulting slope calculation starts to repeat because Oracle’s NUMBER data type can’t represent distinct values anymore. This will be as close to 0 as our limit can get.
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN x ** 3 + x ** 2; END; params AS(SELECT 5 x, 1 h FROM DUAL), limit_series (seq, x, slope, h) AS (SELECT 1 seq, x, (f(x+h) - f(x)) / h, h/10 FROM params UNION ALL SELECT seq + 1, x, (f(x+h) - f(x)) / h, h/10 FROM limit_series WHERE slope != (f(x + h) - f(x)) / h) SELECT * FROM limit_series ORDER BY seq; SEQ X SLOPE H --- - --------------------------- ---------------------- 1 5 102 0.1 2 5 86.61 0.01 3 5 85.1601 0.001 4 5 85.016001 0.0001 5 5 85.00160001 0.00001 6 5 85.0001600001 0.000001 7 5 85.000016000001 0.0000001 8 5 85.00000160000001 0.00000001 9 5 85.0000001600000001 0.000000001 10 5 85.000000016000000001 0.0000000001 11 5 85.00000000160000000001 0.00000000001 12 5 85.0000000001600000000001 0.000000000001 13 5 85.000000000016000000000001 0.0000000000001 14 5 85.0000000000016 0.00000000000001 15 5 85.00000000000016 0.000000000000001 16 5 85.000000000000016 0.0000000000000001 17 5 85.0000000000000016 0.00000000000000001 18 5 85.00000000000000016 0.000000000000000001 19 5 85.000000000000000016 0.0000000000000000001 20 5 85 0.00000000000000000001
That turned out to be pretty easy and we can jump directly to the final answer with a small tweak to the final query sorting the values in reverse and taking the first row (seq=20)
SELECT slope FROM limit_series ORDER BY seq DESC FETCH FIRST 1 ROWS ONLY; SLOPE ----- 85
And we can verify the results because the derivative of is and therefore the derivative value at x=5 is or or .
Let’s try another function now cos(x) whose derivative is -sin(x).
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN cos(x); END; params AS(SELECT 5 x, 1 h FROM DUAL), limit_series (seq, x, slope, h) AS (SELECT 1 seq, x, (f(x+h) - f(x)) / h, h/10 FROM params UNION ALL SELECT seq + 1, x, (f(x+h) - f(x)) / h, h/10 FROM limit_series WHERE slope != (f(x + h) - f(x)) / h) SELECT slope FROM limit_series ORDER BY seq DESC FETCH FIRST 1 ROWS ONLY; SLOPE ----- 0
Oops, that’s not right! It should be something close to 0.9589. If we remove the ORDER BY and FETCH FIRST clauses we can see the series trend.
0.6765081011871397560790131264093670971281 0.943155572497542988539363814154243625703 0.95748999356370644806813409211376650739 0.9587822837615216602736007115817115509 0.958910089955670003198194216079624562 0.95892285633622909333678406976316905 0.9589241328318859165793975821436058 0.958924260480027597524728563518437 0.95892427324482752559495185125713 0.9589242745213073760017205615097 0.958924274648955359618394885793 0.95892427466172015796582229163 0.9589242746629966378004226305 0.95892427466312428578388118 0.95892427466313705058222794 0.9589242746631383270620536 0.958924274663138454709886 0.95892427466313846747505 0.9589242746631384687557 0.958924274663138469013 0.95892427466313846892 0.9589242746631384839 0.958924274663138464 0.95892427466313856 0.9589242746631501 0.9589242746631 0.9589242746636 0.9589242746726 0.958924274783 0.9589242752 0.9589242798 0.958924411 0.95892467 0.9589222 0.959018 0.95953 0.9642 0.929 1.54 0
We can see the same problem using the secant method, that is
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN cos(x); END; params AS(SELECT 5 x, 1 h FROM DUAL), limit_series (seq, x, slope, h) AS (SELECT 1 seq, x, (f(x+h) - f(x-h)) / (2*h), h/10 FROM params UNION ALL SELECT seq + 1, x, (f(x+h) - f(x-h)) / (2*h), h/10 FROM limit_series WHERE slope != (f(x+h) - f(x-h)) / (2*h)) SELECT slope FROM limit_series; 0.80690695375698896759241024051033739345 0.957326866452025794131232044035308361106 0.95890829267180424921144573688737137974 0.95892411484243401607217485403390024155 0.9589242730649313452536938533630892785 0.95892427464715639764884867536399627 0.95892427466297864818071055773554835 0.958924274663136870686029967592652 0.958924274663138452911083161770375 0.95892427466313846873333369371355 0.9589242746631384688915561989995 0.958924274663138468893138423115 0.95892427466313846889315424525 0.958924274663138468893154465 0.958924274663138468893154635 0.95892427466313846889315095 0.9589242746631384688930085 0.958924274663138468893035 0.9589242746631384688973 0.9589242746631384689925 0.95892427466313846866 0.95892427466313848135 0.9589242746631384385 0.958924274663138065 0.9589242746631426 0.9589242746630745 0.95892427466335 0.9589242746676 0.9589242747075 0.95892427444 0.9589242722 0.958924335 0.958923905 0.95891715 0.958992 0.959265 0.96155 0.9535 1.28 0
In both the tangent and secant estimates we can see they fairly quickly approached the correct answer but then began dwindling away until the rounding errors and precision limits caused them to fluctuate and then become 0. This is a known problem with an arithmetic approach in computation due to the limits of the numerical data types.
As h approaches 0 rounding errors evaluating the functions begin to creep in. Furthermore we know that eventually x+h will no longer be distinguishable from x. For the Oracle NUMBER type, that limit is 10-38. While the NUMBER type can represent values much smaller than that, it can’t reliably apply them to other values in arithmetic or function calls while maintaining mathematical integrity. Formally, this limit is known a machine epsilon. The name originating from the hardware limitations. The NUMBER type isn’t represented in the hosting hardware itself but the same principle still applies. There will always be some limit. Even if the NUMBER type, operators, and functions were able to hold and process a billion significant digits there would still be a limit. It would be a much smaller value but it would still exist.
Fortunately, if we know the epsilon value we can use it to calculate a viable minimum h. This will not only ameliorate the precision problems, it also means we don’t need to loop for repeating values. Once we know the target h-value, we can just calculate the slope at that point.
The target h is value is . This doesn’t work for x=0 and, if you have very large or very small values or many significant digits for x the epsilon root may be less useful as you will introduce a new rounding error in the multiplication; but this should work for most data ranges.
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN COS(x); END; params AS(SELECT 5 x, 5 * SQRT(POWER(10, -38)) h FROM DUAL) SELECT (f(x + h) - f(x)) / h FROM params; SLOPE ----------------------- 0.958924274663138468809
Comparing that to the known derivative -sin(x), we can see the estimate is now quite close; only deviating after 19 digits.
SELECT -SIN(5) FROM DUAL; SLOPE ------------------------------------------ 0.9589242746631384688931544061559939733579
Going back to our initial polynomial function we can see this method is still valid.
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN x ** 3 + x ** 2; END; params AS(SELECT 5 x, 5 * SQRT(POWER(10, -38)) h FROM DUAL) SELECT (f(x + h) - f(x)) / h slope FROM params; SLOPE --------------------- 85.000000000000000008
Earlier I mentioned using a secant as an alternate means of calculating the slope. You can use the same method here with the secant formula.
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN x ** 3 + x ** 2; END; params AS(SELECT 5 x, 5 * SQRT(POWER(10, -38)) h FROM DUAL) SELECT (f(x + h) - f(x-h)) / (2*h) slope FROM params; SLOPE ----- 85
WITH FUNCTION f(x NUMBER) RETURN NUMBER IS BEGIN RETURN cos(x); END; params AS(SELECT 5 x, 5 * SQRT(POWER(10, -38)) h FROM DUAL) SELECT (f(x + h) - f(x-h)) / (2*h) slope FROM params; SLOPE ------------------------ 0.9589242746631384688821
And we get a little more precision using this method.
There are many other methods of calculating derivatives; hopefully these give you a little taste. I thought this was a fun exercise and I thank Connor for posting the original inspiration that got it started.