Yesterday, Connor McDonald posted an interesting article about a math puzzle he received asking him to calculate the integral of a function over a range. He posted two variations using the Trapezoidal Rule. In his post he mentions the Riemann sums method and I thought it would be interesting to explore that option as well.
I’ve refactored his queries to both have the same shape and made them functionally equivalent (i.e. they produce the same results given the same inputs.) In both of my versions I calculate the f values first and in the outer query I sum the results. The difference between the two, as in Connor’s examples, is the use of CONNECT BY vs a Recursive-WITH clause.
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN 3 * x * x + 2 * x;
END;
boundaries AS (SELECT 1 x_start, 5 x_end, 0.00001 delta FROM DUAL),
subresults
AS
( SELECT LEVEL seq,
f(x_start + (LEVEL - 1) * delta) fnc,
x_start + (LEVEL - 1) * delta inp
FROM boundaries
CONNECT BY LEVEL <= (x_end - x_start) / delta + 1)
SELECT
SUM(fnc) * delta reimann_sum,
SUM(CASE WHEN seq IN (1, (x_end - x_start) / delta + 1)
THEN 1
ELSE 2
END * fnc) * delta / 2 trapezoidal_rule
FROM subresults CROSS JOIN boundaries;
Or the same thing but using Recursive-WITH…
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN 3 * x * x + 2 * x;
END;
boundaries AS(SELECT 1 x_start, 5 x_end, 0.00001 delta FROM DUAL),
subresults (seq, fnc, inp)
AS
(SELECT 1 seq, f(x_start) fnc, x_start inp FROM boundaries
UNION ALL
SELECT seq + 1, f(inp + delta), inp + delta
FROM subresults, boundaries
WHERE inp + delta <= x_end)
SELECT
SUM(fnc) * delta reimann_sum,
SUM(CASE WHEN seq IN (1, (x_end - x_start) / delta + 1)
THEN 1
ELSE 2
END * fnc) * delta / 2 trapezoidal_rule
FROM subresults CROSS JOIN boundaries;
As Connor noted, the trapezoidal rule did produce slightly more accurate results.
A step further…
I liked this topic and wanted to see if I could do more with it. Could I do the inverse? That is, could I calculate the derivative of a function?
From introductory calculus we know the function f(x) is differentiable at x if the limit exists. That limit will be the slope of the tangent to curve of f(x), that is, the derivative at x.
So, perhaps the most obvious approach to try to simulate the limit directly. We’ll calculate the slope at x for h in a series where h gets progressively smaller and we’ll loop until the precision gets so small that the resulting slope calculation starts to repeat because Oracle’s NUMBER data type can’t represent distinct values anymore. This will be as close to 0 as our limit can get.
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN x ** 3 + x ** 2;
END;
params AS(SELECT 5 x, 1 h FROM DUAL),
limit_series (seq, x, slope, h)
AS
(SELECT 1 seq, x, (f(x+h) - f(x)) / h, h/10
FROM params
UNION ALL
SELECT seq + 1, x, (f(x+h) - f(x)) / h, h/10
FROM limit_series
WHERE slope != (f(x + h) - f(x)) / h)
SELECT *
FROM limit_series
ORDER BY seq;
SEQ X SLOPE H
--- - --------------------------- ----------------------
1 5 102 0.1
2 5 86.61 0.01
3 5 85.1601 0.001
4 5 85.016001 0.0001
5 5 85.00160001 0.00001
6 5 85.0001600001 0.000001
7 5 85.000016000001 0.0000001
8 5 85.00000160000001 0.00000001
9 5 85.0000001600000001 0.000000001
10 5 85.000000016000000001 0.0000000001
11 5 85.00000000160000000001 0.00000000001
12 5 85.0000000001600000000001 0.000000000001
13 5 85.000000000016000000000001 0.0000000000001
14 5 85.0000000000016 0.00000000000001
15 5 85.00000000000016 0.000000000000001
16 5 85.000000000000016 0.0000000000000001
17 5 85.0000000000000016 0.00000000000000001
18 5 85.00000000000000016 0.000000000000000001
19 5 85.000000000000000016 0.0000000000000000001
20 5 85 0.00000000000000000001
That turned out to be pretty easy and we can jump directly to the final answer with a small tweak to the final query sorting the values in reverse and taking the first row (seq=20)
SELECT slope
FROM limit_series
ORDER BY seq DESC
FETCH FIRST 1 ROWS ONLY;
SLOPE
-----
85
And we can verify the results because the derivative of is
and therefore the derivative value at x=5 is
or
or
.
Let’s try another function now cos(x) whose derivative is -sin(x).
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN cos(x);
END;
params AS(SELECT 5 x, 1 h FROM DUAL),
limit_series (seq, x, slope, h)
AS
(SELECT 1 seq, x, (f(x+h) - f(x)) / h, h/10
FROM params
UNION ALL
SELECT seq + 1, x, (f(x+h) - f(x)) / h, h/10
FROM limit_series
WHERE slope != (f(x + h) - f(x)) / h)
SELECT slope
FROM limit_series
ORDER BY seq DESC
FETCH FIRST 1 ROWS ONLY;
SLOPE
-----
0
Oops, that’s not right! It should be something close to 0.9589. If we remove the ORDER BY and FETCH FIRST clauses we can see the series trend.
0.6765081011871397560790131264093670971281 0.943155572497542988539363814154243625703 0.95748999356370644806813409211376650739 0.9587822837615216602736007115817115509 0.958910089955670003198194216079624562 0.95892285633622909333678406976316905 0.9589241328318859165793975821436058 0.958924260480027597524728563518437 0.95892427324482752559495185125713 0.9589242745213073760017205615097 0.958924274648955359618394885793 0.95892427466172015796582229163 0.9589242746629966378004226305 0.95892427466312428578388118 0.95892427466313705058222794 0.9589242746631383270620536 0.958924274663138454709886 0.95892427466313846747505 0.9589242746631384687557 0.958924274663138469013 0.95892427466313846892 0.9589242746631384839 0.958924274663138464 0.95892427466313856 0.9589242746631501 0.9589242746631 0.9589242746636 0.9589242746726 0.958924274783 0.9589242752 0.9589242798 0.958924411 0.95892467 0.9589222 0.959018 0.95953 0.9642 0.929 1.54 0
We can see the same problem using the secant method, that is
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN cos(x);
END;
params AS(SELECT 5 x, 1 h FROM DUAL),
limit_series (seq, x, slope, h)
AS
(SELECT 1 seq, x, (f(x+h) - f(x-h)) / (2*h), h/10
FROM params
UNION ALL
SELECT seq + 1, x, (f(x+h) - f(x-h)) / (2*h), h/10
FROM limit_series
WHERE slope != (f(x+h) - f(x-h)) / (2*h))
SELECT slope
FROM limit_series;
0.80690695375698896759241024051033739345
0.957326866452025794131232044035308361106
0.95890829267180424921144573688737137974
0.95892411484243401607217485403390024155
0.9589242730649313452536938533630892785
0.95892427464715639764884867536399627
0.95892427466297864818071055773554835
0.958924274663136870686029967592652
0.958924274663138452911083161770375
0.95892427466313846873333369371355
0.9589242746631384688915561989995
0.958924274663138468893138423115
0.95892427466313846889315424525
0.958924274663138468893154465
0.958924274663138468893154635
0.95892427466313846889315095
0.9589242746631384688930085
0.958924274663138468893035
0.9589242746631384688973
0.9589242746631384689925
0.95892427466313846866
0.95892427466313848135
0.9589242746631384385
0.958924274663138065
0.9589242746631426
0.9589242746630745
0.95892427466335
0.9589242746676
0.9589242747075
0.95892427444
0.9589242722
0.958924335
0.958923905
0.95891715
0.958992
0.959265
0.96155
0.9535
1.28
0
In both the tangent and secant estimates we can see they fairly quickly approached the correct answer but then began dwindling away until the rounding errors and precision limits caused them to fluctuate and then become 0. This is a known problem with an arithmetic approach in computation due to the limits of the numerical data types.
As h approaches 0 rounding errors evaluating the functions begin to creep in. Furthermore we know that eventually x+h will no longer be distinguishable from x. For the Oracle NUMBER type, that limit is 10-38. While the NUMBER type can represent values much smaller than that, it can’t reliably apply them to other values in arithmetic or function calls while maintaining mathematical integrity. Formally, this limit is known a machine epsilon. The name originating from the hardware limitations. The NUMBER type isn’t represented in the hosting hardware itself but the same principle still applies. There will always be some limit. Even if the NUMBER type, operators, and functions were able to hold and process a billion significant digits there would still be a limit. It would be a much smaller value but it would still exist.
Fortunately, if we know the epsilon value we can use it to calculate a viable minimum h. This will not only ameliorate the precision problems, it also means we don’t need to loop for repeating values. Once we know the target h-value, we can just calculate the slope at that point.
The target h is value is . This doesn’t work for x=0 and, if you have very large or very small values or many significant digits for x the epsilon root may be less useful as you will introduce a new rounding error in the multiplication; but this should work for most data ranges.
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN COS(x);
END;
params AS(SELECT 5 x,
5 * SQRT(POWER(10, -38)) h
FROM DUAL)
SELECT (f(x + h) - f(x)) / h
FROM params;
SLOPE
-----------------------
0.958924274663138468809
Comparing that to the known derivative -sin(x), we can see the estimate is now quite close; only deviating after 19 digits.
SELECT -SIN(5) FROM DUAL; SLOPE ------------------------------------------ 0.9589242746631384688931544061559939733579
Going back to our initial polynomial function we can see this method is still valid.
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN x ** 3 + x ** 2;
END;
params AS(SELECT 5 x,
5 * SQRT(POWER(10, -38)) h
FROM DUAL)
SELECT (f(x + h) - f(x)) / h slope
FROM params;
SLOPE
---------------------
85.000000000000000008
Earlier I mentioned using a secant as an alternate means of calculating the slope. You can use the same method here with the secant formula.
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN x ** 3 + x ** 2;
END;
params AS(SELECT 5 x,
5 * SQRT(POWER(10, -38)) h
FROM DUAL)
SELECT (f(x + h) - f(x-h)) / (2*h) slope
FROM params;
SLOPE
-----
85
WITH
FUNCTION f(x NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN cos(x);
END;
params AS(SELECT 5 x,
5 * SQRT(POWER(10, -38)) h
FROM DUAL)
SELECT (f(x + h) - f(x-h)) / (2*h) slope
FROM params;
SLOPE
------------------------
0.9589242746631384688821
And we get a little more precision using this method.
There are many other methods of calculating derivatives; hopefully these give you a little taste. I thought this was a fun exercise and I thank Connor for posting the original inspiration that got it started.