Using Dates for Dates and the Dangers of Implicit Conversions
As mentioned in previous articles, a common mistake is choosing a data type other than a Date or Timestamp to hold time-based data, such as numeric values like 20120424 or, even worse, 04242012. While both of these are human readable as yyyymmdd and mmddyyyy respectively, they have multiple issues. Neither one validates the values; for example, 20121978 fits the format, but there is no 78th day of the 19th month in 2012. The first is at least sortable in a reliable way, but the second is not. Perhaps the most perplexing is working out date math. How do you add days to either of these while keeping track of different month lengths, rolling over from December to January of the next year, and leap year rules?
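As a quick illustration (the literal value here is hypothetical), adding ten days to a numeric yyyymmdd value means converting it to a real Date and back again, while a Date handles the arithmetic directly:

-- Hypothetical: a NUMBER holding 20120424 in yyyymmdd form
SELECT TO_NUMBER(TO_CHAR(TO_DATE(TO_CHAR(20120424), 'YYYYMMDD') + 10, 'YYYYMMDD')) AS num_plus_10,
       DATE '2012-04-24' + 10                                                      AS date_plus_10
  FROM dual;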
Another variation that is somewhat more reasonable is using a numeric offset, such as seconds from the UNIX epoch. If you are storing values only used within the context of an external application then this may be appropriate. If you do need to process them within the database though, then you either lose all of the functionality of Oracle’s date functions or you have to add the overhead of conversion back and forth to real Date or Timestamp values.
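For instance, a minimal sketch of that conversion overhead (the epoch_seconds column and event_log table are hypothetical, and time zone handling is ignored here); every date operation in the database pays for a round trip like this:

-- Hypothetical column holding whole seconds since 1970-01-01 00:00:00
SELECT DATE '1970-01-01' + epoch_seconds / 86400 AS event_date
  FROM event_log;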
Probably the biggest offender is the use of text strings for dates. Unfortunately, Oracle supports implicit conversion. Not using explicit date masks on your conversions can lead to ORA-1840, ORA-1841 and ORA-1858 errors or, even worse, your data might coincidentally fit an implicit mask and produce unexpected errors and data corruption. In the example below, the intent is to load data for April 8, 2020 at 11 am. What actually gets stored is April 20 of the year 8 at 11 am.
It’s off by more than two thousand years! This is clearly not the desired result.
SQL> CREATE TABLE t2 (col1 VARCHAR2(10), col2 VARCHAR2(60), col3 DATE);

Table created.

SQL> INSERT INTO t2 VALUES ('TEST1', 'This will be bad', '08-APR-20 11');

1 row created.

SQL> INSERT INTO t2 VALUES ('TEST2', 'This should be ok', SYSDATE);

1 row created.

SQL> select * from t2;

COL1       COL2               COL3
---------- ------------------ -------------------
TEST1      This will be bad   0008-04-20 11:00:00
TEST2      This should be ok  2020-04-09 01:04:01
What is the reason for these results?
This particular database session had an NLS_DATE_FORMAT of ‘yyyy-mm-dd hh24:mi:ss’. If you don’t use explicit conversion formats in your code, then you need to take into account what the session settings might be and adjust accordingly.
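For example, repeating the problem insert from above with an explicit mask (the mask shown is just one reasonable choice for that string) stores the intended April 8, 2020 11 am regardless of the session’s NLS settings:

INSERT INTO t2 VALUES ('TEST3', 'This is explicit',
    TO_DATE('08-APR-20 11', 'DD-MON-RR HH24'));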
My preference when working with dates is to always use explicit conversions with the various TO_xxx functions or ANSI literals. Even if your code must support multiple date formats due to user preferences or internationalization, there is still little reason to rely on implicit conversion. With explicit conversion masks you can parameterize the formats as needed, or even change the NLS settings as you go. One option to support multiple conversions is to write a wrapper function of your own and iterate through the formats you want to support in whatever precedence is appropriate. For example, you could try something like this:
CREATE OR REPLACE FUNCTION multi_to_date(p_str IN VARCHAR2)
    RETURN DATE
IS
    TYPE format_types IS TABLE OF VARCHAR2(50);

    v_temp    DATE;
    v_ok      BOOLEAN := FALSE;
    v_formats format_types := format_types('DD/MM/YYYY', 'DD/MM/YYYY HH24:MI', 'Month dd, yyyy');
    v_index   INTEGER := 1;
BEGIN
    WHILE v_index <= v_formats.COUNT AND NOT v_ok LOOP
        v_ok := FALSE;
        BEGIN
            v_temp := TO_DATE(p_str, v_formats(v_index));
            v_ok   := TRUE;  -- Found a match, this string is OK!
        EXCEPTION
            WHEN OTHERS THEN
                v_ok := FALSE;
        END;
        v_index := v_index + 1;
    END LOOP;

    IF v_ok THEN
        RETURN v_temp;
    ELSE
        RETURN NULL;
    END IF;
END;
Another option would be to allow passing in a collection of formats as a parameter. The version above returns NULL if no matching format is found, but it could easily be modified to raise an exception or return a default value instead. The point of this isn’t to be a final solution, but to illustrate a framework for building in some versatility while still using explicit conversions and avoiding the errors mentioned above.
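As a rough usage sketch (assuming English month names for the last format), the function resolves strings in any of the listed formats and falls back to NULL otherwise:

SELECT multi_to_date('15/06/1215')       AS d1,
       multi_to_date('20/07/1969 20:17') AS d2,
       multi_to_date('July 20, 1969')    AS d3,
       multi_to_date('not a date')       AS d4   -- no format matches, returns NULL
  FROM dual;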
A related problem with implicit conversions is that they can fool people into believing the form you see when querying the data is also the internal storage format within the database. This misconception is the genesis of many of the questions I see online. Developers will assume the date value is stored a certain way and will then force conversions to and from other formats in order to derive some final result. Often this takes a form something like this:
to_date(sysdate,'yyyy-mm-dd') -- don't do this!
The goal is to convert the Date into another Date with the time portion removed. That is not actually possible. A Date will always have a time component. However, depending on the method used to view the value, you may or may not see it.
The erroneous syntax above is a commonly propagated mistake because it will sometimes appear to work!
Depending on the NLS_DATE_FORMAT, it will do an implicit conversion of the Date into a string and then convert the string back into a date. The results will vary from success, to wrong results, to raised exceptions depending on the NLS settings, the actual Date value, and the format mask used. These types of errors can be avoided simply by remembering to treat dates as dates. If you want to manipulate the date value, use date-based functions like TRUNC, NEXT_DAY, ADD_MONTHS, etc. Any of the TO_xxx functions mean you are changing a date “to” something else or converting something else “to” a date.
So, what should the developer do instead of the example above? If you simply wish to see a date value without its time values, then convert the date to text with the format you want. Using the same format as above, it would look like this:
to_char(sysdate,'yyyy-mm-dd')
Another option is to truncate (see below) the time portion to midnight (i.e. 12 am or 00:00:00). The zero values will still exist within the Date, but if all days consistently use 0 for the time, then date math and date comparisons will work more consistently. That is, if you subtract two date values both truncated to midnight, the result will be an integer representing the number of days between them. If you have Dates with varying time values, the subtraction will still produce mathematically correct results, but perhaps not the expected results. If you subtract a value with a time of 15:37:26 from a date value the following day at 12:47:58, you might expect to get 1 day because the year, month and day portions differ by one day, but the Date values in their entirety differ by slightly less than a day (0.8823148...).
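A small sketch of that difference, using arbitrary dates carrying the times from the example above:

SELECT   TO_DATE('2020-04-09 12:47:58', 'yyyy-mm-dd hh24:mi:ss')
       - TO_DATE('2020-04-08 15:37:26', 'yyyy-mm-dd hh24:mi:ss')        AS raw_diff,   -- 0.8823148...
         TRUNC(TO_DATE('2020-04-09 12:47:58', 'yyyy-mm-dd hh24:mi:ss'))
       - TRUNC(TO_DATE('2020-04-08 15:37:26', 'yyyy-mm-dd hh24:mi:ss')) AS whole_days  -- 1
  FROM dual;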
Dangerous NLS Formats
As shown above, the use of NLS date and timestamp formats can produce odd results if your values and formats don’t coincide. This can be taken a step further to increase the likelihood of errors or, if there is malicious intent, even to create invalid data. Examine the sample below; note the value of SYSDATE is not returning anything that looks like a date, nor is it raising an exception.
SQL> select sysdate from dual;

SYSDATE
--------------------------
Do not use NLS_DATE_FORMAT
The reason for this is that I was using standard SQL*Plus, which is a text-based tool; it has no means of displaying a Date value except by converting the value to text. This means my query resulted in an implicit conversion and hence used NLS_DATE_FORMAT to produce the result. The same thing would happen if I used an explicit conversion without specifying a format.
SQL> select to_char(sysdate), to_char(sysdate,'yyyy-mm-dd'), date '2020-03-01' from dual;

TO_CHAR(SYSDATE)           TO_CHAR(SY DATE'2020-03-01'
-------------------------- ---------- --------------------------
Do not use NLS_DATE_FORMAT 2020-04-26 Do not use NLS_DATE_FORMAT
Note, though, that the explicit conversion with a format did work, and specifying a date-literal value also produced the same effect. So this was not a trick where I replaced or modified the SYSDATE function. This is simply another example of what can happen when implicit conversions are used.
Many people forget that the format for a date or timestamp can include fixed text. If your format doesn’t include any date/time mask elements, then the only content of an implicit conversion will be that fixed text, and that’s how these results were produced. Prior to executing the queries, the session’s NLS_DATE_FORMAT was set to the message shown above.
alter session set nls_date_format='"Do not use NLS_DATE_FORMAT"';
Some might dismiss this as cute or a mere prank, but the same principle can be used to generate erroneous values. Consider the following example where both SYSDATE and SYSTIMESTAMP are queried and allowed to convert without an explicit format. Note the values differ by over 20 years!
SQL> select sysdate,systimestamp from dual;

SYSDATE             SYSTIMESTAMP
------------------- -----------------------------------
1999-12-31 23:59:59 26-APR-20 12.33.37.324245 AM -04:00
The use of fixed text here is particularly insidious because the result still looks like a “date”, but if that text is then rendered into a report or sent into another process, the end result is, at best, an error and, at worst, data accepted and used as legitimate. As before, the trick was nothing more than a fixed-text format mask:
alter session set nls_date_format='"1999-12-31 23:59:59"';
This is not SQL injection in the classic sense of running unexpected code, but rather an injection of unexpected data, which can be just as damaging to a business if not caught. Preventing this type of vulnerability is simple: always use an explicit mask when converting between dates and text or, if you must use implicit conversions, ensure you have control of the NLS formats before allowing any implicit conversion to occur.
But, speaking of SQL injection, this same technique can be used to alter queries. If you construct a query by string concatenation and embed an implicitly converted Date value, you will end up with whatever default format text the NLS settings produce. If that text includes additional where clauses or function calls, then your dynamic SQL will execute them just as if you had typed in that code yourself. This security hole is both a failure to assert data quality before processing and a matter of trusting default conversions.
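A hedged sketch of that exposure (the orders table and variable names here are hypothetical): if the Date is concatenated rather than bound, whatever text the NLS format produces becomes part of the statement.

DECLARE
   p_cutoff DATE := TRUNC(SYSDATE) - 30;
   v_sql    VARCHAR2(200);
BEGIN
   -- Unsafe: the implicit Date-to-text conversion splices in whatever
   -- NLS_DATE_FORMAT renders, extra predicates and all
   v_sql := 'DELETE FROM orders WHERE order_date < ''' || p_cutoff || '''';
   EXECUTE IMMEDIATE v_sql;

   -- Safer: bind the Date value so no text conversion ever happens
   EXECUTE IMMEDIATE 'DELETE FROM orders WHERE order_date < :cutoff' USING p_cutoff;
END;
/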
Date and Timestamp Structure
Clearly, trusting date-to-string conversions is not the way to gain visibility into date structures. So, how does one determine the structure of a date type? When developers ask me, my initial response is usually something along the lines of “What would you do with that information if you had it?” Oracle doesn’t really provide mechanisms for manipulating the internal structure of a date type, so even if you know which bytes store which information, you have no means of using that knowledge to create new functionality. Dates and Timestamps should be manipulated as distinct objects themselves, not as interpretations of other lower-level values.
Having said that, for those wishing to pursue it purely for academic reasons, or for investigating odd or erroneous values, you can use the DUMP function to see the byte-by-byte contents of a date structure.
SQL> SELECT TO_CHAR(event_date, 'syyyy-mm-dd hh24:mi:ss') event_date,
  2         DUMP(event_date) structure,
  3         event_name
  4    FROM sample_dates
  5   ORDER BY sample_dates.event_date;

EVENT_DATE            STRUCTURE                              EVENT_NAME
-4712-01-01 00:00:00  Typ=12 Len=7: 53,88,1,1,1,1,1          Oracle Beginning of Time
-0480-08-07 00:00:00  Typ=12 Len=7: 96,20,8,7,1,1,1          Battle of Thermopylae
 1066-10-14 00:00:00  Typ=12 Len=7: 110,166,10,14,1,1,1      Battle of Hastings
 1215-06-15 00:00:00  Typ=12 Len=7: 112,115,6,15,1,1,1       First Magna Carta
 1969-07-20 20:17:40  Typ=12 Len=7: 119,169,7,20,21,18,41    Apollo 11 Moon Landing
 9999-12-31 23:59:59  Typ=12 Len=7: 199,199,12,31,24,60,60   Oracle End of Time
The Date structure stores some elements as-is and others with a delta applied. As you can see in the example above, the year is split across the first two bytes: the century in the first byte and the year within the century in the second, each stored with a delta of +100.
1969: 119 = 19 + 100, 169 = 69 + 100
-480:  96 = -4 + 100,  20 = -80 + 100
Month and day of month are stored in the 3rd and 4th bytes respectively without modification. The hours, minutes and seconds each have a delta of +1 and occupy the 5th, 6th and 7th bytes respectively.
13:45:00  14 = 13 + 1, 46 = 45 + 1,  1 =  0 + 1
23:59:59  24 = 23 + 1, 60 = 59 + 1, 60 = 59 + 1
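As a quick sanity check (purely illustrative, not something you would do in real code), the Apollo 11 bytes from the DUMP output above reassemble into the original value:

-- 119,169,7,20,21,18,41 decoded with the deltas described above
SELECT TO_DATE(
         TO_CHAR((119 - 100) * 100 + (169 - 100)) || '-7-20 ' ||   -- year 1969, month 7, day 20
         (21 - 1) || ':' || (18 - 1) || ':' || (41 - 1),           -- 20:17:40
         'YYYY-MM-DD HH24:MI:SS') AS decoded
  FROM dual;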
Timestamps have a similar structure, with the last 4 bytes representing an integer counting the nanoseconds. The number is always a nanosecond count, even if the timestamp precision is set to fewer than 9 digits, unless the count is 0. If the count is zero, then the timestamp length is shortened from 11 bytes to 7 and the count is left off entirely. An example of this can be seen below with the timestamp at the beginning of the Oracle calendar.
SQL> SELECT TO_CHAR(event_timestamp, 'syyyy-mm-dd hh24:mi:ssxff') event_timestamp,
  2         DUMP(event_timestamp) structure
  3    FROM sample_timestamps
  4   ORDER BY sample_timestamps.event_timestamp;

EVENT_TIMESTAMP                 STRUCTURE
-4712-01-01 00:00:00.000000000  Typ=180 Len=7: 53,88,1,1,1,1,1
-0480-08-07 00:00:00.712471688  Typ=180 Len=11: 96,20,8,7,1,1,1,42,119,116,136
 1066-10-14 00:00:00.992473093  Typ=180 Len=11: 110,166,10,14,1,1,1,59,39,240,5
 1215-06-15 00:00:00.534016038  Typ=180 Len=11: 112,115,6,15,1,1,1,31,212,112,38
 1969-07-20 20:17:40.145092332  Typ=180 Len=11: 119,169,7,20,21,18,41,8,165,238,236
 9999-12-31 23:59:59.999999999  Typ=180 Len=11: 199,199,12,31,24,60,60,59,154,201,255
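The trailing four bytes can be checked the same way; read as a big-endian integer, the Apollo 11 bytes match the displayed fractional seconds:

-- 8,165,238,236 from the DUMP output above
SELECT 8 * POWER(2, 24) + 165 * POWER(2, 16) + 238 * POWER(2, 8) + 236 AS nanoseconds
  FROM dual;   -- 145092332, i.e. .145092332 seconds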
TIMESTAMP WITH TIME ZONE values are 13 bytes long, even if the nanosecond counter is 0. Following the 11-byte form for the date and time, there are 2 additional bytes to store the time zone information. The numbers represent the actual time zone definition, not just an offset. This can be seen in the following example: all of the values display a numeric offset, but most of them are defined with the named UTC time zone. The last value, at the end of time, has an explicit hour/minute offset of +0:00.
SQL> SELECT TO_CHAR(event_ts_with_tz, 'syyyy-mm-dd hh24:mi:ss.ff tzr tzh:tzm') event_timestamp,
  2         DUMP(event_ts_with_tz) structure
  3    FROM sample_timestamps
  4   ORDER BY sample_timestamps.event_ts_with_tz;

EVENT_TIMESTAMP                               STRUCTURE
-4712-01-01 00:00:00.000000000 UTC +00:00     Typ=181 Len=13: 53,88,1,1,1,1,1,0,0,0,0,208,4
-0480-08-07 00:00:00.712471688 UTC +00:00     Typ=181 Len=13: 96,20,8,7,1,1,1,42,119,116,136,208,4
 1066-10-14 00:00:00.992473093 UTC +00:00     Typ=181 Len=13: 110,166,10,14,1,1,1,59,39,240,5,208,4
 1215-06-15 00:00:00.534016038 UTC +00:00     Typ=181 Len=13: 112,115,6,15,1,1,1,31,212,112,38,208,4
 1969-07-20 20:17:40.145092332 UTC +00:00     Typ=181 Len=13: 119,169,7,20,21,18,41,8,165,238,236,208,4
 9999-12-31 23:59:59.999999999 +00:00 +00:00  Typ=181 Len=13: 199,199,12,31,24,60,60,59,154,201,255,20,60
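For the last row, the trailing bytes appear to follow the same delta idea as the time fields: 20 - 20 = 0 hours and 60 - 60 = 0 minutes, i.e. the explicit +00:00 offset, while the 208 in the other rows has its high bit set, flagging a named time zone region (UTC) rather than a plain offset.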
If the values are TIMESTAMP WITH LOCAL TIME ZONE, then the structure is the same as a regular TIMESTAMP, but the distinct data type allows the value to be rendered according to the user’s session settings. In the example below you can see the bytes are the same between the regular timestamp and the timestamp with local time zone, but viewing the value produces different times. There is a quirk of the Oracle calendar, though, at the beginning and end of time: shifting time zones around a timestamp can produce a date/time combination that is outside the legal range of the Oracle calendar. When this happens you can get inconsistent results, as shown below at the beginning of time. I explored this more fully in another article, here.
SQL> SELECT TO_CHAR(event_ts_with_tz, 'syyyy-mm-dd hh24:mi:ss.ff tzh:tzm') event_timestamptz,
  2         TO_CHAR(event_ts_with_local_tz, 'syyyy-mm-dd hh24:mi:ss.ff') event_timestamp_local,
  3         DUMP(event_timestamp) structure,
  4         DUMP(event_ts_with_local_tz) structurelocal
  5    FROM sample_timestamps
  6   ORDER BY sample_timestamps.event_ts_with_tz;

EVENT_TIMESTAMPTZ                       EVENT_TIMESTAMP_LOCAL           STRUCTURE                                              STRUCTURELOCAL
-4712-01-01 00:00:00.000000000 +00:00   -4712-01-01 19:04:00.000000000  Typ=180 Len=7: 53,88,1,1,1,1,1                         Typ=231 Len=7: 53,88,1,2,1,1,1
-0480-08-07 00:00:00.712471688 +00:00   -0480-08-06 19:04:00.712471688  Typ=180 Len=11: 96,20,8,7,1,1,1,42,119,116,136         Typ=231 Len=11: 96,20,8,7,1,1,1,42,119,116,136
 1066-10-14 00:00:00.992473093 +00:00    1066-10-13 19:04:00.992473093  Typ=180 Len=11: 110,166,10,14,1,1,1,59,39,240,5        Typ=231 Len=11: 110,166,10,14,1,1,1,59,39,240,5
 1215-06-15 00:00:00.534016038 +00:00    1215-06-14 19:04:00.534016038  Typ=180 Len=11: 112,115,6,15,1,1,1,31,212,112,38       Typ=231 Len=11: 112,115,6,15,1,1,1,31,212,112,38
 1969-07-20 20:17:40.145092332 +00:00    1969-07-20 16:17:40.145092332  Typ=180 Len=11: 119,169,7,20,21,18,41,8,165,238,236    Typ=231 Len=11: 119,169,7,20,21,18,41,8,165,238,236
 9999-12-31 23:59:59.999999999 +00:00    9999-12-31 18:59:59.999999999  Typ=180 Len=11: 199,199,12,31,24,60,60,59,154,201,255  Typ=231 Len=11: 199,199,12,31,24,60,60,59,154,201,255
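A minimal sketch of that session dependence, re-querying the same sample_timestamps table after changing the session time zone (only the rendering changes, the stored bytes do not):

ALTER SESSION SET TIME_ZONE = 'UTC';
SELECT event_ts_with_local_tz FROM sample_timestamps;

ALTER SESSION SET TIME_ZONE = 'America/New_York';
SELECT event_ts_with_local_tz FROM sample_timestamps;   -- same rows, shifted by the session offset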
While there isn’t an API for direct byte manipulation of a date/time value, there is little need for one, as there are many ways to create or modify a date or timestamp value. These will be explored further in other articles of this series.