12cR2 tightens up ORA-01841 for zero year ANSI dates, but not for Oracle SQL syntax

In moving some more code from an 11gR2 database to a 12cR2 database, I found another change where a piece of code that works in 11gR2 doesn’t compile in 12cR2.

In this instance a view was being created with a projected date column which used the ANSI DATE syntax. Here is a simplified test script:

CREATE OR REPLACE VIEW test1 AS
SELECT date '0000-01-01' date_col
FROM dual
/
DROP VIEW test1
/

CREATE OR REPLACE VIEW test2 AS
SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
FROM dual
/

Running this on 11gR2 gives:

SQL>CREATE OR REPLACE VIEW test1 AS
  2  SELECT date '0000-01-01' date_col
  3  FROM   dual
  4  /

View created.

SQL>CREATE OR REPLACE VIEW test2 AS
  2  SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
  3  FROM   dual
  4  /

View created.

Now running this on 12cR2 gives:

SQL> CREATE OR REPLACE VIEW test1 AS
  2  SELECT date '0000-01-01' date_col
  3  FROM   dual
  4  /
SELECT date '0000-01-01' date_col
            *
ERROR at line 2:
ORA-01841: (full) year must be between -4713 and +9999, and not be 0


SQL> CREATE OR REPLACE VIEW test2 AS
  2  SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
  3  FROM   dual
  4  /

View created.

The year is zero, so the error raised by 12cR2 for the ANSI DATE literal is correct; the equivalent TO_DATE expression, however, still compiles, which is the inconsistency the title of this post refers to.
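
If all that is needed is a sentinel date, the obvious fix is to pick a year Oracle accepts. A minimal sketch, assuming year 1 is an acceptable substitute for year 0:

CREATE OR REPLACE VIEW test1 AS
SELECT date '0001-01-01' date_col
FROM   dual
/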

ORA-54002 when trying to create Virtual Column using REGEXP_REPLACE on Oracle 12cR2

I encountered an issue today trying to create a table in an Oracle 12cR2 database, using DDL I had extracted from an Oracle 11gR2 database. The error returned when trying to create the table was:

ORA-54002: only pure functions can be specified in a virtual column expression

The definition of the table included a Virtual Column which used a REGEXP_REPLACE call to derive a value from another column on the table.

Here is a simplified test case illustrating the scenario (Thanks Tim for the REGEXP_REPLACE example code):

select * from v$version
/
create table test_ora54002_12c(
 col1 VARCHAR2(20 CHAR) NOT NULL
 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
)
/
drop table test_ora54002_12c purge
/

Running this on 11gR2 gives:

SQL> select * from v$version
 2 /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

5 rows selected.

Elapsed: 00:00:00.40
SQL> create table test_ora54002_12c(
 2 col1 VARCHAR2(20 CHAR) NOT NULL
 3 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
 4 )
 5 /

Table created.

Elapsed: 00:00:00.24
SQL> drop table test_ora54002_12c purge
 2 /

Table dropped.

Running this on 12cR2 gives:

SQL> select * from v$version
  2  /

BANNER                                                                                  CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production               0
PL/SQL Release 12.2.0.1.0 - Production                                                     0
CORE 12.2.0.1.0 Production                                                                 0
TNS for Linux: Version 12.2.0.1.0 - Production                                             0
NLSRTL Version 12.2.0.1.0 - Production                                                     0

SQL> create table test_ora54002_12c(
  2   col1 VARCHAR2(20 CHAR) NOT NULL
  3   ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
  4  )
  5  /
 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
 *
ERROR at line 3:
ORA-54002: only pure functions can be specified in a virtual column expression


SQL> drop table test_ora54002_12c purge
  2  /
drop table test_ora54002_12c purge
*
ERROR at line 1:
ORA-00942: table or view does not exist

As you can see, 12cR2 gives the ORA-54002 error.

Looking on MOS highlights this article, which suggests that you shouldn’t have been able to do this in 11gR2, i.e. it was a bug, and that 12cR2 has fixed this bug, so you can no longer create such a virtual column (the article refers to function-based index and check constraint use cases as well).

In my case, I was able to rewrite the virtual column to use simple string functions such as SUBSTR, TRANSLATE and INSTR to achieve what I wanted and the virtual column was allowed to be created with these – problem solved – a shame really as the REGEXP_REPLACE approach was far neater.
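
Another possible route, offered here only as a hedged sketch rather than something I pursued, is to wrap the expression in a user-defined PL/SQL function declared DETERMINISTIC, since virtual column expressions can reference deterministic functions; you then take responsibility for the purity claim yourself. The function name below is hypothetical:

-- Hypothetical DETERMINISTIC wrapper around the REGEXP_REPLACE expression
CREATE OR REPLACE FUNCTION space_before_caps(p_val IN VARCHAR2)
  RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
  RETURN REGEXP_REPLACE(p_val, '([A-Z])', ' \1', 2);
END;
/
CREATE TABLE test_ora54002_12c(
 col1 VARCHAR2(20 CHAR) NOT NULL
 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS (space_before_caps(col1)) VIRTUAL VISIBLE
)
/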

Oracle Virtual Columns – Can’t use plain column or duplicate expressions

I had a scenario today where I was loading a table and a particular column is known by multiple names in different source systems, and thus to different people. In order to make everyone happy on this occasion, I wondered if I could create a normal column for one of those names and then use virtual columns, pointing at the normal column, for the other names.

I’m aware there are several ways of skinning this cat and that virtual columns were probably not the best choice in the first place, but I was just playing with an idea and it didn’t quite end up where I thought…so it was interesting in that respect.

The test code:


CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 ) VIRTUAL NOT NULL 
  )
/
CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
  )
/
DROP TABLE test PURGE
/
CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
    ,virtual_column2 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
  )
/
CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
    ,virtual_column2 INTEGER AS ( column1 + 1 - 1 ) VIRTUAL NOT NULL 
  )
/
DROP TABLE test PURGE
/

And the results:


    ,virtual_column1 INTEGER AS ( column1 ) VIRTUAL NOT NULL
                                  *
ERROR at line 4:
ORA-54016: Invalid column expression was specified



Table created.


Table dropped.

    ,virtual_column2 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL
     *
ERROR at line 5:
ORA-54015: Duplicate column expression was specified



Table created.


Table dropped.

I expected it to just work, but I clearly ran into two problems which scuppered my idea. Firstly, a virtual column can’t simply refer to a normal column without any change to it; otherwise it fails with ORA-54016. The error isn’t particularly helpful, but eventually I worked out that it was raised because the expression was simply a straight mapping to a non-virtual column. Working around that by adding zero to the numeric column gets it to work, but it’s an ugly hack.

In my scenario there are three different names for this column, depending on the users involved, which then leads on to the next issue: I’d need two virtual columns pointing at the same source column. Unfortunately, if I use the same hack twice, it fails with ORA-54015, because you can’t have two virtual columns with exactly the same expression! A slight variant of the hack and it works, but it’s getting uglier and uglier!

Time to seek out plan B!
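
For what it’s worth, one obvious plan B is to keep a single real column and expose the alternative names through a view rather than through virtual columns. A sketch, with hypothetical alias names:

CREATE TABLE test
  (
     column1    INTEGER NOT NULL
  )
/
CREATE OR REPLACE VIEW test_all_names AS
SELECT column1
,      column1 AS alternative_name1
,      column1 AS alternative_name2
FROM   test
/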

About Oracle: DBA_DEPENDENCY_COLUMNS

A colleague recently asked if there was a way to do column-level dependency tracking. He wanted to know, for a given view, which tables, and which columns on those tables, it was dependent upon, without, of course, reading through the code.

I was vaguely aware that since 11gR1 Oracle has been tracking fine grained (column) dependencies, but couldn’t find a way of seeing the details stored, until I found this interesting article from Rob Van Wijk:

About Oracle: DBA_DEPENDENCY_COLUMNS

I passed the details on to our DBA who implemented it and it seemed to work, for us. Your mileage may vary, of course.

Some comments on Rob’s blog post, bearing in mind, of course, that it was written in 2008 and refers to 11gR1:

  1. D_ATTRS contains values other than “000100000A”. I’ve observed this in a basic 12c install and a production 11gR2 install (a sketch of the sort of query behind these observations follows this list)
  2. D_ATTRS is commented in $ORACLE_HOME/rdbms/admin/dcore.bsq as “/* Finer grain attr. numbers if finer grained */”
  3. D_REASON is commented in $ORACLE_HOME/rdbms/admin/dcore.bsq as “/* Reason mask of attrs causing invalidation */”.  On my basic 12c installation, all rows contain NULL for this value although on a production 11gR2 database I observed a handful of rows with values in this column.
  4. Noted from the comments against Rob’s article is the opportunity to vote for this feature on OTN here
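
A sketch of the sort of query behind the D_ATTRS/D_REASON observations above; DEPENDENCY$ is a SYS-owned dictionary base table, so this assumes you can query it from a suitably privileged account:

-- Peek at the raw fine-grained dependency attributes
-- (column names as commented in $ORACLE_HOME/rdbms/admin/dcore.bsq)
SELECT d_obj#
,      p_obj#
,      d_attrs
,      d_reason
FROM   sys.dependency$
WHERE  d_attrs IS NOT NULL
AND    ROWNUM <= 10
/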

 

KEEP DENSE_RANK versus ROW_NUMBER – further details

I found this nice post from Laurent Schneider the other day and wanted to comment, but my comments were a bit more of a digression and discussion, so I’ve blogged it and put a link in the comments of the post by Laurent.

I’d always used the ROW_NUMBER method myself until I read this and then figured I’d try the KEEP DENSE_RANK method, which works, as Laurent describes. One thing that didn’t sit well with me in the post from Laurent was that he said “the second one should be more performant” – I prefer hard facts, so I decided to test it a bit and my results are below.

In the simple example that Laurent gave, it’s difficult to tell which is quickest, since the table in question only has a handful of rows and therefore any benchmarking is more susceptible to other influences, clouding the results. I figured I’d build a larger table and try it on that.

Before I did that though, I did get the plans from the two statements Laurent ran on the EMP table and both show the same resource costings:

Firstly, for the ROW_NUMBER method:

select ename
,      deptno
,      sal
from   (select ename
 ,      deptno
 ,      sal
 ,      row_number() over (partition by deptno order by sal desc,empno) r
 from   emp
 )
where  r=1;

Plan hash value: 3291446077                                                                                                             

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   644 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   644 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 1 - filter("R"=1)
 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPTNO" ORDER BY
 INTERNAL_FUNCTION("SAL") DESC ,"EMPNO")<=1)

Note
-----
 - dynamic sampling used for this statement

Now, the KEEP DENSE_RANK method:

select max(ename) keep (dense_rank first order by sal desc,empno) ename
,      deptno
,      max(sal) sal
from   emp 
group by deptno;

Plan hash value: 15469362

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    14 |   644 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY     |      |    14 |   644 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   644 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

So, the plans are very similar, especially in terms of having the same resource usage…which means they should be similar in terms of performance…running them, as Laurent did, shows around 1s response times, which, as I say, doesn’t conclusively prove which method is quickest.

OK, on to a bigger example then…

I basically created a similar table to EMP, called JEFF_EMP and added a few more columns (for later) and then put ten million rows in it, taking around 1.3GB on my system…plenty to churn through.

DROP TABLE jeff_emp PURGE
/
CREATE TABLE jeff_emp(deptno     NUMBER
                     ,ename      VARCHAR2(100)
                     ,first_name VARCHAR2(50)
                     ,initials   VARCHAR2(30)
                     ,surname    VARCHAR2(50)
                     ,sal        NUMBER
                     ,empno      NUMBER
                     )
/
INSERT INTO jeff_emp(deptno,ename,first_name,initials,surname,sal,empno)
SELECT (MOD(ROWNUM,3) + 1) * 10
,      'FIRSTNAME_'||TO_CHAR(ROWNUM)||'_INITIALS_'||TO_CHAR(ROWNUM)||'_SURNAME_'||TO_CHAR(ROWNUM)
,      'FIRSTNAME_'||TO_CHAR(ROWNUM)
,      'INITIALS_'||TO_CHAR(ROWNUM)
,      'SURNAME_'||TO_CHAR(ROWNUM)
,      ROWNUM * 100
,      ROWNUM
FROM   (SELECT LEVEL l FROM dual CONNECT BY LEVEL < 10000001)
ORDER BY l
/
COMMIT
/
EXEC dbms_stats.gather_table_stats(ownname => USER, tabname => 'JEFF_EMP', estimate_percent => 10);

Now, here is the plan for the ROW_NUMBER method:

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    10M|   868M|       |   204K  (1)| 00:40:49 |
|*  1 |  VIEW                    |          |    10M|   868M|       |   204K  (1)| 00:40:49 |
|*  2 |   WINDOW SORT PUSHED RANK|          |    10M|   629M|  1533M|   204K  (1)| 00:40:49 |
|   3 |    TABLE ACCESS FULL     | JEFF_EMP |    10M|   629M|       | 46605   (1)| 00:09:20 |
---------------------------------------------------------------------------------------------

…and the results:

ENAME                                                                                                    DEPTNO        SAL
---------------------------------------------------------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                           10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                        20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                           30  999999800

Elapsed: 00:00:24.47

…and the KEEP DENSE_RANK method plan:

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     3 |   198 | 47109   (2)| 00:09:26 |
|   1 |  SORT GROUP BY     |          |     3 |   198 | 47109   (2)| 00:09:26 |
|   2 |   TABLE ACCESS FULL| JEFF_EMP |    10M|   629M| 46605   (1)| 00:09:20 |
-------------------------------------------------------------------------------

…and its results:

ENAME                                                                                                    DEPTNO        SAL
---------------------------------------------------------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                           10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                        20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                           30  999999800

Elapsed: 00:00:07.76

So, reasonably clear results, indicating that the KEEP DENSE_RANK method takes about a third of the time of the ROW_NUMBER method. You can also see from the plans that the ROW_NUMBER method involves use of TEMP, whereas the KEEP DENSE_RANK method doesn’t, hence the slowdown.

So, Laurent was correct in his assertion that it should be more performant…but it’s nice to see the results based on a more meaningful set of data.

Now, there was one other thing that concerned me: whether adding more columns into the SQL would change the performance of either method to any significant degree, so I started using the extra name columns like this:

SELECT ename
,      first_name
,      initials
,      surname
,      deptno
,      sal
FROM   (SELECT ename
        ,      first_name
        ,      initials
        ,      surname
        ,      deptno
        ,      sal
        ,      ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC,empno) r 
        FROM   jeff_emp
       )
WHERE  r = 1
/

…which has a plan of:

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    10M|  1546M|       |   307K  (1)| 01:01:36 |
|*  1 |  VIEW                    |          |    10M|  1546M|       |   307K  (1)| 01:01:36 |
|*  2 |   WINDOW SORT PUSHED RANK|          |    10M|  1107M|  2606M|   307K  (1)| 01:01:36 |
|   3 |    TABLE ACCESS FULL     | JEFF_EMP |    10M|  1107M|       | 46605   (1)| 00:09:20 |
---------------------------------------------------------------------------------------------

…and results:

ENAME                                                                                                FIRST_NAME                                         INITIALS                       SURNAME                      DEPTNO         SAL
---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                   FIRSTNAME_9999999                                  INITIALS_9999999               SURNAME_9999999          10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                FIRSTNAME_10000000                                 INITIALS_10000000              SURNAME_10000000         20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                   FIRSTNAME_9999998                                  INITIALS_9999998               SURNAME_9999998          30  999999800

Elapsed: 00:00:25.76

For the KEEP DENSE_RANK I get:

SELECT MAX(ename) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) ename
,      MAX(first_name) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) first_name
,      MAX(initials) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) initials
,      MAX(surname) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) surname
,      deptno
,      MAX(sal) sal
FROM   jeff_emp 
GROUP BY deptno
/

Which has the following plan:

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     3 |   348 | 47109   (2)| 00:09:26 |
|   1 |  SORT GROUP BY     |          |     3 |   348 | 47109   (2)| 00:09:26 |
|   2 |   TABLE ACCESS FULL| JEFF_EMP |    10M|  1107M| 46605   (1)| 00:09:20 |
-------------------------------------------------------------------------------

…and results:

ENAME                                                                                                FIRST_NAME                                         INITIALS                       SURNAME                      DEPTNO         SAL
---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                   FIRSTNAME_9999999                                  INITIALS_9999999               SURNAME_9999999          10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                FIRSTNAME_10000000                                 INITIALS_10000000              SURNAME_10000000         20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                   FIRSTNAME_9999998                                  INITIALS_9999998               SURNAME_9999998          30  999999800

Elapsed: 00:00:14.56

So, the differential in performance has reduced significantly, with the KEEP DENSE_RANK method taking around double its original time, whilst the ROW_NUMBER method has only increased marginally. I’ve not tested with adding further columns, but I’m guessing (I know…I could and should test it!) it will get worse, to the extent that, eventually, the KEEP DENSE_RANK will become the worse performer. If that’s the case, then essentially these two methods have different scalability dynamics, and one should bear this in mind when considering which to choose, depending on how many DENSE_RANK’d columns you need to deliver your results.

Hope this helps.

RANK v DENSE_RANK v ROW_NUMBER

I was asked a question by one of my users about the difference between the RANK and ROW_NUMBER analytics yesterday, so here is a post on it…

RANK, ROW_NUMBER and also DENSE_RANK are very useful for taking a set of rows and ordering them in a defined manner, whilst giving each row a “position value”. They differ in the approach taken to define the value of each row’s position in the set of output rows. In some circumstances they may all give the same value; however, depending on the data, they may differ.

An example based on the SCOTT.EMP table, helps to illustrate…

SELECT empno
,      sal
,      RANK() OVER(ORDER BY sal) rank_position
,      DENSE_RANK() OVER(ORDER BY sal) dense_rank_position
,      ROW_NUMBER() OVER(ORDER BY sal) row_number_position
FROM   emp
/

which returns, on my 11gR1 database…

EMPNO        SAL RANK_POSITION DENSE_RANK_POSITION ROW_NUMBER_POSITION
---------- ---------- ------------- ------------------- -------------------
7369        800             1                   1                   1
7900        950             2                   2                   2
7876       1100             3                   3                   3
7521       1250             4                   4                   4
7654       1250             4                   4                   5
7934       1300             6                   5                   6
7844       1500             7                   6                   7
7499       1600             8                   7                   8
7782       2450             9                   8                   9
7698       2850            10                   9                  10
7566       2975            11                  10                  11
7788       3000            12                  11                  12
7902       3000            12                  11                  13
7839       5000            14                  12                  14
14 rows selected.

Notice that RANK has given the two employees with SAL=1250 the same position value of 4, and the two employees with SAL=3000 the same position value of 12. Notice also that RANK skips position values 5 and 13 as it has two entries for 4 and 12 respectively. RANK uses all numbers between 1 and 14, except 5 and 13. RANK has both repeats and gaps in its ordering.

DENSE_RANK is similar to RANK, in that it gives the two employees with SAL=1250 the same position value of 4, but then it does not skip over position value 5 – it simply carries on at position 5 for the next value. DENSE_RANK uses, for the position values, all numbers between 1 and 12, without leaving any out, and using 4 and 11 twice. DENSE_RANK has no gaps in its ordering, only repeats.

ROW_NUMBER gives each row a unique position value and consequently uses all the numbers between 1 and 14. ROW_NUMBER has no gaps or repeats in its ordering. Note that the position value assigned by ROW_NUMBER is not deterministic here, since the ORDER BY clause only has SAL in it. If you want to ensure the order is the same each time, you need to add further (tie-breaking) columns to the ORDER BY clause, as shown below.
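
For example, adding EMPNO as a tie-breaking column makes the ordering, and therefore the position values, repeatable:

SELECT empno
,      sal
,      ROW_NUMBER() OVER(ORDER BY sal, empno) row_number_position
FROM   emp
/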

No pruning for MIN/MAX of partition key column

Recently, I wanted to work out the maximum value of a column on a partitioned table. The column I wanted the maximum value for, happened to be the (single and only) partition key column. The table in question was range partitioned on this single key column, into monthly partitions for 2009, with data in all the partitions behind the current date, i.e. January through mid June were populated. There were no indexes on the table.

NOTE – I tried this on 10.2.0.4 (AIX) and 11.1.0 (Fedora 11) – the example below is from 11.1.0.

I’ll recreate the scenario here:

CREATE TABLESPACE tsp1
datafile '/u01/app/oracle/oradata/T111/tsp1.dbf' size 100M 
autoextend off extent management local  uniform size 1m segment space management auto online
/
CREATE TABLESPACE tsp2
datafile '/u01/app/oracle/oradata/T111/tsp2.dbf' size 100M 
autoextend off extent management local  uniform size 1m segment space management auto online
/

DROP TABLE test PURGE
/
CREATE TABLE test(col_date_part_key DATE NOT NULL
,col2 VARCHAR2(2000) NOT NULL
)
PARTITION BY RANGE(col_date_part_key)
(PARTITION month_01 VALUES LESS THAN (TO_DATE('01-FEB-2009','DD-MON-YYYY')) TABLESPACE tsp1
,PARTITION month_02 VALUES LESS THAN (TO_DATE('01-MAR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_03 VALUES LESS THAN (TO_DATE('01-APR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_04 VALUES LESS THAN (TO_DATE('01-MAY-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_05 VALUES LESS THAN (TO_DATE('01-JUN-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_06 VALUES LESS THAN (TO_DATE('01-JUL-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_07 VALUES LESS THAN (TO_DATE('01-AUG-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_08 VALUES LESS THAN (TO_DATE('01-SEP-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_09 VALUES LESS THAN (TO_DATE('01-OCT-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_10 VALUES LESS THAN (TO_DATE('01-NOV-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_11 VALUES LESS THAN (TO_DATE('01-DEC-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_12 VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE tsp2
)
/
REM Insert rows, but only up to 14-JUN-2009
INSERT INTO test(col_date_part_key,col2)
SELECT TO_DATE('31-DEC-2008','DD-MON-YYYY') + l
, LPAD('X',2000,'X')
FROM (SELECT level l FROM dual CONNECT BY level < 166)
/
COMMIT
/
SELECT COUNT(*)
FROM test
/
SELECT MIN(col_date_part_key) min_date
, MAX(col_date_part_key) max_date
FROM test
/

This runs and gives the following output:

DROP TABLE test PURGE                                               
           *                                                        
ERROR at line 1:                                                    
ORA-00942: table or view does not exist

DROP TABLESPACE tsp1 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP1' does not exist

DROP TABLESPACE tsp2 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP2' does not exist

Tablespace created.

Tablespace created.

Table created.

165 rows created.

Commit complete.

  COUNT(*)
----------
       165

MIN_DATE  MAX_DATE
--------- ---------
01-JAN-09 14-JUN-09

Now, let’s see what the plan looks like from AUTOTRACE when we run the following query to get the maximum value of COL_DATE_PART_KEY:

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
  2  FROM   test                           
  3  /

MIN_DATE
---------
14-JUN-09

Execution Plan
----------------------------------------------------------
Plan hash value: 784602781

---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |     9 |    99   (0)| 00:00:02 |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |     9 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      |   132 |  1188 |    99   (0)| 00:00:02 |     1 |    12 |
|   3 |    TABLE ACCESS FULL | TEST |   132 |  1188 |    99   (0)| 00:00:02 |     1 |    12 |
---------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        320  consistent gets
         51  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> SET AUTOTRACE OFF

It shows a full scan of all twelve partitions. I had figured that the plan for such a query would show a full table scan of all partitions for that table – because, in theory, if all but the first partition were empty, then the whole table would have to be scanned to answer the query, and Oracle wouldn’t know at plan creation time whether the data met this case, so it would have to do the full table scan to ensure the correct result.

What I thought might happen though, is that in executing the query, it would be able to short circuit things, by working through the partitions in order, from latest to earliest, and finding the first, non null, value. Once it found the first, non null, value, it would know not to continue looking in the earlier partitions, since the value of COL_DATE_PART_KEY couldn’t possibly be greater than the non null value already identified.

It doesn’t appear to have this capability, which we can check by taking one of the partitions offline and then rerunning the query, whereupon it complains that not all the data is present…

SQL> ALTER TABLESPACE tsp1 OFFLINE;

Tablespace altered.

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
  2  FROM   test
  3  /
SELECT MAX(col_date_part_key) min_date
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/T111/tsp1.dbf'

SQL> SET AUTOTRACE OFF

So, even though we know the question could actually be answered accurately, Oracle can’t do it, because it wants to scan, unnecessarily, the whole table.

I did find a thread on OTN where somebody had asked about this, but all the responses were about workarounds, rather than explaining why this happens (bug/feature) or how it could be made to work in the way I, or the poster of that thread, think it perhaps should.

Can anyone else shed any light on this? If it’s a feature, then it seems like something that could easily be coded more efficiently by Oracle. The same issue would affect both MIN and MAX, since both could be approached in the same manner.
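
For completeness, here is a sketch of one possible workaround: manually doing what I hoped the optimizer would do, i.e. probing the partitions individually, newest first, and stopping at the first non-null result. This is illustrative only, not production code:

DECLARE
  l_max DATE;
BEGIN
  -- Walk the partitions from latest to earliest and stop at the first
  -- partition that returns a non-null maximum.
  FOR r IN (SELECT partition_name
            FROM   user_tab_partitions
            WHERE  table_name = 'TEST'
            ORDER BY partition_position DESC)
  LOOP
    EXECUTE IMMEDIATE
      'SELECT MAX(col_date_part_key) FROM test PARTITION (' || r.partition_name || ')'
      INTO l_max;
    EXIT WHEN l_max IS NOT NULL;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('MAX(col_date_part_key) = ' || TO_CHAR(l_max, 'DD-MON-YYYY'));
END;
/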

Problem with _gby_hash_aggregation_enabled parameter

Here’s a tale about an Oracle initialisation parameter…and a lesson we should all take note of…

For three days, my colleagues in the support team on one of the warehouses I’m involved in were struggling with a piece of code which was exhausting the available temp space; after trying everything they could think of, they asked me to take a look. I must admit I was a little baffled at first, because the piece of code in question had been happily running for some time and, every time I’d run it, I’d never noticed that TEMP was anywhere near being exhausted. So, whilst I could tell the process had some kind of problem, I was in the dark as to exactly what that problem was.

After trying to break down the large query into several smaller steps, I realised that it was an early step in the query that was exhibiting the problem of temp exhaustion – the step being a pivot of around 300 million measure rows into a pivot target table.

This process runs monthly and it had run without issue on all of the previous 20 occurrences and for roughly the same, if not more rows to pivot…so it didn’t appear to be the volume that was causing the problem.

There had been no changes in the code itself for many months and the current version of the code had been run successfully in previous months, so it didn’t appear to be a code fault.

I obtained the actual execution path for the statement whilst it was running and it looked reasonable, although looking at it, triggered a thought in my mind…what if something had changed in the database configuration?

Why would I get that thought from looking at the execution path?

Well, a while back, we had received some advice that the following line in our init.ora should be changed to set the parameter to TRUE instead of FALSE, so that the feature became active:

_gby_hash_aggregation_enabled = FALSE

This results in a line in the plan that reads:

HASH GROUP BY

instead of

SORT GROUP BY

The parameter was set to FALSE due to a known bug and the issues we’d seen with it; however, the recent advice we’d received indicated that the bug had been resolved at the version level we were on and that, by enabling the feature – which enables GROUP BY and aggregation using a hash scheme – we’d gain a performance boost for certain queries.

So, the DBA team researched the advice and it appeared to be the case, that the bug (4604970) which led to the disabling of the feature was fixed at our version level (10.2.0.3 on HP-UX). We duly turned on the feature in a pre production environment and ran it for a while without noticing any issues. We then enabled it in production and again, for a while, we’ve not noticed any issues…until now.

After a check back through the logs, it appeared that, since the parameter was first enabled, the queries which were now failing had not been run at all…they had only run prior to the parameter change…so, with my suspicions aroused further, I disabled the feature at the session level and reran the process. It completed in a normal time frame and used a small amount of TEMP – hooray!
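
For reference, disabling the feature at session level is just the following one-liner (underscore parameters should, of course, only be changed under Oracle Support’s guidance):

ALTER SESSION SET "_gby_hash_aggregation_enabled" = FALSE;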

So, now we have to go back to support to try and understand if the original bug is not quite fixed or whether this is a different scenario…in any event, we’re going to disable the feature for now, even though we’re only getting problems with the feature on 2 processes out of perhaps thousands.

So, what’s the lesson to learn?

Well, quite simply, that you need to have a thorough audit of what configuration changes you’ve made together with a good audit of the processes you’ve run so that you can work out what has changed since the last time you successfully ran a process. This gives you a fighting chance of spotting things like the above.

Tracking TEMP usage throughout the day

I really wanted to do this via AWR, but I’ve not been able to find out if this kind of information is stored and, if it is, how I’d get access to it…maybe someone else knows…hint hint?!

I have various queries I run whilst I’m online and processes are running, but what I really wanted, was to know the usage profile throughout the day…so, given that I couldn’t find an AWR way of tracking our use of TEMP on a database, I figured I’d use a more klunky method…

DROP TABLE temp_use_history PURGE
/

CREATE TABLE temp_use_history(snap_date      DATE         NOT NULL
                             ,sid            NUMBER       NOT NULL
                             ,segtype        VARCHAR2(9)  NOT NULL
                             ,qcsid          NUMBER       NULL
                             ,username       VARCHAR2(30) NULL
                             ,osuser         VARCHAR2(30) NULL
                             ,contents       VARCHAR2(9)  NULL
                             ,sqlhash        NUMBER       NULL
                             ,sql_id         VARCHAR2(13) NULL
                             ,blocks         NUMBER       NULL
                             )
PCTFREE 0
COMPRESS
NOLOGGING
/
CREATE UNIQUE INDEX tuh_pk ON temp_use_history(snap_date,sid,segtype)
PCTFREE 0
COMPRESS
NOLOGGING
/
ALTER TABLE temp_use_history ADD CONSTRAINT tuh_pk PRIMARY KEY(snap_date,sid,segtype)
USING INDEX
/

DECLARE
  l_program_action VARCHAR2(2000);
  l_27477 EXCEPTION;
  PRAGMA EXCEPTION_INIT(l_27477,-27477);
BEGIN
  BEGIN
    DBMS_SCHEDULER.CREATE_SCHEDULE(
      schedule_name   => 'MINUTELY_5M'
     ,start_date      => SYSDATE
     ,repeat_interval => 'FREQ=MINUTELY;INTERVAL=5'
     ,comments        =>'Daily schedule to run a job every five minutes.'
                                  );
  EXCEPTION
    WHEN l_27477 THEN
      NULL;  -- Ignore if the schedule exists
  END;

  l_program_action :=                   'DECLARE';
  l_program_action := l_program_action||'  l_date DATE := SYSDATE; ';
  l_program_action := l_program_action||'BEGIN';
  l_program_action := l_program_action||'  INSERT /*+ APPEND */ INTO temp_use_history(snap_date,sid,qcsid,username,osuser,contents,segtype,sqlhash,sql_id,blocks)';
  l_program_action := l_program_action||'  SELECT l_date,s.sid,ps.qcsid,s.username,s.osuser,su.contents,su.segtype,su.sqlhash,su.sql_id,sum(su.blocks)';
  l_program_action := l_program_action||'  FROM   v$sort_usage su';
  l_program_action := l_program_action||'  ,      v$session s';
  l_program_action := l_program_action||'  ,      v$px_session ps';
  l_program_action := l_program_action||'  WHERE  s.sid=ps.sid(+)';
  l_program_action := l_program_action||'  AND    s.saddr = su.session_addr';
  l_program_action := l_program_action||'  AND    s.serial# = su.session_num';
  l_program_action := l_program_action||'  GROUP BY s.sid,ps.qcsid,s.username,s.osuser,su.contents,su.segtype,su.sqladdr,su.sqlhash,su.sql_id;';
  l_program_action := l_program_action||'  COMMIT; ';
  l_program_action := l_program_action||'END;';

  BEGIN
    DBMS_SCHEDULER.CREATE_PROGRAM(
      program_name => 'SNAP_TEMP_USAGE'
     ,program_type => 'PLSQL_BLOCK'
     ,program_action => l_program_action
     ,enabled        => TRUE
     ,comments       => 'Program to snap the temp usage into TEMP_USE_HISTORY'
                                 );
  EXCEPTION
    WHEN l_27477 THEN
      NULL;  -- Ignore if the program exists
  END;

  BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
      job_name      => 'JOB_SNAP_TEMP_USAGE'
     ,program_name  => 'SNAP_TEMP_USAGE'
     ,schedule_name => 'MINUTELY_5M'
     ,enabled       => TRUE
     ,auto_drop     => FALSE
     ,comments      => 'Job to snap the temp usage into TEMP_USE_HISTORY'
                             );
  EXCEPTION
    WHEN l_27477 THEN
      NULL;  -- Ignore if the job exists
  END;
END;
/

I can now run queries against the TEMP_USE_HISTORY table to show how much TEMP has been used, when, by whom and for what use, e.g.

SQL> ed
Wrote file afiedt.buf

  1  select snap_date,round(sum(blocks)*32/1024/1024) gb
  2  from temp_use_history
  3  where snap_date > sysdate-1
  4  group by snap_date
  5  having round(sum(blocks)*32/1024/1024) > 50
  6* order by 1
SQL> /

SNAP_DATE                    GB
-------------------- ----------
26-FEB-2008 16:12:25         57
26-FEB-2008 16:17:25         65
26-FEB-2008 16:22:25         74
26-FEB-2008 16:27:25         86
26-FEB-2008 16:32:25         95
26-FEB-2008 16:37:25        107
26-FEB-2008 16:42:25        121
26-FEB-2008 16:47:25        127
26-FEB-2008 16:52:25        147
26-FEB-2008 16:57:25        160
26-FEB-2008 17:02:25        162
26-FEB-2008 17:07:25        179
26-FEB-2008 17:12:25        196
26-FEB-2008 17:17:25        208
26-FEB-2008 17:22:25        217
26-FEB-2008 17:27:25        233
26-FEB-2008 17:32:25        241
26-FEB-2008 17:37:25        251
26-FEB-2008 17:42:25        257
26-FEB-2008 17:47:25        262
26-FEB-2008 17:52:25        264
26-FEB-2008 17:57:25        267
26-FEB-2008 18:02:25        201
27-FEB-2008 00:27:25         59
27-FEB-2008 00:32:25         60
27-FEB-2008 00:37:25         69
27-FEB-2008 01:12:25         57
27-FEB-2008 02:22:25         53
27-FEB-2008 09:57:25         51

29 rows selected.

From the above, I can see that, during the last twenty-four hours on the database in question, there was reasonably heavy use of the TEMP area between 4pm and 6pm yesterday, and that the load peaked at approximately 267GB.
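
As a further illustration, here is a sketch of a follow-up query against the same table, reusing the blocks-to-GB arithmetic from the query above, to see who was using the space, and for what kind of segment, at the peak snapshot:

SELECT username
,      segtype
,      ROUND(SUM(blocks)*32/1024/1024) gb
FROM   temp_use_history
WHERE  snap_date = TO_DATE('26-FEB-2008 17:57:25','DD-MON-YYYY HH24:MI:SS')
GROUP BY username, segtype
ORDER BY gb DESC
/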