DBMS_COMPRESSION can be run in parallel, at least from 11.2.0.4

I was trying to use DBMS_COMPRESSION on an 11gR2 (11.2.0.4 on RHEL 6.3) database the other day and was helped by this article from Neil Johnson. I was having some trouble with permissions until I read that article, which nicely summarises everything you need to know – thanks Neil!

One thing I did notice is that Neil stated that you can't parallelise the calls to the advisor, since it uses the same named objects each time and this would then cause conflicts (and problems). Neil illustrated the calls the advisor makes, based on tracing the sessions…

create table "ACME".DBMS_TABCOMP_TEMP_UNCMP tablespace "SCRATCH" nologging
 as select /*+ DYNAMIC_SAMPLING(0) FULL("ACME"."ACCS") */ *
 from "ACME"."ACCS" sample block( 99) mytab
 
create table "ACME".DBMS_TABCOMP_TEMP_CMP organization heap 
 tablespace "SCRATCH" compress for all operations nologging
 as select /*+ DYNAMIC_SAMPLING(0) */ *
 from "ACME".DBMS_TABCOMP_TEMP_UNCMP mytab

Because I kept having permissions issues, I was repeatedly running the advisor, and I ended up in a situation where one of the transient objects (one of the above, or so I thought) had been left in place; when I tried the next rerun, it complained that the object already existed. I can't reproduce this as I can't remember all the steps I took and I wasn't recording my session at the time – it's not really the point of this post in any case, rather the knowledge it led to. Because the error was that the object existed, I figured I just needed to find the object and drop it and I'd be good to carry on. Obviously I looked at the above code fragments and started to search for the two objects in question (DBMS_TABCOMP_TEMP_UNCMP and DBMS_TABCOMP_TEMP_CMP) but found nothing. I then searched for DBMS_TABCOMP% and again found nothing.

Somewhat confused, I then looked for the latest objects created and found that they were actually called something completely different, of the form CMPx$yyyyyyyy. I think this must have changed since Neil wrote his article (it is from 2013 after all).

I can't work out what "x" is – at first I thought it was the RAC instance number, but that was just a coincidence: I saw a 3 and happened to be on instance 3 of a RAC cluster. In fact, on a single instance database (test below) I saw numbers higher than 1, so it's not the RAC instance number and I still can't work out what it is. "yyyyyyyy" is definitely the OBJECT_ID of the source data object, confirmed by cross-referencing the data dictionary.
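As an aside, that cross-reference is easy to do in one pass: the digits after the "$" can be joined back to DBA_OBJECTS. A hypothetical sketch (the query and aliases are my own, not anything the advisor provides):

select o.object_name  cmp_object
,      s.owner        source_owner
,      s.object_name  source_table
from   dba_objects o
,      dba_objects s
where  o.object_name like 'CMP%$%'
and    s.object_id = to_number(substr(o.object_name, instr(o.object_name, '$') + 1))
/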

Given this naming standard is therefore object specific, it suggests that you could execute these advisor calls in parallel.

Just to be clear, I’m not sure what version of 11g Neil was using but I am using 11.2.0.4:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

A little test using Neil’s code (and a bit more as I don’t have acme user on my database):

First create a script called dbms_comp.sql with the following content:

set serveroutput on
set feedback on
set verify off
 
declare
 blkcnt_cmp    BINARY_INTEGER;
 blkcnt_uncmp  BINARY_INTEGER;
 row_cmp       BINARY_INTEGER;
 row_uncmp     BINARY_INTEGER;
 cmp_ratio     NUMBER;
 comptype_str  VARCHAR2(60);
begin
 dbms_compression.get_compression_ratio(
 scratchtbsname => upper('&3.')
 , ownname => upper('&1.')
 , tabname => upper('&2.')
 , partname => null
 , comptype => dbms_compression.comp_for_oltp
 , blkcnt_cmp => blkcnt_cmp
 , blkcnt_uncmp => blkcnt_uncmp
 , row_cmp => row_cmp
 , row_uncmp => row_uncmp
 , cmp_ratio => cmp_ratio
 , comptype_str => comptype_str
 , subset_numrows => &4.
 );
 DBMS_OUTPUT.PUT_LINE('Block count compressed = ' || blkcnt_cmp);
 DBMS_OUTPUT.PUT_LINE('Block count uncompressed = ' || blkcnt_uncmp);
 DBMS_OUTPUT.PUT_LINE('Row count per block compressed = ' || row_cmp);
 DBMS_OUTPUT.PUT_LINE('Row count per block uncompressed = ' || row_uncmp);
 --DBMS_OUTPUT.PUT_LINE('Compression type = ' ||comptype_str);
 DBMS_OUTPUT.PUT_LINE('Compression ratio = '||round(blkcnt_uncmp/blkcnt_cmp,1)||' to 1');
 DBMS_OUTPUT.PUT_LINE('Compression % benefit = '||round((blkcnt_uncmp-blkcnt_cmp)/blkcnt_uncmp*100,1));
 --DBMS_OUTPUT.PUT_LINE('Compression ratio org= '||cmp_ratio);
end;
/
set verify on
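The script takes the owner, table name, scratch tablespace and number of rows to sample as positional parameters, i.e. it is called as:

@dbms_comp <owner> <table_name> <scratch_tablespace> <subset_numrows>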

Then create another script called setup.sql with the following content – I’m using auditing (thanks Tim!) to see the statements rather than tracing in this instance:

conn sys as sysdba
drop user acme cascade;
drop user nj cascade;
drop tablespace acme including contents and datafiles;
drop tablespace scratch including contents and datafiles;
drop role nj_dba;
create user acme identified by acme;
grant create session,create table to acme;
create tablespace acme datafile '/u01/app/oracle/oradata/db11g/acme01.dbf' size 2G;
alter user acme quota unlimited on acme;
create tablespace scratch datafile '/u01/app/oracle/oradata/db11g/scratch01.dbf' size 2G;
create role nj_dba;
create user nj identified by nj;
REM Use auditing instead of tracing to identify the statements run:
audit all by nj by access;
audit create table by nj by access;
grant create session, create any table, drop any table, select any table to nj_dba;
grant execute on sys.dbms_monitor to nj_dba;
grant nj_dba to nj;
alter user acme quota unlimited on scratch;
alter user nj quota unlimited on scratch;
grant ANALYZE ANY to NJ_DBA;

Now log in with sqlplus /nolog and run setup.sql, which should show this:

SQL> @setup
Enter password:
Connected.

User dropped.

User dropped.

Tablespace dropped.

Tablespace dropped.

Role dropped.

User created.

Grant succeeded.

Tablespace created.

User altered.

Tablespace created.

Role created.

User created.

Audit succeeded.

Audit succeeded.

Grant succeeded.

Grant succeeded.

Grant succeeded.

User altered.

User altered.

Grant succeeded.

Now log in as acme and create the subject table for the compression advisor:

conn acme/acme
create table test tablespace acme as select lpad(TO_CHAR(ROWNUM),2000,'x') char_col from dual connect by level < 300000;

Table created.

Now check the compression using the advisor (ignore the actual compression results as we’re not interested in those at this time):

conn nj/nj
@dbms_comp acme test scratch 200000
Block count compressed = 2048
Block count uncompressed = 2048
Row count per block compressed = 3
Row count per block uncompressed = 3
Compression ratio = 1 to 1
Compression % benefit = 0

PL/SQL procedure successfully completed.

Now check the audit trail to find the statements run:

conn sys as sysdba
column username format a8 
column obj_name format a30
column timestamp format a30
column action_name format a20
set linesize 200
set pagesize 1000
SELECT USERNAME, action_name,obj_name, to_char(extended_TIMESTAMP,'DD-MON-YYYY HH24:MI:SSxFF') timestamp FROM DBA_AUDIT_TRAIL WHERE USERNAME='NJ' and action_name='CREATE TABLE';

USERNAME ACTION_NAME OBJ_NAME TIMESTAMP
-------- -------------------- ------------------------------ --------------------
NJ CREATE TABLE CMP3$87401 10-DEC-2017 18:01:15.119890
NJ CREATE TABLE CMP4$87401 10-DEC-2017 18:01:15.177518

(Abridged to remove non-relevant tests)

Now check the dictionary to see the OBJECT_ID:

select object_name from dba_objects where object_id=87401;

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
TEST

1 row selected.

OK, how about the parallelism? Let’s create a second table called TEST2 in ACME:

conn acme/acme
create table test2 tablespace acme as select lpad(TO_CHAR(ROWNUM),2000,'x') char_col from dual connect by level < 300000;

Table created.

Now run two parallel sessions – I did it by firing off the calls manually in separate SQL*Plus sessions rather than being clever:

In session 1:

conn nj/nj
@dbms_comp acme test scratch 200000

In session 2:

conn nj/nj
@dbms_comp acme test2 scratch 200000

 

First one gives:

Block count compressed = 1920
Block count uncompressed = 1920
Row count per block compressed = 3
Row count per block uncompressed = 3
Compression ratio = 1 to 1
Compression % benefit = 0

PL/SQL procedure successfully completed.

Second one gives:

Block count compressed = 2432
Block count uncompressed = 2432
Row count per block compressed = 3
Row count per block uncompressed = 3
Compression ratio = 1 to 1
Compression % benefit = 0

PL/SQL procedure successfully completed.

Both ran at the same time and neither failed.

Now check the audit trail:

conn sys as sysdba
column username format a8 
column obj_name format a30
column timestamp format a30
column action_name format a20
set linesize 200
set pagesize 1000
SELECT USERNAME, action_name,obj_name, to_char(extended_TIMESTAMP,'DD-MON-YYYY HH24:MI:SSxFF') timestamp FROM DBA_AUDIT_TRAIL WHERE USERNAME='NJ' and action_name='CREATE TABLE';
USERNAME ACTION_NAME OBJ_NAME TIMESTAMP
-------- -------------------- ------------------------------ --------------------
NJ CREATE TABLE CMP3$87401 10-DEC-2017 18:01:15.119890
NJ CREATE TABLE CMP4$87401 10-DEC-2017 18:01:15.177518 
NJ CREATE TABLE CMP3$87408 10-DEC-2017 18:12:18.114321
NJ CREATE TABLE CMP3$87409 10-DEC-2017 18:12:18.114353
NJ CREATE TABLE CMP4$87408 10-DEC-2017 18:12:22.730715
NJ CREATE TABLE CMP4$87409 10-DEC-2017 18:12:22.735908
(Abridged to remove non-relevant tests)

And from the dictionary:

select object_name from dba_objects where object_id IN(87408,87409);

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
TEST
TEST2

2 rows selected.

So, it appears to allow parallel running without issue.

If anyone works out what the “x” part of the object names is (3 and 4 in the above example), please shout out in the comments…

12cR2 tightens up ORA-01841 for zero year ANSI dates, but not for Oracle SQL syntax

In moving some more code from an 11gR2 database to a 12cR2 database, I found another change where a piece of code that works in 11gR2 doesn’t compile in 12cR2.

In this instance a view was being created with a projected date column which used the ANSI DATE syntax. Here is a simplified test script:

CREATE OR REPLACE VIEW test1 AS
SELECT date '0000-01-01' date_col
FROM dual
/

CREATE OR REPLACE VIEW test2 AS
SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
FROM dual
/
DROP VIEW test1
/
DROP VIEW test2
/

Running this on 11gR2 gives:

SQL>CREATE OR REPLACE VIEW test1 AS
  2  SELECT date '0000-01-01' date_col
  3  FROM   dual
  4  /

View created.

SQL>CREATE OR REPLACE VIEW test2 AS
  2  SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
  3  FROM   dual
  4  /

View created.

Now running this on 12cR2 gives:

SQL> CREATE OR REPLACE VIEW test1 AS
  2  SELECT date '0000-01-01' date_col
  3  FROM   dual
  4  /
SELECT date '0000-01-01' date_col
            *
ERROR at line 2:
ORA-01841: (full) year must be between -4713 and +9999, and not be 0


SQL> CREATE OR REPLACE VIEW test2 AS
  2  SELECT TO_DATE('0000-01-01','YYYY-MM-DD') date_col
  3  FROM   dual
  4  /

View created.

The year in the literal is zero, so the ORA-01841 message is accurate: 12cR2 now enforces this check for the ANSI DATE literal syntax, whereas the equivalent TO_DATE expression still compiles – the tightening applies to the ANSI literal, not to Oracle's own syntax.
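If the zero year was only ever intended as a placeholder, the simple fix (a suggestion on my part, not part of the original code) is to use a literal inside the valid range, which 12cR2 accepts happily:

CREATE OR REPLACE VIEW test1 AS
SELECT date '0001-01-01' date_col
FROM dual
/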

ORA-54002 when trying to create Virtual Column using REGEXP_REPLACE on Oracle 12cR2

I encountered an issue today trying to create a table in an Oracle 12cR2 database using DDL that I had extracted from an Oracle 11gR2 database. The error returned when trying to create the table was:

ORA-54002: only pure functions can be specified in a virtual column expression

The definition of the table included a Virtual Column which used a REGEXP_REPLACE call to derive a value from another column on the table.

Here is a simplified test case illustrating the scenario (Thanks Tim for the REGEXP_REPLACE example code):

select * from v$version
/
create table test_ora54002_12c(
 col1 VARCHAR2(20 CHAR) NOT NULL
 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
)
/
drop table test_ora54002_12c purge
/

Running this on 11gR2 gives:

SQL> select * from v$version
 2 /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

5 rows selected.

Elapsed: 00:00:00.40
SQL> create table test_ora54002_12c(
 2 col1 VARCHAR2(20 CHAR) NOT NULL
 3 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
 4 )
 5 /

Table created.

Elapsed: 00:00:00.24
SQL> drop table test_ora54002_12c purge
 2 /

Table dropped.

Running this on 12cR2 gives:

SQL> select * from v$version
/
 2
BANNER CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0
PL/SQL Release 12.2.0.1.0 - Production 0
CORE 12.2.0.1.0 Production 0
TNS for Linux: Version 12.2.0.1.0 - Production 0
NLSRTL Version 12.2.0.1.0 - Production 0

SQL> create table test_ora54002_12c(
 col1 VARCHAR2(20 CHAR) NOT NULL
 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
)
/
 2 3 4 5 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS(REGEXP_REPLACE(col1, '([A-Z])', ' \1', 2)) VIRTUAL VISIBLE
 *
ERROR at line 3:
ORA-54002: only pure functions can be specified in a virtual column expression


SQL> drop table test_ora54002_12c purge
/
 2 drop table test_ora54002_12c purge
 *
ERROR at line 1:
ORA-00942: table or view does not exist

As you can see, 12cR2 gives the ORA-54002 error.

A search on MOS highlights this article, which suggests that you shouldn't have been able to do this in 11gR2, i.e. it was a bug, and that 12cR2 has fixed this bug, so you can no longer create such a virtual column (the article refers to functional index and check constraint use cases as well).

In my case, I was able to rewrite the virtual column using simple string functions such as SUBSTR, TRANSLATE and INSTR to achieve what I wanted, and the column was allowed to be created with those – problem solved – a shame really, as the REGEXP_REPLACE approach was far neater.
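Another option, which I didn't pursue, so treat this as an untested sketch rather than a recommendation, is to hide the regular expression behind a user-defined function declared DETERMINISTIC – the pure-function check trusts that declaration, although a virtual column built on a user-defined function can't then be used as a partitioning key. The function name below is invented for illustration:

CREATE OR REPLACE FUNCTION space_before_caps(p_val IN VARCHAR2)
RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
  -- Same expression as the original virtual column, wrapped in a function
  -- that we declare to be deterministic.
  RETURN REGEXP_REPLACE(p_val, '([A-Z])', ' \1', 2);
END;
/
CREATE TABLE test_ora54002_12c(
 col1 VARCHAR2(20 CHAR) NOT NULL
 ,virtual_column1 VARCHAR2(4000 CHAR) GENERATED ALWAYS AS (space_before_caps(col1)) VIRTUAL VISIBLE
)
/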

Oracle Virtual Columns – Can’t use plain column or duplicate expressions

I had a scenario today where I was loading a table and a particular column was known by multiple names in different source systems, and thus to different people. In order to make everyone happy on this occasion, I wondered if I could create a normal column for one of the names and then use virtual columns pointing at that normal column for the other names.

I'm aware there are several ways of skinning this cat and that virtual columns were probably not the best choice in the first place, but I was just playing with an idea and it didn't quite end up where I thought…so it was interesting in that respect.

The test code:


CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 ) VIRTUAL NOT NULL 
  )
/
CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
  )
/
DROP TABLE test PURGE
/
CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
    ,virtual_column2 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
  )
/
CREATE TABLE test
  (
     column1    INTEGER NOT NULL 
    ,virtual_column1 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL 
    ,virtual_column2 INTEGER AS ( column1 + 1 - 1 ) VIRTUAL NOT NULL 
  )
/
DROP TABLE test PURGE
/

And the results:


    ,virtual_column1 INTEGER AS ( column1 ) VIRTUAL NOT NULL
                                  *
ERROR at line 4:
ORA-54016: Invalid column expression was specified



Table created.


Table dropped.

    ,virtual_column2 INTEGER AS ( column1 + 0 ) VIRTUAL NOT NULL
     *
ERROR at line 5:
ORA-54015: Duplicate column expression was specified



Table created.


Table dropped.

I expected it to just work, but I clearly ran into two problems which scuppered my idea. Firstly, a virtual column can't simply refer to a normal column without any change to it; if it does, creation fails with ORA-54016. The error isn't particularly helpful, but eventually I worked out that it was because the column is simply a mapping to a non-virtual column. Working around that by adding zero to the numeric column gets it to work, but it's an ugly hack.

In my scenario there are three different names for this column, depending on the users involved, which leads on to the next issue: I'd then need two virtual columns pointing at the same source column. Unfortunately, if I use the same hack twice, it fails with ORA-54015, because you can't have two virtual columns with exactly the same expression! A slight variant of the hack and it works, but it's getting uglier and uglier!

Time to seek out plan B!
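For completeness, the most obvious plan B – sketched here with invented names, and only one of the several ways of skinning this cat – is to keep a single real column and expose the alternative names through a view:

CREATE TABLE test
  (
     column1 INTEGER NOT NULL
  )
/
CREATE OR REPLACE VIEW test_v AS
SELECT column1
,      column1 AS alternative_name1
,      column1 AS alternative_name2
FROM   test
/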

About Oracle: DBA_DEPENDENCY_COLUMNS

A colleague recently asked if there was a way to do column-level dependency tracking. He wanted to know, for a given view, which tables and which columns on those tables it was dependent upon – without, of course, reading through the code.

I was vaguely aware that since 11gR1 Oracle has been tracking fine-grained (column) dependencies, but I couldn't find a way of seeing the details stored, until I found this interesting article from Rob Van Wijk:

About Oracle: DBA_DEPENDENCY_COLUMNS

I passed the details on to our DBA who implemented it and it seemed to work, for us. Your mileage may vary, of course.

Some comments on Rob’s blog post, bearing in mind, of course, that it was written in 2008 and refers to 11gR1:

  1. D_ATTRS contains values other than “000100000A”. I’ve observed this in a basic 12c install and a production 11gR2 install
  2. D_ATTRS is commented in $ORACLE_HOME/rdbms/admin/dcore.bsq as “/* Finer grain attr. numbers if finer grained */”
  3. D_REASON is commented in $ORACLE_HOME/rdbms/admin/dcore.bsq as "/* Reason mask of attrs causing invalidation */". On my basic 12c installation, all rows contain NULL for this value, although on a production 11gR2 database I observed a handful of rows with values in this column.
  4. Noted from the comments against Rob’s article is the opportunity to vote for this feature on OTN here

 

KEEP DENSE_RANK versus ROW_NUMBER – further details

I found this nice post from Laurent Schneider the other day and wanted to comment, but my comments were a bit more of a digression and discussion, so I've blogged them here and put a link in the comments of Laurent's post.

I'd always used the ROW_NUMBER method myself until I read this, and then figured I'd try the KEEP DENSE_RANK method, which works, as Laurent describes. One thing that didn't sit well with me in Laurent's post was the statement that "the second one should be more performant" – I prefer hard facts, so I decided to test it a bit; my results are below.

In the simple example that Laurent gave, it’s difficult to tell which is quickest, since the table in question only has a handful of rows and therefore any benchmarking is more susceptible to other influences, clouding the results. I figured I’d build a larger table and try it on that.

Before I did that though, I did get the plans from the two statements Laurent ran on the EMP table and both show the same resource costings:

Firstly, for the ROW_NUMBER method:

select ename
,      deptno
,      sal
from   (select ename
 ,      deptno
 ,      sal
 ,      row_number() over (partition by deptno order by sal desc,empno) r
 from   emp
 )
where  r=1;

Plan hash value: 3291446077                                                                                                             

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   644 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   644 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 1 - filter("R"=1)
 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPTNO" ORDER BY
 INTERNAL_FUNCTION("SAL") DESC ,"EMPNO")<=1)

Note
-----
 - dynamic sampling used for this statement

Now, the KEEP DENSE_RANK method:

select max(ename) keep (dense_rank first order by sal desc,empno) ename
,      deptno
,      max(sal) sal
from   emp 
group by deptno;

Plan hash value: 15469362

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    14 |   644 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY     |      |    14 |   644 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   644 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

So, the plans are very similar, especially in terms of having the same resource costs…which means they should be similar in terms of performance. Running them, as Laurent did, shows around 1s response times, which, as I say, doesn't conclusively prove which method is quickest.

OK, on to a bigger example then…

I basically created a table similar to EMP, called JEFF_EMP, added a few more columns (for later) and then put ten million rows in it, taking around 1.3GB on my system…plenty to churn through.

DROP TABLE jeff_emp PURGE
/
CREATE TABLE jeff_emp(deptno     NUMBER
                     ,ename      VARCHAR2(100)
                     ,first_name VARCHAR2(50)
                     ,initials   VARCHAR2(30)
                     ,surname    VARCHAR2(50)
                     ,sal        NUMBER
                     ,empno      NUMBER
                     )
/
INSERT INTO jeff_emp(deptno,ename,first_name,initials,surname,sal,empno)
SELECT (MOD(ROWNUM,3) + 1) * 10
,      'FIRSTNAME_'||TO_CHAR(ROWNUM)||'_INITIALS_'||TO_CHAR(ROWNUM)||'_SURNAME_'||TO_CHAR(ROWNUM)
,      'FIRSTNAME_'||TO_CHAR(ROWNUM)
,      'INITIALS_'||TO_CHAR(ROWNUM)
,      'SURNAME_'||TO_CHAR(ROWNUM)
,      ROWNUM * 100
,      ROWNUM
FROM   (SELECT LEVEL l FROM dual CONNECT BY LEVEL < 10000001)
ORDER BY l
/
COMMIT
/
EXEC dbms_stats.gather_table_stats(ownname => USER, tabname => 'JEFF_EMP', estimate_percent => 10);

Now, here are the query and plan for the ROW_NUMBER method:
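(The query is simply the earlier ROW_NUMBER statement pointed at JEFF_EMP, reproduced here for reference.)

select ename
,      deptno
,      sal
from   (select ename
        ,      deptno
        ,      sal
        ,      row_number() over (partition by deptno order by sal desc,empno) r
        from   jeff_emp
       )
where  r=1
/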

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    10M|   868M|       |   204K  (1)| 00:40:49 |
|*  1 |  VIEW                    |          |    10M|   868M|       |   204K  (1)| 00:40:49 |
|*  2 |   WINDOW SORT PUSHED RANK|          |    10M|   629M|  1533M|   204K  (1)| 00:40:49 |
|   3 |    TABLE ACCESS FULL     | JEFF_EMP |    10M|   629M|       | 46605   (1)| 00:09:20 |
---------------------------------------------------------------------------------------------

…and the results:

ENAME                                                                                                    DEPTNO        SAL
---------------------------------------------------------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                           10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                        20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                           30  999999800

Elapsed: 00:00:24.47

…and the KEEP DENSE_RANK method query and plan:
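(Again, the earlier KEEP DENSE_RANK statement pointed at JEFF_EMP.)

select max(ename) keep (dense_rank first order by sal desc,empno) ename
,      deptno
,      max(sal) sal
from   jeff_emp
group by deptno
/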

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     3 |   198 | 47109   (2)| 00:09:26 |
|   1 |  SORT GROUP BY     |          |     3 |   198 | 47109   (2)| 00:09:26 |
|   2 |   TABLE ACCESS FULL| JEFF_EMP |    10M|   629M| 46605   (1)| 00:09:20 |
-------------------------------------------------------------------------------

…and it’s results:

ENAME                                                                                                    DEPTNO        SAL
---------------------------------------------------------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                           10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                        20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                           30  999999800

Elapsed: 00:00:07.76

So, reasonably clear results, indicating that KEEP DENSE_RANK takes about a third of the time of the ROW_NUMBER method. You can also see from the plans that the ROW_NUMBER method spills to TEMP, whereas KEEP DENSE_RANK doesn't, hence the slowdown.

So, Laurent was correct in his assertion that it should be more performant…but it’s nice to see the results based on a more meaningful set of data.

Now, there was one other thing that concerned me: whether adding more columns into the SQL would change the performance of either method to any significant degree, so I started using the extra name columns like this:

SELECT ename
,      first_name
,      initials
,      surname
,      deptno
,      sal
FROM   (SELECT ename
        ,      first_name
        ,      initials
        ,      surname
        ,      deptno
        ,      sal
        ,      ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC,empno) r 
        FROM   jeff_emp
       )
WHERE  r = 1
/

…which has a plan of:

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    10M|  1546M|       |   307K  (1)| 01:01:36 |
|*  1 |  VIEW                    |          |    10M|  1546M|       |   307K  (1)| 01:01:36 |
|*  2 |   WINDOW SORT PUSHED RANK|          |    10M|  1107M|  2606M|   307K  (1)| 01:01:36 |
|   3 |    TABLE ACCESS FULL     | JEFF_EMP |    10M|  1107M|       | 46605   (1)| 00:09:20 |
---------------------------------------------------------------------------------------------

…and results:

ENAME                                                                                                FIRST_NAME                                         INITIALS                       SURNAME                      DEPTNO         SAL
---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                   FIRSTNAME_9999999                                  INITIALS_9999999               SURNAME_9999999          10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                FIRSTNAME_10000000                                 INITIALS_10000000              SURNAME_10000000         20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                   FIRSTNAME_9999998                                  INITIALS_9999998               SURNAME_9999998          30  999999800

Elapsed: 00:00:25.76

For the KEEP DENSE_RANK I get:

SELECT MAX(ename) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) ename
,      MAX(first_name) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) first_name
,      MAX(initials) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) initials
,      MAX(surname) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) surname
,      deptno
,      MAX(sal) sal
FROM   jeff_emp 
GROUP BY deptno
/

Which has the following plan:

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     3 |   348 | 47109   (2)| 00:09:26 |
|   1 |  SORT GROUP BY     |          |     3 |   348 | 47109   (2)| 00:09:26 |
|   2 |   TABLE ACCESS FULL| JEFF_EMP |    10M|  1107M| 46605   (1)| 00:09:20 |
-------------------------------------------------------------------------------

…and results:

ENAME                                                                                                FIRST_NAME                                         INITIALS                       SURNAME                      DEPTNO         SAL
---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                   FIRSTNAME_9999999                                  INITIALS_9999999               SURNAME_9999999          10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                FIRSTNAME_10000000                                 INITIALS_10000000              SURNAME_10000000         20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                   FIRSTNAME_9999998                                  INITIALS_9999998               SURNAME_9999998          30  999999800

Elapsed: 00:00:14.56

So, the differential in performance has reduced significantly, with KEEP DENSE_RANK taking around double its original time, whilst the ROW_NUMBER method has only increased marginally. I've not tested adding further columns, but I'm guessing (I know…I could and should test it!) it will get worse, to the extent that, eventually, KEEP DENSE_RANK will become the worse performer. If that's the case, then essentially these two methods have different scalability dynamics, and one should bear this in mind when considering which to choose, depending on how many DENSE_RANK'd columns you need to deliver your results.

Hope this helps.

RANK v DENSE_RANK v ROW_NUMBER

I was asked a question by one of my users about the difference between the RANK and ROW_NUMBER analytics yesterday, so here is a post on it…

RANK, ROW_NUMBER and also DENSE_RANK are very useful for taking a set of rows and ordering them in a defined manner, whilst giving each row a "position value". They differ in the approach taken to define the value of each row's position in the set of output rows. In some circumstances they may all give the same value; however, depending on the data, they may differ.

An example based on the SCOTT.EMP table, helps to illustrate…

SELECT empno
,      sal
,      RANK() OVER(ORDER BY sal) rank_position
,      DENSE_RANK() OVER(ORDER BY sal) dense_rank_position
,      ROW_NUMBER() OVER(ORDER BY sal) row_number_position
FROM   emp
/

which returns, on my 11gR1 database…

EMPNO        SAL RANK_POSITION DENSE_RANK_POSITION ROW_NUMBER_POSITION
---------- ---------- ------------- ------------------- -------------------
7369        800             1                   1                   1
7900        950             2                   2                   2
7876       1100             3                   3                   3
7521       1250             4                   4                   4
7654       1250             4                   4                   5
7934       1300             6                   5                   6
7844       1500             7                   6                   7
7499       1600             8                   7                   8
7782       2450             9                   8                   9
7698       2850            10                   9                  10
7566       2975            11                  10                  11
7788       3000            12                  11                  12
7902       3000            12                  11                  13
7839       5000            14                  12                  14
14 rows selected.

Notice that RANK has given the two employees with SAL=1250 the same position value of 4, and the two employees with SAL=3000 the same position value of 12. Notice also that RANK skips position values 5 and 13, as it has two entries for 4 and 12 respectively. RANK uses all numbers between 1 and 14, except 5 and 13. RANK has both repeats and gaps in its ordering.

DENSE_RANK is similar to RANK, in that it gives the two employees with SAL=1250 the same position value of 4, but it does not then skip over position value 5 – it simply carries on at position 5 for the next value. DENSE_RANK uses, for the position values, all numbers between 1 and 12, leaving none out and using 4 and 11 twice. DENSE_RANK has no gaps in its ordering, only repeats.

ROW_NUMBER gives each row a unique position value and consequently uses all the numbers between 1 and 14. ROW_NUMBER has no gaps or repeats in its ordering. Note that the position value ROW_NUMBER assigns here is not deterministic, since the ORDER BY clause only has SAL in it. If you want to ensure the order is the same each time, you need to add further, tie-breaking columns to the ORDER BY clause.
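For example, adding EMPNO as a tie-breaker makes the ROW_NUMBER positions repeatable between runs:

SELECT empno
,      sal
,      ROW_NUMBER() OVER(ORDER BY sal, empno) row_number_position
FROM   emp
/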

No pruning for MIN/MAX of partition key column

Recently, I wanted to work out the maximum value of a column on a partitioned table. The column I wanted the maximum value for happened to be the (single and only) partition key column. The table in question was range partitioned on this key column into monthly partitions for 2009, with data only in the partitions behind the current date, i.e. January through mid June were populated. There were no indexes on the table.

NOTE – I tried this on 10.2.0.4 (AIX) and 11.1.0 (Fedora 11) – the example below is from 11.1.0.

I’ll recreate the scenario here:

DROP TABLE test PURGE
/
DROP TABLESPACE tsp1 INCLUDING CONTENTS
/
DROP TABLESPACE tsp2 INCLUDING CONTENTS
/
CREATE TABLESPACE tsp1
datafile '/u01/app/oracle/oradata/T111/tsp1.dbf' size 100M
autoextend off extent management local uniform size 1m segment space management auto online
/
CREATE TABLESPACE tsp2
datafile '/u01/app/oracle/oradata/T111/tsp2.dbf' size 100M
autoextend off extent management local uniform size 1m segment space management auto online
/
CREATE TABLE test(col_date_part_key DATE NOT NULL
,col2 VARCHAR2(2000) NOT NULL
)
PARTITION BY RANGE(col_date_part_key)
(PARTITION month_01 VALUES LESS THAN (TO_DATE('01-FEB-2009','DD-MON-YYYY')) TABLESPACE tsp1
,PARTITION month_02 VALUES LESS THAN (TO_DATE('01-MAR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_03 VALUES LESS THAN (TO_DATE('01-APR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_04 VALUES LESS THAN (TO_DATE('01-MAY-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_05 VALUES LESS THAN (TO_DATE('01-JUN-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_06 VALUES LESS THAN (TO_DATE('01-JUL-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_07 VALUES LESS THAN (TO_DATE('01-AUG-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_08 VALUES LESS THAN (TO_DATE('01-SEP-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_09 VALUES LESS THAN (TO_DATE('01-OCT-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_10 VALUES LESS THAN (TO_DATE('01-NOV-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_11 VALUES LESS THAN (TO_DATE('01-DEC-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_12 VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE tsp2
)
/
REM Insert rows, but only up to 14-JUN-2009
INSERT INTO test(col_date_part_key,col2)
SELECT TO_DATE('31-DEC-2008','DD-MON-YYYY') + l
, LPAD('X',2000,'X')
FROM (SELECT level l FROM dual CONNECT BY level < 166)
/
COMMIT
/
SELECT COUNT(*)
FROM test
/
SELECT MIN(col_date_part_key) min_date
, MAX(col_date_part_key) max_date
FROM test
/

This runs and gives the following output:

DROP TABLE test PURGE
           *
ERROR at line 1:
ORA-00942: table or view does not exist

DROP TABLESPACE tsp1 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP1' does not exist

DROP TABLESPACE tsp2 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP2' does not exist

Tablespace created.

Tablespace created.

Table created.

165 rows created.

Commit complete.

  COUNT(*)
----------
       165

MIN_DATE  MAX_DATE
--------- ---------
01-JAN-09 14-JUN-09

Now, let's see what the plan looks like from AUTOTRACE when we run the following query to get the maximum value of COL_DATE_PART_KEY:

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
  2  FROM   test                           
  3  /

MIN_DATE
---------
14-JUN-09


Execution Plan
----------------------------------------------------------
Plan hash value: 784602781

----------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     9 |    99   (0)| 00:00:02 |       |       |
|   1 |  SORT AGGREGATE       |      |     1 |     9 |            |          |       |       |
|   2 |   PARTITION RANGE ALL |      |   132 |  1188 |    99   (0)| 00:00:02 |     1 |    12 |
|   3 |    TABLE ACCESS FULL  | TEST |   132 |  1188 |    99   (0)| 00:00:02 |     1 |    12 |
----------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        320  consistent gets
         51  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> SET AUTOTRACE OFF

It shows a full scan of all twelve partitions. I figured that the plan for such a query would show a full scan of all partitions of the table – because, in theory, if all but the first partition were empty, then the whole table would have to be scanned to answer the query, and Oracle wouldn't know at plan creation time whether the data met this case, so it would have to plan for the full scan to ensure a correct result.

What I thought might happen, though, is that in executing the query it would be able to short-circuit things by working through the partitions in order, from latest to earliest, and finding the first non-null value. Once it found the first non-null value, it would know not to continue looking in the earlier partitions, since the value of COL_DATE_PART_KEY couldn't possibly be greater than the non-null value already identified.
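To illustrate the idea, here's a rough sketch of how you could emulate that short-circuit by hand – probing the partitions newest-first and stopping at the first non-null result (an illustration of the concept only, not a suggested production approach):

set serveroutput on
DECLARE
  l_max DATE;
BEGIN
  FOR p IN (SELECT partition_name
            FROM   user_tab_partitions
            WHERE  table_name = 'TEST'
            ORDER  BY partition_position DESC)
  LOOP
    -- Probe just this partition and stop as soon as we find a non-null MAX.
    EXECUTE IMMEDIATE
      'SELECT MAX(col_date_part_key) FROM test PARTITION (' || p.partition_name || ')'
      INTO l_max;
    EXIT WHEN l_max IS NOT NULL;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Max partition key = ' || TO_CHAR(l_max, 'DD-MON-YYYY'));
END;
/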

It doesn’t appear to have this capability, which we can check by taking one of the partitions offline and then rerunning the query, whereupon it complains that not all the data is present…

SQL> ALTER TABLESPACE tsp1 OFFLINE;

Tablespace altered.

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
  2  FROM   test
  3  /
SELECT MAX(col_date_part_key) min_date
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/T111/tsp1.dbf'

SQL> SET AUTOTRACE OFF

So, even though we know we could actually answer this question accurately, Oracle can't do it, because it insists on scanning the whole table unnecessarily.

I did find a thread on OTN where somebody had asked about this, but all the responses were about workarounds, rather than explaining why this happens (bug or feature?) or how it can be made to work in the way I, or the poster of that thread, think it perhaps should.
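For completeness, the most obvious workaround – sketched here with an invented index name, and remembering that the table in my scenario had no indexes – is to index the partition key, which lets the optimizer satisfy MIN/MAX from the (much smaller) index rather than full scanning every partition of the table:

CREATE INDEX test_part_key_idx ON test(col_date_part_key) LOCAL
/
SELECT MAX(col_date_part_key) max_date
FROM   test
/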

Can anyone else shed any light on this? If it's a feature, then it seems like something that could easily be coded more efficiently by Oracle. The same issue affects both MIN and MAX, since both could be approached in the same manner.

Problem with _gby_hash_aggregation_enabled parameter

Here’s a tale about an Oracle initialisation parameter…and a lesson we should all take note of…

For three days, my colleagues in the support team on one of the warehouses I'm involved in were struggling with a piece of code which was exhausting the available temp space; after trying everything they could think of, they asked me to take a look. I must admit I was a little baffled at first, because the piece of code in question had been happily running for some time and, every time I'd run it, I'd never noticed TEMP being anywhere near exhausted. So, whilst I could tell the process had some kind of problem, I was in the dark as to exactly what that problem was.

After trying to break down the large query into several smaller steps, I realised that it was an early step in the query that was exhibiting the problem of temp exhaustion – the step being a pivot of around 300 million measure rows into a pivot target table.

This process runs monthly and it had run without issue on all of the previous 20 occurrences and for roughly the same, if not more rows to pivot…so it didn’t appear to be the volume that was causing the problem.

There had been no changes in the code itself for many months and the current version of the code had been run successfully in previous months, so it didn’t appear to be a code fault.

I obtained the actual execution path for the statement whilst it was running and it looked reasonable, although looking at it, triggered a thought in my mind…what if something had changed in the database configuration?

Why would I get that thought from looking at the execution path?

Well, a while back, we had received some advice that the following line in our init.ora should be changed to set the feature to TRUE instead of FALSE, so that the feature became active:

_gby_hash_aggregation_enabled = FALSE

Enabling the feature results in a line in the plan that reads:

HASH GROUP BY

instead of

SORT GROUP BY

The parameter had been set to FALSE due to a known bug and the issues we'd seen with it; however, the recent advice we'd received indicated that the bug had been resolved at the version we were on, and that by enabling the feature – which enables GROUP BY and aggregation using a hash scheme – we'd gain a performance boost for certain queries.

So, the DBA team researched the advice and it appeared to be the case that the bug (4604970) which led to the disabling of the feature was fixed at our version level (10.2.0.3 on HP-UX). We duly turned the feature on in a pre-production environment and ran it for a while without noticing any issues. We then enabled it in production and again, for a while, we noticed no issues…until now.

After a check back through the logs, it appeared that the queries which were now failing had not been run at all since the parameter was first enabled…they had only run prior to the parameter change…so, with my suspicions aroused further, I disabled the feature at session level and reran the process. It completed in a normal time frame and used a small amount of TEMP – hooray!
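For reference, disabling it at session level is just the following (bearing in mind this is a hidden/underscore parameter, so the usual caveat applies: only change it under the guidance of Oracle Support):

ALTER SESSION SET "_gby_hash_aggregation_enabled" = FALSE;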

So, now we have to go back to support to try and understand if the original bug is not quite fixed or whether this is a different scenario…in any event, we’re going to disable the feature for now, even though we’re only getting problems with the feature on 2 processes out of perhaps thousands.

So, what’s the lesson to learn?

Well, quite simply, that you need to have a thorough audit of what configuration changes you’ve made together with a good audit of the processes you’ve run so that you can work out what has changed since the last time you successfully ran a process. This gives you a fighting chance of spotting things like the above.