No pruning for MIN/MAX of partition key column

Recently, I wanted to work out the maximum value of a column on a partitioned table. The column I wanted the maximum value for, happened to be the (single and only) partition key column. The table in question was range partitioned on this single key column, into monthly partitions for 2009, with data in all the partitions behind the current date, i.e. January through mid June were populated. There were no indexes on the table.

NOTE – I tried this on 10.2.04 (AIX) and 11.1.0 (Fedora 11) – the example below is from 11.1.0.

I’ll recreate the scenario here:


CREATE TABLESPACE tsp1
datafile '/u01/app/oracle/oradata/T111/tsp1.dbf' size 100M
autoextend off extent management local uniform size 1m segment space management auto online
/
CREATE TABLESPACE tsp2
datafile '/u01/app/oracle/oradata/T111/tsp2.dbf' size 100M
autoextend off extent management local uniform size 1m segment space management auto online
/

DROP TABLE test PURGE
/
CREATE TABLE test(col_date_part_key DATE NOT NULL
,col2 VARCHAR2(2000) NOT NULL
)
PARTITION BY RANGE(col_date_part_key)
(PARTITION month_01 VALUES LESS THAN (TO_DATE('01-FEB-2009','DD-MON-YYYY')) TABLESPACE tsp1
,PARTITION month_02 VALUES LESS THAN (TO_DATE('01-MAR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_03 VALUES LESS THAN (TO_DATE('01-APR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_04 VALUES LESS THAN (TO_DATE('01-MAY-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_05 VALUES LESS THAN (TO_DATE('01-JUN-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_06 VALUES LESS THAN (TO_DATE('01-JUL-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_07 VALUES LESS THAN (TO_DATE('01-AUG-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_08 VALUES LESS THAN (TO_DATE('01-SEP-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_09 VALUES LESS THAN (TO_DATE('01-OCT-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_10 VALUES LESS THAN (TO_DATE('01-NOV-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_11 VALUES LESS THAN (TO_DATE('01-DEC-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_12 VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE tsp2
)
/
REM Insert rows, but only up to 14-JUN-2009
INSERT INTO test(col_date_part_key,col2)
SELECT TO_DATE('31-DEC-2008','DD-MON-YYYY') + l
, LPAD('X',2000,'X')
FROM (SELECT level l FROM dual CONNECT BY level < 166)
/
COMMIT
/
SELECT COUNT(*)
FROM test
/
SELECT MIN(col_date_part_key) min_date
, MAX(col_date_part_key) max_date
FROM test
/

This runs and gives the following output:


DROP TABLE test PURGE
*
ERROR at line 1:
ORA-00942: table or view does not exist


DROP TABLESPACE tsp1 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP1' does not exist


DROP TABLESPACE tsp2 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP2' does not exist



Tablespace created.


Tablespace created.


Table created.


165 rows created.


Commit complete.


COUNT(*)
----------
165


MIN_DATE MAX_DATE
--------- ---------
01-JAN-09 14-JUN-09

Now, lets see what the plan looks like from AUTOTRACE when we run the following query to get the maximum value of COL_DATE_PART_KEY:


SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
2 FROM test
3 /

MIN_DATE
---------
14-JUN-09


Execution Plan
----------------------------------------------------------
Plan hash value: 784602781

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 99 (0)| 00:00:02 | | |
| 1 | SORT AGGREGATE | | 1 | 9 | | | | |
| 2 | PARTITION RANGE ALL| | 132 | 1188 | 99 (0)| 00:00:02 | 1 | 12 |
| 3 | TABLE ACCESS FULL | TEST | 132 | 1188 | 99 (0)| 00:00:02 | 1 | 12 |
---------------------------------------------------------------------------------------------

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
320 consistent gets
51 physical reads
0 redo size
527 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> SET AUTOTRACE OFF

It shows a full scan of all twelve partitions. I figured that the the plan for such a query would show a full table scan, of all partitions for that table – because, in theory, if all but the first partition were empty, then the whole table would have to be scanned to answer the query – and Oracle wouldn’t know at plan creation time, whether the data met this case, so it would have to do the full table scan to ensure the correct result.

What I thought might happen though, is that in executing the query, it would be able to short circuit things, by working through the partitions in order, from latest to earliest, and finding the first, non null, value. Once it found the first, non null, value, it would know not to continue looking in the earlier partitions, since the value of COL_DATE_PART_KEY couldn’t possibly be greater than the non null value already identified.

It doesn’t appear to have this capability, which we can check by taking one of the partitions offline and then rerunning the query, whereupon it complains that not all the data is present…


SQL> ALTER TABLESPACE tsp1 OFFLINE;

Tablespace altered.

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
2 FROM test
3 /
SELECT MAX(col_date_part_key) min_date
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/T111/tsp1.dbf'


SQL> SET AUTOTRACE OFF

So, even though we know we could actually answer this question accurately, Oracle can’t do it as it wants to scan, unnecessarily, the whole table.

I did find a thread which somebody had asked about this on OTN, but all the responses were about workarounds, rather than explaining why this happens (bug/feature) or how it can be made to work in the way I, or the poster of that thread, think it, perhaps, should.

Can anyone else shed any light on this? If it’s a feature, then it seems like something that could be easily coded more efficiently by Oracle. The same issue would affect both MIN and MAX since both could be
approached in the same manner.

12 thoughts on “No pruning for MIN/MAX of partition key column

  1. jonathanlewis

    Jeff,
    It is possible for Oracle to do a descending walk through partitions – so I think we probably have to assume that this particular case hasn't been coded for yet.

    Here's a plan (from 10.2.0.3) for

    select partition_key
    from pt_range
    order by partition_key descending
    ;

    (you'll have to fix the code as I can't get it to do fixed format).

    —————————————————————————————————–
    | Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    —————————————————————————————————–
    |   0 | SELECT STATEMENT            |       | 12499 | 74994 |    45   (0)| 00:00:01 |       |       |
    |   1 |  PARTITION RANGE ALL        |       | 12499 | 74994 |    45   (0)| 00:00:01 |     3 |     1 |
    |   2 |   INDEX FULL SCAN DESCENDING| PT_PK | 12499 | 74994 |    45   (0)| 00:00:01 |     3 |     1 |
    —————————————————————————————————–

    Note the pstart = 3, pstop = 1, and absence of "sort order by".
    (I've got a local index on the partition key column)

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com

    Reply
  2. Jeff

    Thanks Jonathan…was hoping you might stop by!

    From your example, it suggests it's still going to read through all the partitions though…haven't got time to test this now…maybe later, but if you take out a tablespace again for say just partition 1 of that index, will Oracle be able to answer the question or not?

    If not, then it's a similar issue, that whilst it can go through the partitions in descending order, it still requires access to data that it shouldn't need in order to answer the question, i.e partition 1.

    I'll test it out later and update the blog.

    Cheers
    Jeff

    Reply
  3. jonathanlewis

    Jeff,

    The only point I wanted to make with that example is that Oracle has the technology to start from the top partition and work downwards. Here's a query that demonstrates that it can stop early:

    select *
    from (
    select id , grp
    from pt_range
    order by id desc
    )
    where rownum < = 4000
    /

    and here's the execution plan:

    ————————————————————————————————————-
    | Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
    ————————————————————————————————————-
    |   0 | SELECT STATEMENT         |          |  4000 |   101K|       |    70   (5)| 00:00:01 |       |       |
    |*  1 |  COUNT STOPKEY           |          |       |       |       |            |          |       |       |
    |   2 |   PARTITION RANGE ALL    |          | 12499 |   317K|       |    70   (5)| 00:00:01 |     3 |     1 |
    |   3 |    VIEW                  |          | 12499 |   317K|       |    70   (5)| 00:00:01 |       |       |
    |*  4 |     SORT ORDER BY STOPKEY|          | 12499 |   109K|   504K|    70   (5)| 00:00:01 |       |       |
    |   5 |      TABLE ACCESS FULL   | PT_RANGE | 12499 |   109K|       |    18   (0)| 00:00:01 |     3 |     1 |
    ————————————————————————————————————-

    Predicate Information (identified by operation id):
    —————————————————
      1 – filter(ROWNUM< =4000)
      4 – filter(ROWNUM< =4000)

    The pstart/pstop still show reverse order of parition, and all partitions; but it is easy to show that at run-time Oracle stops when it has visited enough partitions.

    In this example I had just over 4,100 rows in each partition – and the query above stopped after scanning the top partition; when I changed the predicate to "rownum < = 4200" Oracle also scanned the second partition.

    Regards
    Jonathan Lewis
    jonathanlewis.wordpress.com

    Reply
  4. Jeff

    Jonathan

    I understand your point that it has the "brains" to be able to go downwards through the partitions, either via an index scan (your first example) or a table scan (your second example) and I think we're in agreement that it would be simple enough for Oracle to code this functionality into the MAX/MIN operators – but, for whatever reason, they haven't done so yet.

    Your second example gives me an idea, again I can't test at the moment, but what about this kind of (untested!) SQL to get the MAX value of the partition key column:

    SELECT partition_key
    FROM (SELECT partition_key
    FROM partition_range_table
    WHERE partition_key IS NOT NULL
    ORDER BY partition_key DESC
    )
    WHERE ROWNUM < 2

    Will this effectively use the "brains" from your second example to short circuit the partitions and come back with the quick answer to the MAX, which is also, in this instance, the same as the first, non null, value from the highest partition with rows in it?

    When I get back to a database I'll test it out.

    Cheers
    Jeff

    Reply
  5. Jeff Moss

    Jonathan

    Thanks for letting me know you've tested it…it's certainly interesting, if a little annoying, or perhaps we should be happy that we've learned something here, yes?

    What's the protocol for suggestion of an enhancement?

    Cheers
    Jeff

    Reply
  6. jonathanlewis

    Jeff,

    Sorry for the late addition – but there's an easy way to get the walk through the partitions in descending order:

    select max(date_col)
    from partitioned_table
    where date_col le {maximum value}
    ;

    I've used "le" for "less than or equal to" to avoid problems with html; and for dates the maximum value would be '31-12-9999'.

    For numbers you could use the (10g) constant binary_float_integer probably.

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com

    Reply
  7. Jeff

    …Not sure I agree…if I use the script I put in the original post, but add your last line (where date_col le 31-dec-9999) then it makes no difference…or in other words, if I take the TSP1 offline, it still thinks it needs to read from it, when it doesn't actually need to, in order to get the MAX date_col value.

    Am I misunderstanding you?

    Reply

Feel free to comment...