Tag: cbo

No pruning for MIN/MAX of partition key column

Recently, I wanted to work out the maximum value of a column on a partitioned table. The column I wanted the maximum value for happened to be the single partition key column. The table in question was range partitioned on that key into monthly partitions for 2009, with data in every partition up to the current date, i.e. January through mid June were populated. There were no indexes on the table.

NOTE – I tried this on 10.2.0.4 (AIX) and 11.1.0 (Fedora 11) – the example below is from 11.1.0.

I’ll recreate the scenario here:

CREATE TABLESPACE tsp1
datafile '/u01/app/oracle/oradata/T111/tsp1.dbf' size 100M 
autoextend off extent management local  uniform size 1m segment space management auto online
/
CREATE TABLESPACE tsp2
datafile '/u01/app/oracle/oradata/T111/tsp2.dbf' size 100M 
autoextend off extent management local  uniform size 1m segment space management auto online
/

DROP TABLE test PURGE
/
CREATE TABLE test(col_date_part_key DATE NOT NULL
,col2 VARCHAR2(2000) NOT NULL
)
PARTITION BY RANGE(col_date_part_key)
(PARTITION month_01 VALUES LESS THAN (TO_DATE('01-FEB-2009','DD-MON-YYYY')) TABLESPACE tsp1
,PARTITION month_02 VALUES LESS THAN (TO_DATE('01-MAR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_03 VALUES LESS THAN (TO_DATE('01-APR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_04 VALUES LESS THAN (TO_DATE('01-MAY-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_05 VALUES LESS THAN (TO_DATE('01-JUN-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_06 VALUES LESS THAN (TO_DATE('01-JUL-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_07 VALUES LESS THAN (TO_DATE('01-AUG-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_08 VALUES LESS THAN (TO_DATE('01-SEP-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_09 VALUES LESS THAN (TO_DATE('01-OCT-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_10 VALUES LESS THAN (TO_DATE('01-NOV-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_11 VALUES LESS THAN (TO_DATE('01-DEC-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_12 VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE tsp2
)
/
REM Insert rows, but only up to 14-JUN-2009
INSERT INTO test(col_date_part_key,col2)
SELECT TO_DATE('31-DEC-2008','DD-MON-YYYY') + l
, LPAD('X',2000,'X')
FROM (SELECT level l FROM dual CONNECT BY level < 166)
/
COMMIT
/
SELECT COUNT(*)
FROM test
/
SELECT MIN(col_date_part_key) min_date
, MAX(col_date_part_key) max_date
FROM test
/

This runs and gives the following output:

DROP TABLE test PURGE                                               
           *                                                        
ERROR at line 1:                                                    
ORA-00942: table or view does not exist

DROP TABLESPACE tsp1 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP1' does not exist

DROP TABLESPACE tsp2 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP2' does not exist

Tablespace created.

Tablespace created.

Table created.

165 rows created.

Commit complete.

COUNT(*)
----------
165

MIN_DATE  MAX_DATE
--------- ---------
01-JAN-09 14-JUN-09

Now, let's see what the plan looks like from AUTOTRACE when we run the following query to get the maximum value of COL_DATE_PART_KEY:

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
  2  FROM   test                           
  3  /

MIN_DATE
---------
14-JUN-09

Execution Plan
----------------------------------------------------------
Plan hash value: 784602781

---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |     9 |    99   (0)| 00:00:02 |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |     9 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      |   132 |  1188 |    99   (0)| 00:00:02 |     1 |    12 |
|   3 |    TABLE ACCESS FULL | TEST |   132 |  1188 |    99   (0)| 00:00:02 |     1 |    12 |
---------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        320  consistent gets
         51  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> SET AUTOTRACE OFF

It shows a full scan of all twelve partitions. I had expected the plan for such a query to show a full table scan of all partitions – because, in theory, if all but the first partition were empty, the whole table would have to be scanned to answer the query, and Oracle can't know at plan creation time whether the data meets that case, so it has to plan for the full scan to ensure a correct result.

What I thought might happen, though, is that in executing the query it would be able to short-circuit things by working through the partitions in order, from latest to earliest, and stopping at the first non-NULL value it found. Once it had that value, it would know not to continue looking in the earlier partitions, since the value of COL_DATE_PART_KEY couldn't possibly be greater than the non-NULL value already identified.

It doesn't appear to have this capability, which we can check by taking the tablespace holding the earliest partition offline and then rerunning the query, whereupon it complains that not all the data is present…

SQL> ALTER TABLESPACE tsp1 OFFLINE;

Tablespace altered.

SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
2 FROM test
3 /
SELECT MAX(col_date_part_key) min_date
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: ‘/u01/app/oracle/oradata/T111/tsp1.dbf’

SQL> SET AUTOTRACE OFF

So, even though we know the question could still be answered accurately, Oracle can't do it because it insists on scanning, unnecessarily, the whole table.

I did find a thread on OTN where somebody had asked about this, but all the responses were about workarounds, rather than explaining why this happens (bug/feature) or how it can be made to work in the way I, or the poster of that thread, think it perhaps should.

Can anyone else shed any light on this? If it's a feature, then it seems like something that could easily be coded more efficiently by Oracle. The same issue affects both MIN and MAX, since both could be approached in the same manner.
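For what it's worth, below is a sketch of the kind of workaround the OTN responses were getting at: doing by hand the short-circuit I'd hoped Oracle would do itself, probing the partitions newest first and stopping at the first one that holds any data. It uses the partition and column names from the test case above, and is an illustration rather than production code.

SET SERVEROUTPUT ON

DECLARE
  l_max DATE;
BEGIN
  FOR p IN (SELECT partition_name
            FROM   user_tab_partitions
            WHERE  table_name = 'TEST'
            ORDER  BY partition_position DESC)
  LOOP
    -- probe just this partition, so only one segment is scanned
    EXECUTE IMMEDIATE 'SELECT MAX(col_date_part_key) FROM test PARTITION ("'
                      || p.partition_name || '")'
      INTO l_max;
    -- the first populated partition must contain the overall maximum
    EXIT WHEN l_max IS NOT NULL;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('MAX_DATE: ' || TO_CHAR(l_max,'DD-MON-YYYY'));
END;
/

Ugly, but it only touches the partitions it needs to look at, which is exactly the sort of thing I'd hoped the optimizer would work out for itself.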

TPC-H Query 20 and optimizer_dynamic_sampling

I was working with Jason Garforth today on creating a TPC-H benchmark script which we can run on our warehouse to initially get a baseline of performance, and then, from time to time, rerun it to ensure things are still running with a comparable performance level.

This activity was on our new warehouse platform of an IBM Power 6 p570 with 8 dual core 4.7GHz processors, 128GB RAM and a 1.6GB/Sec SAN.

Jason created a script to run the QGEN utility to generate the twenty two queries that make up the TPC-H benchmark and also a “run script” to then run those queries against the target schema I had created using some load scripts I talked about previously.

The whole process seemed to be running smoothly with queries running through in a matter of seconds, until query twenty went off the scale and started taking ages. Excluding the 20th query, everything else went through in about three to four minutes, but query twenty was going on for hours, with no sign of completing.

We grabbed the actual execution plan and noticed that all the tables involved had no stats gathered. In such circumstances, Oracle (10.2.0.4 in this instance) uses dynamic sampling to take a quick sample of the table in order to come up with an optimal plan for each query executed.

The database was running with the default value of 2 for optimizer_dynamic_sampling.

Reading the TPC-H specification, it doesn't say whether stats should or should not be gathered, but obviously there would be a cost to gathering them and, depending on the method of gathering and the volume of the database, that cost could be considerable. It would be interesting to hear from someone who actually runs audited TPC-H benchmarks as to whether they gather table stats or rely on dynamic sampling…

We decided we would gather the stats, just to see if the plan changed and the query executed any faster…it did, on both counts, with the query finishing very quickly, in line with the other twenty one queries in the suite.

So, our options then appeared to include, amongst other things:


  1. Gather the table stats. We’d proved this worked.
  2. Change the optimizer_dynamic_sampling level to a higher value and see if it made a difference.
  3. Manually work out why the plan for the query was wrong, by analysing the individual plan steps in further detail, and then use hints or profiles to force the optimizer to “do the right thing”.

We decided to read a Full Disclosure Report of a TPC-H benchmark for a similar system to see what they did. The FDR included a full listing of the init.ora of the database in that test. The listing showed that the system in question had set optimizer_dynamic_sampling to 3 instead of the default 2…we decided to try that approach and it worked perfectly.
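For reference, the two approaches look something like the sketch below. This isn't lifted from our actual scripts, and the TPCH schema name is just a placeholder for wherever the benchmark tables live.

REM Option 2 - raise the dynamic sampling level (shown instance-wide here;
REM it can also be set per session, or per statement via a dynamic_sampling hint)
SHOW PARAMETER optimizer_dynamic_sampling

ALTER SYSTEM SET optimizer_dynamic_sampling = 3 SCOPE = BOTH;

REM Option 1 - gather the stats instead ('TPCH' is a placeholder schema name)
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname          => 'TPCH'
                                ,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
                                ,cascade          => TRUE);
END;
/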

In the end, given we're not producing actual audited benchmarks, we're free to wait for the gathering of optimizer stats, so we'll go with that method. It was interesting to see that option 2 above worked as well though, and it illustrates the point that there is a lot of useful information to be gleaned from reading the FDRs of audited benchmarks – whilst, of course, being careful to take them with a pinch of salt, since they are not trying to run your system.

Another thing of interest was that in order to get the DBGEN utility to work on AIX 6.1 using the gcc compiler, we had to set an environment variable as follows, otherwise we got an error when running DBGEN (the same applies to QGEN):

Set this:

export LDR_CNTRL=MAXDATA=0x80000000@LARGE_PAGE_DATA=Y

otherwise you may get this:

exec(): 0509-036 Cannot load program dbgen because of the following errors:
0509-026 System error: There is not enough memory available now.

ORA-07455 and EXPLAIN PLAN…and statements which, perhaps, shouldn’t run

I encountered a scenario today which I thought was strange in a number of ways…hence, irresistible to a quick blog post.

The scenario started with an end user of my warehouse emailing me a query that was producing an error dialog warning them, before the query ran, that they had insufficient resources to run it – ORA-07455 to be precise.

I figured that either the query genuinely required significant resources – more resources than the user has access to – or the query had a suboptimal plan, whereby Oracle thought it would require more resources than they have access to.
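(From memory, ORA-07455 is the Resource Manager refusing to start a query whose estimated execution time exceeds a plan directive. Something along the lines of the sketch below would produce it; the plan and consumer group names here are made up, and this isn't necessarily exactly what our DBAs have configured.)

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
     plan              => 'WAREHOUSE_PLAN'   -- hypothetical resource plan
    ,group_or_subplan  => 'AD_HOC_USERS'     -- hypothetical consumer group
    ,comment           => 'Refuse queries estimated to run for more than 10 minutes'
    ,max_est_exec_time => 600);              -- seconds; exceeding the estimate gives ORA-07455
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/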

To try and determine which, I logged into the same Oracle user as my end user and tried to get an explain plan of the query – so I could perhaps gauge whether there were any problems with the choice of execution path and whether the query was one which would indeed require significant resources.
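For anyone unfamiliar with the mechanics, that amounts to little more than the sketch below, with DUAL standing in for the user's (much bigger) query:

EXPLAIN PLAN FOR
SELECT * FROM dual
/

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
/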

The result was that it came back with the same error – which quite surprised me at first.

In using EXPLAIN PLAN, I wasn't asking the database to actually run the query – merely to tell me the likely execution path – and yet it appears to still do the checks on resource usage. At first that seemed strange, in the sense that I wouldn't be requiring those resources since I'm not actually executing the statement. On reflection it does make sense, or at least is consistent: for example, you don't need access to all the objects in the query if you're not going to actually execute it, yet quite rightly the optimizer still checks whether you have the appropriate access permissions on each object as part of the EXPLAIN PLAN process.

That was educational point number one for me.

After logging in as another user with unlimited resource usage, I then reran the EXPLAIN PLAN and the statement was accepted and the plan returned…indicating an unpleasant rewrite of the query, and a very high anticipated cost – in excess of the limit for that end user.

That explained why the ORA-07455 was appearing for them, but highlighted an altogether different issue which perplexed me further. There follows a simple reconstruction of the query and explain plan results:

First the obligatory test script…

 

SET TIMING OFF
DROP TABLE tab1 PURGE
/
CREATE TABLE tab1
(col1 VARCHAR2(1))
/
DROP TABLE tab2 PURGE
/
CREATE TABLE tab2
(col2 VARCHAR2(1))
/
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER
                            ,tabname => 'TAB1'
                            );
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER
                            ,tabname => 'TAB2'
                            );
END;
/
INSERT INTO tab1 VALUES('A')
/
INSERT INTO tab1 VALUES('B')
/
INSERT INTO tab1 VALUES('A')
/
INSERT INTO tab1 VALUES('B')
/
INSERT INTO tab2 VALUES('C')
/
INSERT INTO tab2 VALUES('D')
/
COMMIT
/
SET AUTOTRACE ON
SELECT *
FROM tab1
WHERE col1 IN (SELECT col1 FROM tab2)
/
SET AUTOTRACE OFF

 

Now the results…

 

Table dropped.

Connected.

Table dropped.


Table created.


Table dropped.


Table created.


PL/SQL procedure successfully completed.


1 row created.

1 row created.


1 row created.


1 row created.


1 row created.


1 row created.


Commit complete.


C
-
A
B
A
B

4 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4220095845

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time
|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |     2 |     4   (0)|00:00:01  |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL | TAB1 |     1 |     2 |     2   (0)|00:00:01  |
|*  3 |   FILTER            |      |       |       |            |          |
|   4 |    TABLE ACCESS FULL| TAB2 |     1 |       |     2   (0)|00:00:01  |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ */ 0 FROM "TAB2" "TAB2" WHERE
           :B1=:B2))
3 - filter(:B1=:B2)


Statistics
----------------------------------------------------------
       1  recursive calls
       0  db block gets
      18  consistent gets
       0  physical reads
       0  redo size
     458  bytes sent via SQL*Net to client
     381  bytes received via SQL*Net from client
       2  SQL*Net roundtrips to/from client
       0  sorts (memory)
       0  sorts (disk)
       4  rows processed

 

Now, when I first saw the query I thought, hang on a minute, COL1 does not exist in table TAB2 so this query should not even execute…but it does! Personally I don't think it should execute, but according to the documentation, “Oracle resolves unqualified columns in the subquery by looking in the tables named in the subquery and then in the tables named in the parent statement.”, so it is operating as described in the manuals – even if, in my view, it's a little odd, since without a rewrite the query is incapable of executing.

The query has been rewritten with an EXISTS approach – note the first FILTER operation in the “Predicate Information” section of the autotrace output. A bit like this:

 

SELECT a.*
FROM   tab1 a
WHERE EXISTS (SELECT 0
           FROM   tab2 b
           WHERE  a.col1 = a.col1
          )
/

 

The subquery is always going to return a row, hence, for any row we select in the containing query, we will always get that row back, because the EXISTS will always find a match – it’s a bit like saying “WHERE TRUE” I guess.

Interestingly, my friend Jon first brought this scenario to my attention last week, during various discussions with him and another of my colleagues, who is far more experienced than I am. To be fair, the experienced colleague is the source of a number of my blogging posts, but he's painfully shy and will therefore remain nameless.

I was educated during that discussion that this functionality is as advertised in the manuals – even if it doesn't sit well with me. My closing line to my fellow debaters at the time was that nobody would ever write SQL like that, and if they did I'd tell them to rewrite it using aliases so that it made sense – as is often the case in life though, the very next week a real-life user came up with exactly that scenario – at least I was prepared!
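Incidentally, the rewrite I'd ask for is trivial: alias everything and qualify every column, and the same mistake then fails at parse time instead of silently running. A quick sketch against the test tables above:

SELECT t1.*
FROM   tab1 t1
WHERE  t1.col1 IN (SELECT t2.col1 FROM tab2 t2)
/

REM ...which now fails, as arguably it should have done all along:
REM ORA-00904: "T2"."COL1": invalid identifier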

Why you should skim read deep stuff first!

Seems that my recent posting about constraint-generated predicates is already thoroughly covered in Jonathan's latest book (Ch 6, pp 145-6)…including the bit about specifying a “NOT NULL” predicate to get around the issue when the column in question is declared as “NULL” rather than “NOT NULL”.

Doug said to me recently that it was one of those books you probably should skim read first in order to get your brain used to what content is in it…so when you need to know about a particular topic you'll (hopefully) remember that it was covered somewhere in such and such book…I think he's probably spot on there.

Help the Cost Based Optimizer – add constraints – but beware the NULLS

If you use a function on a column then the optimizer can't use an index, right?

Not quite.

You can of course use a function-based index…but that's not the subject of this post…so what else can we use in some circumstances?

Well, I attended the Scottish Oracle User Group conference in Glasgow on Monday and enjoyed the Masterclass Jonathan Lewis gave on the CBO. Having recently read his book, I found the slide content had a degree of familiarity, but it was still a very worthwhile experience as Jonathan is a good presenter and I find the material sinks in more easily than it does from just reading the book.

One of the things Jonathan said was that if you had a predicate such as this:

WHERE UPPER(col1) = 'ABC'

…then the CBO can choose to ignore the presence of the UPPER() function if there happens to be a constraint defined on that column that can effectively substitute for that function.

I’d never heard of this so I decided to investigate…

First I created a table:

create table t1(id number
,v1 varchar2(40) null
,v2 varchar2(40) not null
,constraint t1_ck_v1 check(v1=UPPER(v1))
,constraint t1_ck_v2 check(v2=UPPER(v2))
);

Note the presence of two character columns – one NULLable and the other mandatory. I’ve added check constraints enforcing the uppercase content of both these character columns also.

…next I create indexes on these character columns:

create index t1_i1 on t1(v1);
create index t1_i2 on t1(v2);

…insert some data and analyse the table:

insert into t1(id,v1,v2)
select l
,      'THIS IS ROW: '||TO_CHAR(l)
,      'THIS IS ROW: '||TO_CHAR(l)
from   (select level l from dual connect by level<500001);

commit;

exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>USER,tabname=>'T1',estimate_percent=>100,cascade=>TRUE);

 

(NOTE – The data in columns V1 and V2 is an actual value in each row, i.e. there are no NULLs. This will be important later).

…now let's turn autotrace on:

set autotrace on

…and try a query against the table using the optional (NULLable) column, V1:


select * from t1
where upper(v1)='THIS IS ROW: 1';

…which gives us (abridged for clarity/succinctness):

ID V1              V2
-- --------------- ---------------
 1 THIS IS ROW: 1  THIS IS ROW: 1

1 row selected.

Elapsed: 00:00:00.81

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  5000 |   214K|   789   (5)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| T1   |  5000 |   214K|   789   (5)| 00:00:10 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("V1")='THIS IS ROW: 1')

 
As we can see, it decided that with the UPPER() function involved, a plan using the index was not possible and so chose to do a full table scan – which was not what I was expecting.

I must admit I looked at it for some time to try and understand why it wasn't doing what Jonathan had indicated it would. I then called in my colleague Anthony to discuss it and, after much thought, he came up with the answer: it was the definition of the V1 column as NULLable that was stopping the CBO from using the index. NULLs are not stored in (B-tree) indexes, so, given the information at its disposal, the CBO deemed it impossible to answer the query via the index, since it could potentially have missed a NULL value.

Given this information, I then rebuilt my test table to include the V2 column as per the above definition, and then ran the query against the V2 column, which is declared NOT NULL:

select * from t1
where upper(v2)='THIS IS ROW: 1';

gives us:

ID V1              V2
-- --------------- ---------------
 1 THIS IS ROW: 1  THIS IS ROW: 1

1 row selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 965905564

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    44 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |    44 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I2 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("V2"='THIS IS ROW: 1')
       filter(UPPER("V2")='THIS IS ROW: 1')

 

So, for the mandatory column, the CBO determines that the index can be used as an access path to obtain all of the relevant rows and given that it’s more efficient to do so it uses the index T1_I2 accordingly. This is what I was expecting to see in the first place…but obviously the NULLability of the V1 column had led me astray.

So, what happens if we add another predicate to the first query to try and inform the CBO that we are not looking for any NULL values – will it be clever enough to combine this fact with the information from the constraint and come up with an index access path?

select * from t1
where upper(v1)='THIS IS ROW: 1'
and v1 is not null;

which gives us:

 

ID V1              V2
-- --------------- ---------------
 1 THIS IS ROW: 1  THIS IS ROW: 1

1 row selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    44 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |    44 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("V1"='THIS IS ROW: 1')
       filter(UPPER("V1")='THIS IS ROW: 1' AND "V1" IS NOT NULL)

So, yes, it can derive from the additional predicate stating that we are only looking for rows where V1 IS NOT NULL, in conjunction with the check constraint T1_CK_V1, that the UPPER() function can be ignored and that the index access path is now available; given that it's more efficient, it chooses to use it.

Quite clever really but I’m glad Anthony was around to help me see the wood for the trees on this one.

I spoke with Jonathan about this testing and he said he was aware of the need for the NOT NULL constraint in order for this to work and that from memory he thinks it was somewhere in the middle of 9.2 that this requirement came in to address a bug in transformation.

SCD2s and their effect on the CBO

We've got lots of SCD2-type tables in our warehouse and I've been wondering how much effect they have on the CBO. Essentially, my concern is that when you query an SCD2 you generally look for records as they were on a specific date – the analysis date, as we call it. You end up writing a predicate such as:

and [analysis_date] BETWEEN from_date and to_date

Now, how many rows will the optimiser think are going to be returned from the table?
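To make that concrete, the sort of query I mean looks something like the sketch below; the table and column names are illustrative rather than our real schema. The BETWEEN expands into two open-ended range predicates against different columns, and it's the combined selectivity of that pair that the question is really about.

SELECT f.sale_amount
,      d.customer_name
FROM   sales_fact   f
,      dim_customer d          -- SCD2 dimension with from_date/to_date validity range
WHERE  f.customer_key = d.customer_key
AND    TO_DATE('14-JUN-2009','DD-MON-YYYY') BETWEEN d.from_date AND d.to_date
/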

I figured that topic might have already been investigated by somebody, so I did a search on Google, which was interesting:


What a pleasant surprise to find a reference from my own blog (via orafaq) showing as the number 1 hit…and Mark Rittman also shortly after! Unfortunately my own post was on a different matter, and Mark's was too, so I was still a bit stumped…

After a bit more research I found something from Wolfgang Breitling on this subject which confirmed my thoughts and discussed it very eloquently along with other fallacies of the CBO.

Now – Wolfgang tells us here what the problem is and that there isn’t really a remedy other than using hints or stored outlines to guide the CBO…and who am I to argue!

I did think that maybe we could create some interface tables to hold all the possible range-to-date permutations; when a user queries for a given analysis date they could use an equality predicate on the interface tables, which would convert to pairs of from/to dates that then get equality matched to the target SCD2. It kind of works, but it means a lot of work crunching through the interface tables just to avoid the problem of the CBO not being able to work out the selectivity/cardinality and potentially making a bad plan choice. The more possible permutations there are, the more work it becomes, and in reality the number of permutations seems to be prohibitive, so I've binned that idea for our environment. I might try to catch up with Wolfgang at the UKOUG to discuss this one further if I can grab his attention.

Addendum – One of Wolfgang's suggestions in his paper is to artificially set the stats on the table to some large number, so that even after the optimizer factors the number down for its probability calculation, the result is still large and it will consequently choose hash/merge joins over a nested-loop index-lookup approach. I tried this by setting the table stats on the tables in my query to large values using:

exec dbms_stats.set_table_stats(ownname => 'THE_SCHEMA',tabname => 'THE_TABLE',numrows => 3000000000,numblks => 24000000);

This seemed to work but I still wasn’t overly keen on it since that means the optimizer is going to be coerced for any access to such tables – even if there isn’t a join required.

My colleague Tank then came up with the idea that, given most of our processes run off an “analysis date” which we store in a table, we could just create a materialized view of the contents of this table and set its stats to artificially high values for numrows/numblks. Given that this table is used as the driver of most queries, the inflated figures would propagate through the plan, and the optimiser, even after applying heavy reductions for the probability, would still realise that there were a lot of rows to process and choose plans accordingly…it worked a treat.
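For the record, the mechanics are roughly as sketched below. The parameter table, materialized view name and stats values here are made up for illustration; our real objects differ.

CREATE MATERIALIZED VIEW mv_analysis_date
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT analysis_date
FROM   process_parameters            -- hypothetical table holding the analysis date
/

BEGIN
  -- lie to the optimizer about the size of the driving "table"
  DBMS_STATS.SET_TABLE_STATS(ownname => USER
                            ,tabname => 'MV_ANALYSIS_DATE'
                            ,numrows => 3000000000
                            ,numblks => 24000000);
  -- lock the fake stats so a routine gathering job doesn't quietly overwrite them
  DBMS_STATS.LOCK_TABLE_STATS(ownname => USER, tabname => 'MV_ANALYSIS_DATE');
END;
/

The queries then drive off MV_ANALYSIS_DATE instead of the parameter table itself, so the inflated cardinality is what the optimizer sees at the start of every plan.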