Main Page
Thursday, June 25, 2009
No pruning for MIN/MAX of partition key column
NOTE - I tried this on 10.2.04 (AIX) and 11.1.0 (Fedora 11) - the example below is from 11.1.0.
I'll recreate the scenario here:
CREATE TABLESPACE tsp1
datafile '/u01/app/oracle/oradata/T111/tsp1.dbf' size 100M
autoextend off extent management local uniform size 1m segment space management auto online
/
CREATE TABLESPACE tsp2
datafile '/u01/app/oracle/oradata/T111/tsp2.dbf' size 100M
autoextend off extent management local uniform size 1m segment space management auto online
/
DROP TABLE test PURGE
/
CREATE TABLE test(col_date_part_key DATE NOT NULL
,col2 VARCHAR2(2000) NOT NULL
)
PARTITION BY RANGE(col_date_part_key)
(PARTITION month_01 VALUES LESS THAN (TO_DATE('01-FEB-2009','DD-MON-YYYY')) TABLESPACE tsp1
,PARTITION month_02 VALUES LESS THAN (TO_DATE('01-MAR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_03 VALUES LESS THAN (TO_DATE('01-APR-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_04 VALUES LESS THAN (TO_DATE('01-MAY-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_05 VALUES LESS THAN (TO_DATE('01-JUN-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_06 VALUES LESS THAN (TO_DATE('01-JUL-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_07 VALUES LESS THAN (TO_DATE('01-AUG-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_08 VALUES LESS THAN (TO_DATE('01-SEP-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_09 VALUES LESS THAN (TO_DATE('01-OCT-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_10 VALUES LESS THAN (TO_DATE('01-NOV-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_11 VALUES LESS THAN (TO_DATE('01-DEC-2009','DD-MON-YYYY')) TABLESPACE tsp2
,PARTITION month_12 VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')) TABLESPACE tsp2
)
/
REM Insert rows, but only up to 14-JUN-2009
INSERT INTO test(col_date_part_key,col2)
SELECT TO_DATE('31-DEC-2008','DD-MON-YYYY') + l
, LPAD('X',2000,'X')
FROM (SELECT level l FROM dual CONNECT BY level < 166)
/
COMMIT
/
SELECT COUNT(*)
FROM test
/
SELECT MIN(col_date_part_key) min_date
, MAX(col_date_part_key) max_date
FROM test
/
This runs and gives the following output:
DROP TABLE test PURGE
*
ERROR at line 1:
ORA-00942: table or view does not exist
DROP TABLESPACE tsp1 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP1' does not exist
DROP TABLESPACE tsp2 INCLUDING CONTENTS
*
ERROR at line 1:
ORA-00959: tablespace 'TSP2' does not exist
Tablespace created.
Tablespace created.
Table created.
165 rows created.
Commit complete.
COUNT(*)
----------
165
MIN_DATE MAX_DATE
--------- ---------
01-JAN-09 14-JUN-09
Now, lets see what the plan looks like from AUTOTRACE when we run the following query to get the maximum value of COL_DATE_PART_KEY:
SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
2 FROM test
3 /
MIN_DATE
---------
14-JUN-09
Execution Plan
----------------------------------------------------------
Plan hash value: 784602781
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 99 (0)| 00:00:02 | | |
| 1 | SORT AGGREGATE | | 1 | 9 | | | | |
| 2 | PARTITION RANGE ALL| | 132 | 1188 | 99 (0)| 00:00:02 | 1 | 12 |
| 3 | TABLE ACCESS FULL | TEST | 132 | 1188 | 99 (0)| 00:00:02 | 1 | 12 |
---------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
320 consistent gets
51 physical reads
0 redo size
527 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> SET AUTOTRACE OFF
It shows a full scan of all twelve partitions. I figured that the the plan for such a query would show a full table scan, of all partitions for that table - because, in theory, if all but the first partition were empty, then the whole table would have to be scanned to answer the query - and Oracle wouldn't know at plan creation time, whether the data met this case, so it would have to do the full table scan to ensure the correct result.
What I thought might happen though, is that in executing the query, it would be able to short circuit things, by working through the partitions in order, from latest to earliest, and finding the first, non null, value. Once it found the first, non null, value, it would know not to continue looking in the earlier partitions, since the value of COL_DATE_PART_KEY couldn't possibly be greater than the non null value already identified.
It doesn't appear to have this capability, which we can check by taking one of the partitions offline and then rerunning the query, whereupon it complains that not all the data is present...
SQL> ALTER TABLESPACE tsp1 OFFLINE;
Tablespace altered.
SQL> SET AUTOTRACE ON
SQL> SELECT MAX(col_date_part_key) min_date
2 FROM test
3 /
SELECT MAX(col_date_part_key) min_date
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/T111/tsp1.dbf'
SQL> SET AUTOTRACE OFF
So, even though we know we could actually answer this question accurately, Oracle can't do it as it wants to scan, unnecessarily, the whole table.
I did find a thread which somebody had asked about this on OTN, but all the responses were about workarounds, rather than explaining why this happens (bug/feature) or how it can be made to work in the way I, or the poster of that thread, think it, perhaps, should.
Can anyone else shed any light on this? If it's a feature, then it seems like something that could be easily coded more efficiently by Oracle. The same issue would affect both MIN and MAX since both could be approached in the same manner.
Labels: bugs, cbo, code, dba, DW, partitioning, tuning
Thursday, October 30, 2008
TPC-H Query 20 and optimizer_dynamic_sampling
This activity was on our new warehouse platform of an IBM Power 6 p570 with 8 dual core 4.7GHz processors, 128GB RAM and a 1.6GB/Sec SAN.
Jason created a script to run the QGEN utility to generate the twenty two queries that make up the TPC-H benchmark and also a "run script" to then run those queries against the target schema I had created using some load scripts I talked about previously.
The whole process seemed to be running smoothly with queries running through in a matter of seconds, until query twenty went off the scale and started taking ages. Excluding the 20th query, everything else went through in about three to four minutes, but query twenty was going on for hours, with no sign of completing.
We grabbed the actual execution plan and noticed that all the tables involved had no stats gathered. In such circumstances, Oracle (10.2.0.4 in this instance) uses dynamic sampling to take a quick sample of the table in order to come up with an optimal plan for each query executed.
The database was running with the default value of 2 for optimizer_dynamic_sampling.
After reading the TPC-H specification, it doesn't say that stats should or should not be gathered, but obviously in gathering them, there would be a cost to doing so and, depending on the method of gathering and the volume of the database, the cost could be considerable. It would be interesting to hear from someone who actually runs audited TPC-H benchmarks to know whether they gather table stats or whether they use dynamic sampling...
We decided we would gather the stats, just to see if the plan changed and the query executed any faster...it did, on both counts, with the query finishing very quickly, inline with the other twenty one queries in the suite.
So, our options then appeared to include, amongst other things:
- Gather the table stats. We'd proved this worked.
- Change the optimizer_dynamic_sampling level to a higher value and see if it made a difference.
- Manually, work out why the plan for the query was wrong, by analysis of the individual plan steps in further detail and then use hints or profiles to force the optimizer to "do the right thing".
In the end, given we're not producing actual audited benchmarks then we're free to wait for the gathering of optimizer stats, so we'll go with that method, but it was interesting to see that option 2 above worked as well and illustrates the point that there is a lot of useful information to be gleaned from reading the FDRs of audited benchmarks - whilst, of course, being careful to read them with a pinch of salt, since they are not trying to run your system.
Another thing of interest was that in order to get the DBGEN utility to work on AIX 6.1 using the gcc compiler, we had to set an environment variable as follows otherwise we got an error when running DBGEN (also applies to QGEN too):
Set this:
export LDR_CNTRL=MAXDATA=0x80000000@LARGE_PAGE_DATA=Y
otherwise you may get this:
exec(): 0509-036 Cannot load program dbgen because of the following errors:
0509-026 System error: There is not enough memory available now.
Labels: benchmarking, cbo, DW, tuning
Wednesday, July 30, 2008
Scripts for loading up DBGEN TPC-H data
Unix Scripts:
Multi Load TPCH
Load TPCH Stream
SQL*Loader Control files:
REGION Table
NATION Table
SUPPLIER Table
CUSTOMER Table
PART Table
PARTSUPP Table
ORDERS Table
LINEITEM Table
You may wish to read and check them before you use them - they're not exactly rocket science but they seem to do the job.
I have all the files in the same directory for simplicity sake.
I then use them to create a scale factor 1 target TPC-H schema using the following calls:
# "l" loads the REGION and NATION tables
./multi_load_tpch.sh 1 l "tpch/tpch@test" 1
# "s" loads the SUPPLIER table
./multi_load_tpch.sh 1 s "tpch/tpch@test" 10
# "c" loads the CUSTOMER table
./multi_load_tpch.sh 1 c "tpch/tpch@test" 10
# "p" loads the PART and PARTSUPP tables
./multi_load_tpch.sh 1 p "tpch/tpch@test" 10
# "o" loads the ORDERS and LINEITEM tables
./multi_load_tpch.sh 1 o "tpch/tpch@test" 10
Obviously, you need to change the connection string to match your environment.
Caveats:
- Obviously, they are supplied as is - use at your own discretion and risk.
- You need to have created the target schema tables and made sure they are empty as the SQL*Loader control files use APPEND.
- Bear in mind that choosing too high a number of parallel streams (the last parameter in the calls) will overload your machine so try and balance it against the available system resources.
Bugs, issues or questions, please get in touch...enjoy.
Labels: benchmarking, code, dba, DW, tuning
Monday, July 21, 2008
Creating a TPC-H schema with DBGEN on HP-UX
Given that I also need to use this tool to help with some benchmarking at work, I decided to try to get DBGEN to run on a HP-UX box today and had one or two problems which I managed to sort out. The machine in question is an RP8420 running HP-UX B.11.11.
DBGEN is a utility that allows you to create a series of flat files which contain the data for a TPC-H schema. You can then use SQL*Loader to load these into appropriately constructed tables in an Oracle database - any database actually...but I only care about Oracle of course ;-)
The utility can be called with various parameters including making the target datasets in smaller "child" files which can be created in a manually parallelised fashion to speed the whole process up. You have to download the DBGEN reference data set from the TPC website (lower right hand side).
This reference data set contains the ANSI C source code which makes the DBGEN executable (and QGEN also...but more on that another day)...unfortunately it's just the source code, so that means you need to compile it yourself...which of course leads to the first problem...that I know diddly squat about C...yeah I know, not very manly! Luckily I can sometimes follow instructions (which come with the reference data set)...
1. Copy makefile.suite to makefile
2. Edit makefile and make the following amendments (in red):
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= ORACLE
MACHINE = HP
WORKLOAD = TPCH
That's it for the makefile.
Now, as I mentioned, the ORACLE database is not a listed database variant in the DBGEN C code - it's got all the other popular RDBMS which I find quite bizarre...I'm sure there's a reason, but I can't think of one. To get around this, as per Chapter 5 in “Pro Oracle Database 10g RAC on Linux” by Steve Shaw and Julian Dyke, I added a section to the tpcd.h for the ORACLE database:
#ifdef ORACLE
#define GEN_QUERY_PLAN ""
#define START_TRAN ""
#define END_TRAN ""
#define SET_OUTPUT ""
#define SET_ROWCOUNT ""
#define SET_DBASE ""
#endif
I thought that was it - but it still would't compile, giving the error:
config.h:213:2: #error Support for a 64-bit datatype is required in this release
Looking at the config.h - and bearing in mind I'm no C programmer - it struck me as odd that all bar the HP machine section, had stuff about DSS_HUGE and 64 bits...so I took a punt and copied some lines (in red) from the IBM section into the HP one to see if it worked...and it did. The HP section now looks like this:
#ifdef HP
#define _INCLUDE_POSIX_SOURCE
#define STDLIB_HAS_GETOPT
#define DSS_HUGE long long
#define HUGE_FORMAT "%lld"
#define HUGE_DATE_FORMAT "%02lld"
#define RNG_A 6364136223846793005ull
#define RNG_C 1ull
#endif /* HP */
My next problem was one of my own making really in that I copied the CREATE TABLE statements for the TPC-H target tables from HammerOra' TCL script for TPC-H creation, but unfortunately, the column ordering is slightly different in those DDL statements as compared to the DBGEN output files...which meant that I created the tables OK, but since I'd copied the column ordering to make the SQL*Loader control files, I got errors when I tried to load some of the files as the column order is different in one or two cases.
I then downloaded the TPC-H specification document which has, amongst other things, the data model, from which I cross checked the column ordering of the data model against the columns in the output files and then managed to rerun the data in without any further issues.
Tomorrow I'll try running HammerOra against the target TPC-H schema and make some shell scripts to try and automate most of the process so we can build different scale factor schemae and do so in a manually parallelised fashion - scripts the amiable Scotsman created for his parallel testing a while back should give me a good start with that.
Labels: benchmarking, bugs, code, dba, tuning
Thursday, April 10, 2008
Problem with _gby_hash_aggregation_enabled parameter
For three days, my colleagues in the support team on one of the warehouses I'm involved in, were struggling with a piece of code which was exhausting the available temp space and after trying everything they could think of, they asked me to take a look. I must admit I was a little baffled at first because the piece of code in question had been happily running for some time now and every time I'd run it, I'd never noticed that TEMP was anywhere near being exhausted so, whilst I could tell the process had some kind of problem, I was in the dark as to exactly what that problem was.
After trying to break down the large query into several smaller steps, I realised that it was an early step in the query that was exhibiting the problem of temp exhaustion - the step being a pivot of around 300 million measure rows into a pivot target table.
This process runs monthly and it had run without issue on all of the previous 20 occurrences and for roughly the same, if not more rows to pivot...so it didn't appear to be the volume that was causing the problem.
There had been no changes in the code itself for many months and the current version of the code had been run successfully in previous months, so it didn't appear to be a code fault.
I obtained the actual execution path for the statement whilst it was running and it looked reasonable, although looking at it, triggered a thought in my mind...what if something had changed in the database configuration?
Why would I get that thought from looking at the execution path?
Well, a while back, we had received some advice that a line in our init.ora as follows, should be changed to set the feature to TRUE instead of FALSE, so that the feature became active:
_gby_hash_aggregation_enabled = FALSE
This results in a line in the plan that reads:
HASH GROUP BY
instead of
SORT GROUP BY
The parameter was set to FALSE due to a known bug and the issues we'd seen with it, however the recent advice we'd received, indicated that the bug had been resolved at the version level we were on and that by enabling the feature - which enables GROUP BY and Aggregation using a hash scheme - we'd gain a performance boost for certain queries.
So, the DBA team researched the advice and it appeared to be the case, that the bug (4604970) which led to the disabling of the feature was fixed at our version level (10.2.0.3 on HP-UX). We duly turned on the feature in a pre production environment and ran it for a while without noticing any issues. We then enabled it in production and again, for a while, we've not noticed any issues...until now.
After a check back through the logs, it appeared that since the parameter was first enabled, the queries which were now failing, had not been run at all...they had only run prior to the parameter change...so with my suspicions aroused further, I disabled the feature at the session level and reran the process. It completed in a normal time frame and used a small amount of TEMP - hooray!
So, now we have to go back to support to try and understand if the original bug is not quite fixed or whether this is a different scenario...in any event, we're going to disable the feature for now, even though we're only getting problems with the feature on 2 processes out of perhaps thousands.
So, what's the lesson to learn?
Well, quite simply, that you need to have a thorough audit of what configuration changes you've made together with a good audit of the processes you've run so that you can work out what has changed since the last time you successfully ran a process. This gives you a fighting chance of spotting things like the above.
Tuesday, September 25, 2007
CALIBRATE_IO Poll
If you'd like to contribute please visit using the following link:
Oracle 11g CALIBRATE_IO Results
No prizes for the "biggest"...Kevin is bound to have something in his lab that beats us all! ;-)
Thanks
11g IO Calibration tool
Obviously, any experiment that goes flawlessly according to a preconceived plan is:
1. Boring
2. Less educational
3. Not normally one I've done - mine always have problems it seems!
I ran the calibration on my VMWare based OpenSuse 10 linux with Oracle 11g and it immediately came up with a problem:
SQL> @io
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 lat INTEGER;
3 iops INTEGER;
4 mbps INTEGER;
5 BEGIN
6 -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (, , iops, mbps, lat);
7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat);
8
9 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
10 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
11 dbms_output.put_line('max_mbps = ' || mbps);
12 end;
13 /
DECLARE
*
ERROR at line 1:
ORA-56708: Could not find any datafiles with asynchronous i/o capability
ORA-06512: at "SYS.DBMS_RMIN", line 453
ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1153
ORA-06512: at line 7
Of course, consulting the manual led me to run this query:
SELECT name, asynch_io
FROM v$datafile f,v$iostat_file i
WHERE f.file# = i.file_no
AND filetype_name = 'Data File'
/
which gave:
SQL> /
NAME ASYNCH_IO
-------------------------------------------------- ---------
/home/oracle/oradata/su11/system01.dbf ASYNC_OFF
/home/oracle/oradata/su11/sysaux01.dbf ASYNC_OFF
/home/oracle/oradata/su11/undotbs01.dbf ASYNC_OFF
/home/oracle/oradata/su11/users01.dbf ASYNC_OFF
/home/oracle/oradata/su11/example01.dbf ASYNC_OFF
...or in other words no asynchronous IO available - as the error message had said.
After altering the filesystemio_options parameter to "set_all" and bouncing the instance, a second run of the calibration process seemed to work fine...
SQL> @io
SQL> SET ECHO ON
SQL> SET SERVEROUTPUT ON
SQL> DECLARE
2 lat INTEGER;
3 iops INTEGER;
4 mbps INTEGER;
5 BEGIN
6 -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (, , iops, mbps, lat);
7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat);
8
9 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
10 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat);
11 dbms_output.put_line('max_mbps = ' || mbps);
12 end;
13 /
max_iops = 72
latency = 13
max_mbps = 26
PL/SQL procedure successfully completed.
So,my figures are considerably lower than those Doug achieved:
max_iops = 112
latency = 8
max_mbps = 32
...but not too bad I guess considering the fact that mine is a VM and the hardware I'm running is more humble...no seriously, size does not matter!
Tuesday, September 04, 2007
11g PX tracefiles now have the tracefile identifier on them
In 10gR2 (10.2.0.2.0 specifically) the trace files would come out with names in this format:
<instance name>_<PX Slave identifier>_<background process ID>.trc
e.g. fred_p001_6789.trc
In 11gR1 (11.1.0.6.0 specifically) the trace files come out with names in this format:
<instance name>_<PX Slave identifier>_<background process ID>_<Trace file identifier>.trc
e.g. fred_p001_5678_jeff.trc
This assumes you've set the tracefile identifier in the first place, otherwise that bit won't be present. Use the following to set it, choosing whatever identifier you require of course:
alter session set tracefile_identifier='jeff';
It was interesting that the location of such files has also changed due to the implementation of Automatic Diagnostic Repository (ADR). More information on that here.
Monday, August 13, 2007
ORA-07455 and EXPLAIN PLAN...and statements which, perhaps, shouldn't run
I encountered a scenario today which I thought was strange in a number of ways...hence, irresistible to a quick blog post.
The scenario started with an end user of my warehouse emailing me a query that was returning an error message dialog box, warning the user before they ran the query, that they had insufficient resources to run said query - ORA-07455 to be precise.
I figured, either the query is one requiring significant resources - more resources than the user has access to, or the query has a suboptimal plan, whereby it thinks it will require more resources than they have access to.
To try and determine which, I logged into the same Oracle user as my end user and tried to get an explain plan of the query - so I could perhaps gauge whether there were any problems with the choice of execution path and whether the query was one which would indeed require significant resources.
The result was that it came back with the same error - which quite surprised me at first.
In using EXPLAIN PLAN, I wasn't asking the database to actually run the query - merely to tell me what the likely execution path was for the query and yet, it appears to still do the checks on resource usage. At first, that seemed strange to me, in the sense that I wouldn't be requiring those resources since, I'm not actually executing the statement, yet perhaps it does makes sense - or at least is consistent, because, for example, you don't need access to all the objects in the query if you're not going to actually execute it, yet quite rightly, the optimizer does checks as to whether you have the appropriate access permissions to each object as part of the EXPLAIN PLAN process.
That was educational point number one for me.
After logging in as another user with unlimited resource usage, I then reran the EXPLAIN PLAN and the statement was accepted and the plan returned...indicating an unpleasant rewrite of the query, and a very high anticipated cost - in excess of the limit for that end user.
That explained why the ORA-07455 was appearing for them, but highlighted an altogether different issue which perplexed me further. There follows a simple reconstruction of the query and explain plan results:
First the obligatory test script...
SET TIMING OFF
DROP TABLE tab1 PURGE
/
CREATE TABLE tab1
(col1 VARCHAR2(1))
/
DROP TABLE tab2 PURGE
/
CREATE TABLE tab2
(col2 VARCHAR2(1))
/
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER
,tabname => 'TAB1'
);
DBMS_STATS.GATHER_TABLE_STATS(ownname => USER
,tabname => 'TAB2'
);
END;
/
INSERT INTO tab1 VALUES('A')
/
INSERT INTO tab1 VALUES('B')
/
INSERT INTO tab1 VALUES('A')
/
INSERT INTO tab1 VALUES('B')
/
INSERT INTO tab2 VALUES('C')
/
INSERT INTO tab2 VALUES('D')
/
COMMIT
/
SET AUTOTRACE ON
SELECT *
FROM tab1
WHERE col1 IN (SELECT col1 FROM tab2)
/
SET AUTOTRACE OFF
Now the results...
Table dropped.
Connected.
Table dropped.
Table created.
Table dropped.
Table created.
PL/SQL procedure successfully completed.
1 row created.
1 row created.
1 row created.
1 row created.
1 row created.
1 row created.
Commit complete.
C
-
A
B
A
B
4 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4220095845
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 4 (0)|00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | TAB1 | 1 | 2 | 2 (0)|00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | TABLE ACCESS FULL| TAB2 | 1 | | 2 (0)|00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM "TAB2" "TAB2" WHERE
:B1=:B2))
3 - filter(:B1=:B2)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
18 consistent gets
0 physical reads
0 redo size
458 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
Now, when I first saw the query I thought, hang on a minute, COL1 does not exist in table TAB2 so this query should not even execute...but it does! I don't think it should execute personally but according to the documentation, "Oracle resolves unqualified columns in the subquery by looking in the tables named in the subquery and then in the tables named in the parent statement.", so it is operating as described in the manuals- even if, in my view, it's a little odd since without a rewrite, the query is incapable of executing.
The query has been rewritten with an EXISTS approach - note the first FILTER statement in the "Predicate Information" section of the autotrace output. A bit like this:
SELECT a.*
FROM tab1 a
WHERE EXISTS (SELECT 0
FROM tab2 b
WHERE a.col1 = a.col1
)
/
The subquery is always going to return a row, hence, for any row we select in the containing query, we will always get that row back, because the EXISTS will always find a match - it's a bit like saying "WHERE TRUE" I guess.
Interestingly, my friend Jon first brought this scenario to my attention last week in various discussions with him and another of my colleagues, who is far more experienced than myself. To be fair, the experienced colleague is the source of a number of my blogging posts, but he's painfully shy and will therefore remain nameless.
I was educated during that discussion, that this functionality is as advertised in the manuals - even if it doesn't sit well with me. My closing line to my fellow debaters at the time, was that nobody would ever write SQL like that and if they did I'd tell them to rewrite it using aliases and so that it made sense - as is often the case in life though, the very next week, a real life user comes up with exactly that scenario - at least I was prepared!
Labels: cbo, code, errors, tuning
Monday, July 16, 2007
DBA_SEGMENTS misleading PARTITION_NAME column
Whilst writing some code that referenced the DBA_SEGMENTS dictionary view today, I realised that the contents of the PARTITION_NAME column actually contains the name of the subpartition when the table is a subpartitioned table...a script and results to illustrate:
drop table jeff_unpartitioned purge
/
drop table jeff_partitioned purge
/
drop table jeff_subpartitioned purge
/
create table jeff_unpartitioned(col1 number,col2 number)
/
create table jeff_partitioned(col1 number,col2 number)
partition by range(col1)
(partition p1 values less than(100)
,partition p2 values less than(maxvalue)
)
/
create table jeff_subpartitioned(col1 number,col2 number)
partition by range(col1)
subpartition by list(col2)
subpartition template
(subpartition sub1 values(1)
,subpartition sub2 values(2)
)
(partition p1 values less than(100)
,partition p2 values less than(maxvalue)
)
/
select segment_name,partition_name,segment_type
from user_segments
where segment_name like 'JEFF%'
order by segment_name
/
...gives the following results...
Table dropped.
Table dropped.
Table dropped.
Table created.
Table created.
Table created.
Partition
SEGMENT_NAME Name SEGMENT_TYPE
------------------------ ------------------
JEFF_PARTITIONED P2 TABLE PARTITION
JEFF_PARTITIONED P1 TABLE PARTITION
JEFF_SUBPARTITIONED P1_SUB1 TABLE SUBPARTITION
JEFF_SUBPARTITIONED P2_SUB2 TABLE SUBPARTITION
JEFF_SUBPARTITIONED P1_SUB2 TABLE SUBPARTITION
JEFF_SUBPARTITIONED P2_SUB1 TABLE SUBPARTITION
JEFF_UNPARTITIONED TABLE
7 rows selected.
As you can see, the subpartitioned table shows the subpartition name in the PARTITION_NAME column. It's not a big deal and I only noticed because I assumed there would be a SUBPARTITION_NAME column whilst I was writing my code...the failure in compilation led me to track this slightly erroneous situation down.
Why does this occur?
Well, if you delve into the code behind DBA_SEGMENTS you'll see it refers to another view in the SYS schema called SYS_DBA_SEGS. The SQL behind SYS_DBA_SEGS selects all levels (Table, Partition and subpartitions) from the SYS.OBJ$ view, but then "loses" the partitions when joining to SYS.SYS_OBJECTS (the OBJ# from SYS.OBJ$ does not map to any row in SYS.SYS_OBJECTS via the OBJECT_ID column). The SQL then "loses" the table when joining to SYS.SEG$ - exactly why it does this I don't fully understand, but I'm guessing it's because those components of the composite object don't actually have their own segment to physically store anything in since there are lower levels - in this case the subpartitions.
In any event, it's a little bit of a gotcha and the column could probably do with being renamed to SUB_SEGMENT_NAME perhaps.
Friday, June 29, 2007
Using partitioning to colocate data for optimal multiple index access paths
The table in question had about five million rows and about four indexes. For the process in question, the table was hitting each of the indexes on this single table at one point or another.
I noticed that the clustering factor for each of the indexes was quite poor - a value closer to the number of rows in the table rather than the number of blocks in the table. This was leading to a high number of logical IOs to get the rows needed on each query in the process, because the rows pointed to by the index were all in different data blocks, scattered far and wide. As I recalle, Jonathan gives a great explanation of how the clustering factor can affect the performance of a query in Chapter five of his CBO Fundamentals book and basically this was relevant to this scenario.
A fix for a poor clustering factor, is to sort the data in the table, in the order which matches that of the index in question. So, if INDEX_A is on columns A then B, you should order the data in the table by column A, then B...perhaps by removing it and then reinserting it all using an ORDER BY on the insert.
This of course, only works for one order - you can't order data in multiple different ways within the same table, to suit all the indexes, so for one index you might be able to order the data perfectly but that might be terrible for another index which wants to order by say, column C, then D....or does it?
Well, in this particular case, the data required by each query hitting different indexes on this single table was all of a different type - by this I mean, the table itself was modelled from a Supertype Entity, implemented physically as a single table with a Supertype differentiator column (PRODUCT_TYPE in this case). Query one in the process wanted data for PRODUCT_TYPE "X", and wanted to get this via Index IDX1 on COL1. Query two would then want data of PRODUCT_TYPE "Y" and get it via index IDX2 on COL2 etc...
I figured that if we split the table into partitions based on the PRODUCT_TYPE then we'd be able to order the data within each partition by the appropriate order for the index which would commonly be used to access data of that PRODUCT_TYPE.
It will take them a while to test and implement the solution and gain some empirical results, however a simple test script I knocked up seemed to indicate that the solution could work:
The script...
CLEAR BREAKS
CLEAR COLUMNS
COLUMN index_name HEADING "IndexName" FORMAT A10
COLUMN partition_name HEADING "PartitionName" FORMAT A10
COLUMN leaf_blocks HEADING "LeafBlocks" FORMAT 999
COLUMN distinct_keys HEADING "DistinctKeys" FORMAT 999,999
COLUMN num_rows HEADING "NumRows" FORMAT 999,999
COLUMN clustering_factor HEADING "ClusteringFactor" FORMAT 999,999
COLUMN clfpct HEADING "Clu FactPercent" FORMAT 999
DROP TABLE jeff_unpartitioned PURGE
/
CREATE TABLE jeff_unpartitioned
AS
SELECT l pk_col
, MOD(l,4) partitioning_key_col
, (dbms_random.value * 1000) index_col1
, (dbms_random.value * 1000) index_col2
, (dbms_random.value * 1000) index_col3
, (dbms_random.value * 1000) index_col4
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 100001)
ORDER BY l
/
CREATE UNIQUE INDEX ju_pk ON jeff_unpartitioned(pk_col)
/
CREATE INDEX ju_idx1 ON jeff_unpartitioned(index_col1)
/
CREATE INDEX ju_idx2 ON jeff_unpartitioned(index_col2)
/
CREATE INDEX ju_idx3 ON jeff_unpartitioned(index_col3)
/
CREATE INDEX ju_idx4 ON jeff_unpartitioned(index_col4)
/
exec DBMS_STATS.GATHER_TABLE_STATS (ownname=>USER,tabname=>'JEFF_UNPARTITIONED');
SELECT i.index_name
, i.leaf_blocks
, i.distinct_keys
, i.clustering_factor
, (DECODE((t.num_rows - t.blocks),0,0,(i.clustering_factor) / (t.num_rows - t.blocks))) * 100 clfpct
FROM all_indexes i
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND t.table_name = 'JEFF_UNPARTITIONED'
AND t.owner = USER
/
DROP TABLE jeff_partitioned PURGE
/
CREATE TABLE jeff_partitioned
(pk_col number
,partitioning_key_col number
,index_col1 number
,index_col2 number
,index_col3 number
,index_col4 number
)
PARTITION BY LIST(partitioning_key_col)
(PARTITION p1 VALUES (1)
,PARTITION p2 VALUES (2)
,PARTITION p3 VALUES (3)
,PARTITION p4 VALUES (4)
)
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 1
, ROUND(l,100)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 3
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 2
, (dbms_random.value * 1000)
, ROUND(l,100)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 4
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 3
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, ROUND(l,100)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 5
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 4
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, ROUND(l,100)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 6
/
COMMIT
/
CREATE UNIQUE INDEX jp_pk ON jeff_partitioned(pk_col,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx1 ON jeff_partitioned(index_col1,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx2 ON jeff_partitioned(index_col2,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx3 ON jeff_partitioned(index_col3,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx4 ON jeff_partitioned(index_col4,partitioning_key_col) LOCAL
/
exec DBMS_STATS.GATHER_TABLE_STATS (ownname=>USER,tabname=>'JEFF_PARTITIONED');
SELECT i.index_name
, i.leaf_blocks
, i.distinct_keys
, i.clustering_factor
, (DECODE((t.num_rows - t.blocks),0,0,(i.clustering_factor) / (t.num_rows - t.blocks))) * 100 clfpct
FROM all_indexes i
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND t.table_name = 'JEFF_PARTITIONED'
AND t.owner = USER
ORDER BY i.index_name
/
SELECT i.index_name
, ip.partition_name
, ip.leaf_blocks
, ip.distinct_keys
, ip.clustering_factor
, (DECODE((t.num_rows - t.blocks),0,0,(ip.clustering_factor) / (t.num_rows - t.blocks))) * 100 clfpct
FROM all_indexes i
, all_ind_partitions ip
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND i.index_name = ip.index_name
AND i.owner = ip.index_owner
AND t.table_name = 'JEFF_PARTITIONED'
AND t.owner = USER
ORDER BY i.index_name
, ip.partition_name
/
And now the results...
Table dropped.
Table created.
Index created.
Index created.
Index created.
Index created.
Index created.
PL/SQL procedure successfully completed.
Index Leaf Distinct Clustering Clu Fact
Name Blocks Keys Factor Percent
---------- ------ -------- ---------- --------
JU_IDX4 226 100,000 99,830 101
JU_IDX3 226 100,000 99,880 101
JU_IDX2 226 100,000 99,872 101
JU_IDX1 226 100,000 99,833 101
JU_PK 103 100,000 685 1
5 rows selected.
Table dropped.
Table created.
25000 rows created.
Commit complete.
25000 rows created.
Commit complete.
25000 rows created.
Commit complete.
25000 rows created.
Commit complete.
Index created.
Index created.
Index created.
Index created.
Index created.
PL/SQL procedure successfully completed.
Index Leaf Distinct Clustering Clu Fact
Name Blocks Keys Factor Percent
---------- ------ -------- ---------- --------
JP_IDX1 218 100,000 74,593 75
JP_IDX2 218 100,000 74,624 75
JP_IDX3 218 100,000 74,609 75
JP_IDX4 218 100,000 74,627 75
JP_PK 124 100,000 568 1
5 rows selected.
Index Partition Leaf Distinct Clustering Clu Fact
Name Name Blocks Keys Factor Percent
---------- ---------- ------ -------- ---------- --------
JP_IDX1 P1 32 25,000 142 0
JP_IDX1 P2 62 25,000 24,826 25
JP_IDX1 P3 62 25,000 24,811 25
JP_IDX1 P4 62 25,000 24,814 25
JP_IDX2 P1 62 25,000 24,812 25
JP_IDX2 P2 32 25,000 142 0
JP_IDX2 P3 62 25,000 24,835 25
JP_IDX2 P4 62 25,000 24,835 25
JP_IDX3 P1 62 25,000 24,802 25
JP_IDX3 P2 62 25,000 24,826 25
JP_IDX3 P3 32 25,000 142 0
JP_IDX3 P4 62 25,000 24,839 25
JP_IDX4 P1 62 25,000 24,830 25
JP_IDX4 P2 62 25,000 24,832 25
JP_IDX4 P3 62 25,000 24,823 25
JP_IDX4 P4 32 25,000 142 0
JP_PK P1 31 25,000 142 0
JP_PK P2 31 25,000 142 0
JP_PK P3 31 25,000 142 0
JP_PK P4 31 25,000 142 0
20 rows selected.
From the results we can see that when the table is unpartitioned and, in this instance, ordered by the primary key, all four indexes have high clustering factors.
We could order the data by one of the columns in one of the indexes and make that index get a good (low) clustering factor, but the other indexes would still remain with high values - you can't make them all have low clustering factors whilst the table is one big amorphous mass of data.
In the partitioned table we are able to order the data in the most efficient way for each partition, insofar as each partition maps to an access path via a specific index.
So, partition P1, where users would commonly access the data via index JP_IDX1, has a low clustering factor because we ordered the data by INDEX_COL1 on insert.
Partition P2 has the lowest clustering factor for index JP_IDX2 and P3 for JP_IDX3 etc...
Some caveats to bear in mind for this approach:
1. You should ensure that your queries do use the partitioning key column when accessing the table, otherwise you'll end up scanning extra index partitions unnecessarily. You will also make life extremely difficult, if not impossible, for the optimizer because if we don't identify that the query will only use one partition then the CBO will be forced to use the global stats on the index/table rather than the specific ones for a single partition...the global ones as you can see from the results still show a high clustering factor because it is an aggregate which is hiding the detail that one specific partition - the one we really want - has a low clustering factor.
2. If you want LOCAL indexes, you will need to add the partitioning key column to that index - if it is not already present of course. As Jonathan points out in the first comment on this post...this only applies if the index in question is unique and supports a PK/unique constraint. If the index does not support a PK/unique constraint then it can be LOCAL without the need to have the partitioning key column(s) in it.
3. If you are using GLOBAL indexes on the partitioned table then they will still have reasonably high clustering factors due to the partitions where the data is not ordered by the columns of that index, but the data itself will be ordered within the partition that we access via the partioning key column so we should get good performance.
4. If you want to use ordered data in a table then you need to factor that into the way you populate the table - if it's a batch process then you can this as part of that process but if it's built up over time, in an OLTP fashion then you'll need periodic "rebuilds" of the table to take the data out and put it back in, reordered. This is something you would need to do whether it was partitioned of course.
I'm sure there are more caveats.
Your mileage may, of course, vary.
Labels: tuning
Tuesday, February 13, 2007
Changing the subpartition template and pre-existing partitions
Lets try and work through the problem...first lets create some test tables...
DROP TABLE j4134_test1 PURGE
/
CREATE TABLE j4134_test1
(
col1 NUMBER NOT NULL
,col2 NUMBER NOT NULL
,col3 VARCHAR2(254) NOT NULL
)
PARTITION BY RANGE (col1)
SUBPARTITION BY LIST(col2)
SUBPARTITION TEMPLATE
(SUBPARTITION s1 VALUES (1),
SUBPARTITION s2 VALUES (2),
SUBPARTITION s3 VALUES (3)
)
(
PARTITION p1 VALUES LESS THAN (1000)
,PARTITION p2 VALUES LESS THAN (2000)
,PARTITION p3 VALUES LESS THAN (3000)
,PARTITION p4 VALUES LESS THAN (4000)
)
/
ALTER TABLE j4134_test1 SET SUBPARTITION TEMPLATE
(SUBPARTITION s1 VALUES (1),
SUBPARTITION s2 VALUES (2)
)
/
ALTER TABLE j4134_test1 ADD PARTITION p5 VALUES LESS THAN(5000)
/
DROP TABLE j4134_test2 PURGE
/
CREATE TABLE j4134_test2 (
col1 NUMBER NOT NULL
,col2 NUMBER NOT NULL
,col3 VARCHAR2(254) NOT NULL
)
PARTITION BY LIST (col2)
(
PARTITION s1 VALUES (1)
,PARTITION s2 VALUES (2)
,PARTITION s3 VALUES (3)
)
/
OK, that's everything set up...now lets just have a look at what partitions and subpartitions we have:
ae_aml[522/313]@AED52> SELECT partition_name
2 , subpartition_name
3 FROM user_tab_subpartitions
4 WHERE table_name='J4134_TEST1'
5 /
Partition Sub Part
Name Name
-------------------- ------------
P1 P1_S1
P1 P1_S2
P1 P1_S3
P2 P2_S1
P2 P2_S2
P2 P2_S3
P3 P3_S1
P3 P3_S2
P3 P3_S3
P4 P4_S1
P4 P4_S2
P4 P4_S3
P5 P5_S1
P5 P5_S2
14 rows selected.
Notice that Partition P5 has only two subpartitions whilst the other partitions all have three subpartitions.
Now lets do a couple of tests...
First lets try and replicate the original problem...
ae_aml[522/313]@AED52> SET ECHO ON
ae_aml[522/313]@AED52> INSERT INTO j4134_test1(col1,col2,col3)
2 VALUES(3500,3,'TEST')
3 /
1 row created.
This worked - because the subpartition template used for the Partition P4 where the COL1 value 3500 would be inserted, included a subpartition for COL2 with values of 3 - no problem.
Lets try again...
ae_aml[522/313]@AED52> INSERT INTO j4134_test1(col1,col2,col3)
2 VALUES(4500,3,'TEST')
3 /
INSERT INTO j4134_test1(col1,col2,col3)
*
ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition
Aha - now it fails...because the COL1 value of 4500 would result in Partition P5 being used - but this was created after the subpartition template was changed to only have subpartitions for values 1 and 2 - the value of 3 for COL2 does not have a "home" to go to, so the statement fails.
This is interesting because it means that Oracle allows us to have Partitions setup with different subpartition layouts under them depending on what the template was at the time of partition creation.
When I first thought about this I figured it wasn't possible/correct to do this and that the subpartitions had to be uniform across the partitions, but when you think about that, it would make life difficult to change the template - what should Oracle do with the "missing" or "extra" or "modified"
subpartitions in the pre-existing partitions? Create/drop/modify them? Ignore them? If it tried to create missing ones there would be all sorts of questions like which tablespace to use and what storage characteristics to set.
As the documentation says, it leaves what exists already as is and just uses the new template for partitions created in the future...which brings me to another point...that piece of code I put up on this post needed updating because I didn't know about this subpartition template issue...now the code will check to ensure that if we are exchanging a partitioned table for a partition of a composite partitioned table that we need to ensure the source table is partitioned in the same way
(LIST/HASH) as the subpartitioning on the target partition and that there are the same number of partitions in the source table and the target table partition - if not the process fails...like this:
ae_aml[522/313]@AED52> ALTER TABLE j4134_test1 EXCHANGE PARTITION p1 WITH TABLE j4134_test2
2 /
Table altered.
ae_aml[522/313]@AED52> REM Now put it back...
ae_aml[522/313]@AED52> ALTER TABLE j4134_test1 EXCHANGE PARTITION p1 WITH TABLE j4134_test2
2 /
Table altered.
ae_aml[522/313]@AED52> REM now try with the mismatched one - fails!
ae_aml[522/313]@AED52> ALTER TABLE j4134_test1 EXCHANGE PARTITION p5 WITH TABLE j4134_test2
2 /
ALTER TABLE j4134_test1 EXCHANGE PARTITION p5 WITH TABLE j4134_test2
* ERROR at line 1:
ORA-14294: Number of partitions does not match number of subpartitions
I've updated the code to check for this scenario and it now reports it.
Labels: errors, partitioning, tuning
Thursday, February 08, 2007
Using AWR to summarise SQL operations activity
I didn't have an answer as to where to get this information so I figured it warranted a ramble through the AWR tables to try and come up with something...
This isn't going to be a description of how the AWR works - there are manuals for that. Instead, I'll just give you the salient points from my exploration.
The AWR data, including it's history, is accessible by looking at a collection of DBA Views named DBA_HIST%.
Part of the AWR processing captures Active Session History (ASH) data which is then accessible via the DBA_HIST_ACTIVE_SESS_HISTORY view. From this view we can obtain, for any given snap point in time, the sessions which were active, together with details of what they were doing and the statistics relating to the performance of that activity.
On this view is a column called SQL_OPCODE which tells us the type of SQL operation being performed by the session, with various codes indicating different things. I didn't manage to find a definitive list of these SQL_OPCODES in a table/view anywhere so, by trial and error I worked them out as follows:
SQL_OPCODE SQL Operation
1 DDL
2 INSERT
3 Query
6 UPDATE
7 DELETE
47 PL/SQL package call
50 Explain Plan
189 MERGE
...I'm sure this isn't definitive...someone really smart out there will probably know where this list is actually stored and will, perhaps, put it on a comment to this post ;-)
OK, now to code up something that goes back over this AWR data for the last, say seven days, and group it up by SQL operation so we can see what our activity has been - I used a ROW_NUMBER analytic (cos they rock) to make sure we only get the latest snapshot row from the AWR table - if a SQL operation is running for a long time then it may appear in the AWR view more than once...and we don't want to double count it as far as executions go.
I've grouped up those sessions which don't have an active SQL statement associated with them (i.e. SQL_ID is NULL) into groups based on their Object Type and I've then used ROLLUP to get the results for each SQL operation by day and also the summed up totals by operation across the seven days and a grand total.
I'm not actually sure why some rows in DBA_HIST_ACTIVE_SESS_HISTORY don't have a SQL_ID against them - if they were not actually running a statement in the session then that sounds fair enough...but some of them had values in PLSQL_ENTRY_OBJECT_ID indicating (I think) that they were running a specific package (DBMS_STATS or DBMS_SPACE in many cases) so the fact they didn't have a SQL_ID was confusing to me - perhaps it means they're in that package but not actually running a query at the time of the snap - either way, they're grouped up separately.
Here is a SQL*Plus spool of the code and results from a test system:
x_j4134[543/9757]@AED52> l
1 WITH ash AS
2 (
3 SELECT TRUNC(ash.sample_time) sample_day
4 , (CASE WHEN ash.sql_opcode = 47
5 THEN 'PL/SQL'
6 WHEN ash.sql_opcode IN(1)
7 THEN 'DDL'
8 WHEN ash.sql_opcode IN(2,6,7,189)
9 THEN 'DML'
10 WHEN ash.sql_opcode IN(50)
11 THEN 'Explain Plan'
12 WHEN ash.sql_opcode IN(3)
13 THEN 'Query'
14 ELSE 'No Statement ID; In object type: 'NVL(o.object_type,'Not Specified')
15 END) statement_type
16 , ROW_NUMBER() OVER(PARTITION BY ash.sql_id,ash.sql_child_number ORDER BY ash.sample_time DESC) rn
17 FROM dba_hist_snapshot s
18 , dba_hist_active_sess_history ash
19 , dba_objects o
20 WHERE s.snap_id = ash.snap_id(+)
21 AND s.dbid = ash.dbid(+)
22 AND s.instance_number = ash.instance_number(+)
23 AND ash.plsql_entry_object_id = o.object_id(+)
24 AND TRUNC(ash.sample_time) BETWEEN TRUNC(SYSDATE-6) AND TRUNC(SYSDATE+1) -- all within last 7 days
25 )
26 SELECT sample_day
27 , statement_type
28 , COUNT(1)
29 FROM ash
30 WHERE rn = 1
31 GROUP BY ROLLUP(sample_day)
32 , ROLLUP(statement_type)
33 ORDER BY sample_day
34* , statement_type
x_j4134[543/9757]@AED52> /
SAMPLE_DAY STATEMENT_TYPE COUNT(1)
-------------------- ---------------------------------------------------- ----------
02-FEB-2007 00:00:00 DDL 112
02-FEB-2007 00:00:00 DML 49
02-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 292
02-FEB-2007 00:00:00 No Statement ID; In object type: PACKAGE 7
02-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 131
02-FEB-2007 00:00:00 PL/SQL 301
02-FEB-2007 00:00:00 Query 181
02-FEB-2007 00:00:00 1073
03-FEB-2007 00:00:00 DDL 20
03-FEB-2007 00:00:00 DML 26
03-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 91
03-FEB-2007 00:00:00 No Statement ID; In object type: PACKAGE 2
03-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 84
03-FEB-2007 00:00:00 PL/SQL 166
03-FEB-2007 00:00:00 Query 12
03-FEB-2007 00:00:00 401
04-FEB-2007 00:00:00 DDL 127
04-FEB-2007 00:00:00 DML 14
04-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 410
04-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 305
04-FEB-2007 00:00:00 PL/SQL 306
04-FEB-2007 00:00:00 Query 14
04-FEB-2007 00:00:00 1176
05-FEB-2007 00:00:00 DDL 115
05-FEB-2007 00:00:00 DML 81
05-FEB-2007 00:00:00 Explain Plan 1
05-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 261
05-FEB-2007 00:00:00 No Statement ID; In object type: PACKAGE 16
05-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 161
05-FEB-2007 00:00:00 PL/SQL 315
05-FEB-2007 00:00:00 Query 360
05-FEB-2007 00:00:00 1310
06-FEB-2007 00:00:00 DDL 98
06-FEB-2007 00:00:00 DML 86
06-FEB-2007 00:00:00 Explain Plan 2
06-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 212
06-FEB-2007 00:00:00 No Statement ID; In object type: PACKAGE 16
06-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 108
06-FEB-2007 00:00:00 PL/SQL 299
06-FEB-2007 00:00:00 Query 439
06-FEB-2007 00:00:00 1260
07-FEB-2007 00:00:00 DDL 98
07-FEB-2007 00:00:00 DML 162
07-FEB-2007 00:00:00 Explain Plan 1
07-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 210
07-FEB-2007 00:00:00 No Statement ID; In object type: PACKAGE 24
07-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 96
07-FEB-2007 00:00:00 PL/SQL 337
07-FEB-2007 00:00:00 Query 348
07-FEB-2007 00:00:00 1276
08-FEB-2007 00:00:00 DDL 112
08-FEB-2007 00:00:00 DML 420
08-FEB-2007 00:00:00 Explain Plan 1
08-FEB-2007 00:00:00 No Statement ID; In object type: Not Specified 311
08-FEB-2007 00:00:00 No Statement ID; In object type: PACKAGE 25
08-FEB-2007 00:00:00 No Statement ID; In object type: PROCEDURE 119
08-FEB-2007 00:00:00 PL/SQL 572
08-FEB-2007 00:00:00 Query 493
08-FEB-2007 00:00:00 2053
DDL 682
DML 838
Explain Plan 5
No Statement ID; In object type: Not Specified 1787
No Statement ID; In object type: PACKAGE 90
No Statement ID; In object type: PROCEDURE 1004
PL/SQL 2296
Query 1847
8549
68 rows selected.
Elapsed: 00:09:49.44
It was just an idea...if anyone has anything to add/improve it feel free to comment.
Wednesday, September 20, 2006
Why you should skim read deep stuff first!
Doug said to me recently that it was one of those books you probably should skim read first in order to get your brain used to what content is in it...so when you need to know about a particular topic you'll (hopefully) remember that it was covered somewhere in such and such book...I think he's probably spot on there.
Friday, September 15, 2006
Help the Cost Based Optimizer - add constraints - but beware the NULLS
Not quite.
You can of course use a Function Based index...but that's not the subject of this post...so what else can we use in some circumstances ?
Well, I attended the Scottish Oracle User Group conference in Glasgow on Monday and enjoyed the Masterclass Jonathan Lewis gave on the CBO. After recently reading his book, the course had a degree of familiarity in terms of the slide content, but it was still a very worthwhile experience as Jonathan is a good presenter and I find it sinks in perhaps easier than just reading the book.
One of the things Jonathan said was that if you had a predicate such as this:
WHERE UPPER(col1) = 'ABC'
...then the CBO can choose to ignore the presence of the UPPER() function if there happens to be a constraint defined on that column that can effectively substitute for that function.
I'd never heard of this so I decided to investigate...
First I created a table:
create table t1(id number
,v1 varchar2(40) null
,v2 varchar2(40) not null
,constraint t1_ck_v1 check(v1=UPPER(v1))
,constraint t1_ck_v2 check(v2=UPPER(v2))
);
Note the presence of two character columns - one NULLable and the other mandatory. I've added check constraints enforcing the uppercase content of both these character columns also.
...next I create indexes on these character columns:
create index t1_i1 on t1(v1);
create index t1_i2 on t1(v2);
...insert some data and analyse the table:
insert into t1(id,v1,v2)
select l
, 'THIS IS ROW: 'TO_CHAR(l)
, 'THIS IS ROW: 'TO_CHAR(l)
from (select level l from dual connect by level<500001);
commit;
exec DBMS_STATS.GATHER_TABLE_STATS ownname=>USER,tabname=>'T1',estimate_percent=>100,cascade=>TRUE);
(NOTE - The data in columns V1 and V2 is an actual value in each row, i.e. there are no NULLs. This will be important later).
...now lets turn autotrace on:
set autotrace on
...and try a query against the table using the optional column:
select * from t1
where upper(v1)='THIS IS ROW: 1';
...which gives us (abridged for clarity/succinctness):
ID V1 V2
-- --------------- ---------------
1 THIS IS ROW: 1 THIS IS ROW: 1
1 row selected.
Elapsed: 00:00:00.81
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 5000 214K 789 (5) 00:00:10
* 1 TABLE ACCESS FULL T1 5000 214K 789 (5) 00:00:10
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(UPPER("V1")='THIS IS ROW: 1')
As we can see, it decided that with the UPPER() function involved, a plan using the index was not possible and so chose to do a full table scan - which was not what I was expecting.
I must admit I looked at it for some time to try and understand why it wasn't doing what Jonathan had indicated it would. I then called in my colleague Anthony, to discuss it and, after much thought, he came up with the answer that it was the definition of the V1 column being NULLable that was causing the CBO to not be able to use the index since NULLS are not stored in (B Tree) indexes and therefore, given the information at it's disposal, the CBO deemed it impossible for the query to be answered via the index since it could, potentially, have missed a NULL value.
Given this information, I then rebuilt my test table to include the V2 column as per the above definition and then ran the query against the V2 column which was declared as NOT NULL:
select * from t1
where upper(v2)='THIS IS ROW: 1';
gives us:
ID V1 V2
-- --------------- ---------------
1 THIS IS ROW: 1 THIS IS ROW: 1
1 row selected.
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 965905564
-------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-------------------------------------------------------------------------------------
0 SELECT STATEMENT 1 44 4 (0) 00:00:01
1 TABLE ACCESS BY INDEX ROWID T1 1 44 4 (0) 00:00:01
* 2 INDEX RANGE SCAN T1_I2 1 3 (0) 00:00:01
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("V2"='THIS IS ROW: 1')
filter(UPPER("V2")='THIS IS ROW: 1')
So, for the mandatory column, the CBO determines that the index can be used as an access path to obtain all of the relevant rows and given that it's more efficient to do so it uses the index T1_I2 accordingly. This is what I was expecting to see in the first place...but obviously the NULLability of the V1 column had led me astray.
So, what happens if we add another predicate to the first query to try and inform the CBO that we are not looking for any NULL values - will it be clever enough to add this fact to the information from the constraint and come up with an index access path ?
select * from t1
where upper(v1)='THIS IS ROW: 1'
and v1 is not null;
which gives us:
ID V1 V2
-- --------------- ---------------
1 THIS IS ROW: 1 THIS IS ROW: 1
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322
-------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
-------------------------------------------------------------------------------------
0 SELECT STATEMENT 1 44 4 (0) 00:00:01
1 TABLE ACCESS BY INDEX ROWID T1 1 44 4 (0) 00:00:01
* 2 INDEX RANGE SCAN T1_I1 1 3 (0) 00:00:01
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("V1"='THIS IS ROW: 1')
filter(UPPER("V1")='THIS IS ROW: 1' AND "V1" IS NOT NULL)
So, yes, it can derive from the additional predicate stating that we are only looking for rows where V1 IS NOT NULL, in conjunction with the check constraint T1_CK_V1, that the UPPER() function can be ignored and that the index access path is now available and given it's more efficient, it chooses to use it.
Quite clever really but I'm glad Anthony was around to help me see the wood for the trees on this one.
I spoke with Jonathan about this testing and he said he was aware of the need for the NOT NULL constraint in order for this to work and that from memory he thinks it was somewhere in the middle of 9.2 that this requirement came in to address a bug in transformation.
Wednesday, August 16, 2006
10gR2: OBJECT_NAME missing from V$SQL_PLAN for INSERT statement
I posted the query on this entry here but have updated it since and it's on this link now...in case I change it again. It needs more work as it currently doesn't fully understand how many PX slaves will be used for a given piece of SQL, e.g. Sorts and Set operations (UNION etc...) all add to the requirements...I'm trying to determine the rules for this so I can improve the query in this respect.
Anyway, it seemed to work reasonably well - particularly at finding things that were supposed to run parallel but actually ran serially.
I did, however, notice that even on certain (INSERT) statements that didn't involve Sorts/Set operations it was showing records where the required DOP (what I thought would be the maximum DOP it was going to request) was actually less than the achieved DOP - this didn't make sense until I delved deeper and found that the OBJECT_NAME column on the V$SQL_PLAN seems to be NULL on the line relating to the INSERT operation in the plan - running an EXPLAIN PLAN for the statement showed the OBJECT_NAME correctly but only when it's executed and the V$SQL_PLAN is checked did I find that this column appears to be, unexpectedly, NULL.
With the OBJECT_NAME being NULL it meant that my script was not including the DOP on the target table when determining the object with the highest DOP thereby sometimes getting an incorrect value and resulting in rows where the Required DOP is less than the Achieved DOP.
I created the obligatory test case to illustrate the point before contacting support to raise an SR (5686332.993). It turns out that this is all down to bug 5204810 which is (allegedly) fixed in 11g. The bug is a little unclear in that it only talks about this issue when dealing with conventional inserts and not Direct Path inserts which were those where I'd experienced the problem.
The bug suggests that 11g will add a new line to the execution plans such that it will look like this:
-----------------------------------------
Id Operation Name
-----------------------------------------
0 INSERT STATEMENT
1 LOAD TABLE CONVENTIONAL TABLE_X
-----------------------------------------
The ID 1 line will be visible in 11g the bug suggests and it will then start to show the OBJECT_NAME (TABLE_X in this instance) correctly on this new line rather than a NULL.
Oracle support suggested that although it wasn't directly mentioned in this bug, it was still the same issue causing the problem for my Direct Path inserts...although I'm not entirely convinced since Direct Path inserts already have an extra line (LOAD AS SELECT) under the INSERT STATEMENT line in their plans...perhaps the fix, whilst including the line for LOAD TABLE CONVENTIONAL for conventional inserts, has also ensured that the OBJECT_NAME is correctly displayed when it's a Direct Path INSERT and the Operation is LOAD AS SELECT...I'll try to remember to test it when 11g becomes available.
Tuesday, July 25, 2006
Scalar Subqueries and their effect on execution plans
We had a situation where a piece of DML was running slowly and I was asked to take a look at it to see what it was doing. I was told that the query had undergone some modifications recently (can you hear the alarm bells ringing !?) but that the execution plan had been checked and it was fine.
I first ran a little script called get_query_progress.sql which I wrote to get the execution plan from the V$SQL_PLAN table and link it to the work areas and long ops V$ views to give a better picture of where a query is at in it’s execution plan…hopefully to give an indication of how far through it is. The results of this showed that the plan was as I had expected for this piece of DML – it’s a big operation using parallel query, full table scans and hash joins to process lots of rows in the most efficient way…so why was it going slowly if the execution plan looked good ? Well, the progress script I ran wasn’t really much use since it just showed the execution plan and didn’t have any long ops to speak of so I couldn’t tell where exactly in the plan it had reached…time for another tack.
I then ran a query against V$SESSION_WAIT_HISTORY to see if that would show what we were waiting for…expecting to see scattered reads and perhaps read/writes on temp as it does the hash joins I was surprised to see lots of sequential reads which suggested the use of an index – but the execution plan from above did not involve any indexes so this seemed odd.
Next I figured I’d try to determine what the sequential reads were actually reading by using the File# and Block# from the waits information using this query:
SELECT segment_name, segment_type, owner, tablespace_name
FROM dba_extents
WHERE file_id = <file#>
AND >block#> BETWEEN block_id AND (block_id + blocks -1)
/
Which showed that the sequential reads were in fact reading an index on one of the tables – but how could this be when the plan showed no signs of this step ?
I took the DML and did an EXPLAIN PLAN on it to see what this thought the intended plan would be and it showed no signs of this index read so that’s EXPLAIN PLAN and V$SQL_PLAN (i.e. the actual plan it’s using) both showing no sign of this index read and yet it was definitely happening.
Looking at the changes to the DML my colleague Anthony and I noted that the changes made recently had involved adding some new code including some scalar subqueries on some additional tables – tables which were not showing in the intended/actual plans either. We decided to rewrite the query using an outer join approach rather than the scalar subquery and the plans then showed up the access of these new tables and that the access was via a full table scan. Executing the DML then resulted in an elapsed time in line with expectations.
I did set up a simple test case and then ran a 10053 trace on it to see what was in there -
DROP TABLE test1
/
CREATE TABLE test1(col1 number
,col2 number)
/
DROP TABLE test2
/
CREATE TABLE test2(col1 number
,col2 number)
/
BEGIN
FOR r IN 1..100000 LOOP
INSERT INTO test1 VALUES(r,100000+r);
IF MOD(r,2)=0 THEN
INSERT INTO test2 VALUES(100000+r,200000+r);
END IF;
END LOOP;
COMMIT;
END;
/
exec dbms_stats.gather_table_stats(ownname => USER,tabname => 'TEST1',estimate_percent => 100);
exec dbms_stats.gather_table_stats(ownname => USER,tabname => 'TEST2',estimate_percent => 100);
SELECT /*+ OUTER JOIN APPROACH */
COUNT(1) FROM (
SELECT t1.col1
, t1.col2
, t2.col2
FROM test1 t1
, test2 t2
WHERE t1.col2 = t2.col1(+)
ORDER BY t1.col1
)
/
alter session set events '10053 trace name context forever';
SELECT /*+ SCALAR SUBQUERY APPROACH */
COUNT(1) FROM (
SELECT t1.col1
, t1.col2
, (SELECT t2.col2
FROM test2 t2
WHERE t1.col2 = t2.col1
) col2
FROM test1 t1
ORDER BY t1.col1
)
/
In the trace file we first see the scalar subquery being evaluated…
****************
QUERY BLOCK TEXT
****************
SELECT t2.col2
FROM test2 t2
WHERE t1.col2 = t2.col1
*********************
QUERY BLOCK SIGNATURE
*********************
qb name was generated
signature (optimizer): qb_name=SEL$3 nbfros=1 flg=0
fro(0): flg=0 objn=55825 hint_alias="T2"@"SEL$3"
… then the costs for this subquery:
*********************************
Number of join permutations tried: 1
*********************************
Final - First Rows Plan: Best join order: 1
Cost: 27.6270 Degree: 1 Card: 1.0000 Bytes: 9
Resc: 27.6270 Resc_io: 26.0000 Resc_cpu: 10783378
Resp: 27.6270 Resp_io: 26.0000 Resc_cpu: 10783378
…then it works on the whole query:
****************
QUERY BLOCK TEXT
****************
SELECT /*+ SCALAR SUBQUERY APPROACH */
COUNT(1) FROM (
SELECT t1.col1
, t1.col2
, (SELECT t2.col2
FROM test2 t2
WHERE t1.col2 = t2.col1
) col2
FROM test1 t1
ORDER BY t1.col1
)
*********************
QUERY BLOCK SIGNATURE
*********************
qb name was generated
signature (optimizer): qb_name=SEL$51F12574 nbfros=1 flg=0
fro(0): flg=0 objn=55824 hint_alias="T1"@"SEL$2"
…the costs for which are:
*********************************
Number of join permutations tried: 1
*********************************
Final - All Rows Plan: Best join order: 1
Cost: 57.8271 Degree: 1 Card: 100000.0000 Bytes: 500000
Resc: 57.8271 Resc_io: 55.0000 Resc_cpu: 18737631
Resp: 57.8271 Resp_io: 55.0000 Resc_cpu: 18737631
And the (incomplete) plan it thinks it’s running:
============
Plan Table
============
--------------------------------------+-----------------------------------+
Id Operation Name Rows Bytes Cost Time
--------------------------------------+-----------------------------------+
0 SELECT STATEMENT 58
1 SORT AGGREGATE 1 5
2 TABLE ACCESS FULL TEST1 98K 488K 58 00:00:01
--------------------------------------+-----------------------------------+
So, I guess you get more information in the 10053 trace file including the breakdown for the subquery(ies) - so in the case I was working on that would have highlighted to me the use of this new table and it's access path being (inefficiently) via an index. It still gets the overall plan wrong though.
After a bit of reading, Anthony discovered that Tom had mentioned this issue of execution plans not reflecting actual processing, in his Effective Oracle by Design book (p504 – 514). Tom remarked that it was an issue in 9iR2 and that he hoped it would be fixed in a future release – well, that release is not 10gR2 since we’re still seeing this anomaly.
Labels: tuning
Thursday, July 20, 2006
Parallel downgrades to serial after an OWB 10.1.0.4 upgrade
The problem we found was that we were getting lots of "Parallel operations downgraded to serial" - which, given that we're not using adaptive multi user whilst running our batch suite is indicative of mappings starting up and finding that there are no parallel slaves left whatsoever to play with. This was further confirmed by the fact that there were not many occurrences, in comparison, of downgrades by 25/50/75%...
select name
, value
from v$sysstat
where name like 'Parallel%'
/
Name Value
------------------------------------------- -------
Parallel operations not downgraded 18,694
Parallel operations downgraded to serial 476,364
Parallel operations downgraded 75 to 99 pct 15
Parallel operations downgraded 50 to 75 pct 137
Parallel operations downgraded 25 to 50 pct 214
Parallel operations downgraded 1 to 25 pct 85
So why would that be then when the batch suite has been working fine (i.e. getting finished in good time) for quite some time now ?
The obvious thing to ask would be "what has changed ?" but we thought we'd try and look at it from an analysis of database metrics - we've got Enterprise Manager and AWR and ASH stuff to look at...so we tried to find where it tells us whether a given job (i.e. a piece of DML SQL in an ETL mapping) ran Parallel or not ? and more importantly with what degree of parallel ? Now, we may be novice EM users but basically we couldn't find what we wanted which was essentially the contents of V$PX_SESSION at the point in time that the query ran so we could determine the DEGREE (actual achieved) and the REQ_DEGREE (what DOP was asked for) and see if there is any difference or if there is no record at all in the V$PX_SESSION thereby indicating that the query ran serially.
I'm sure there is probably a way of getting this information from EM/AWR/ASH but I can't find it so after a fruitless search I decided to build a simple capture process which recorded the contents of V$PX_SESSION in a history table and we scheduled an EM job to run it periodically (repeated every minute in our case).
After we'd recorded this information we were able to analyse it by creating a simple view which linked this history data (in MGMT_T_V$PX_SESSION_HISTORY) to some of the ASH views...
CREATE OR REPLACE VIEW mgmt_v_sql_dop_analysis AS
WITH sql_dop AS
(
SELECT /*+ NO_MERGE MATERIALIZE */
DISTINCT hsp.sql_id
FROM dba_hist_sql_plan hsp
, dba_tables t
WHERE hsp.object_owner = t.owner
AND hsp.object_name = t.table_name
AND hsp.object_owner IS NOT NULL
AND t.degree IS NOT NULL
AND LTRIM(RTRIM(t.degree)) != '1'
)
SELECT /*+ ORDERED USE_HASH(dhs,sd,ps) */
hsh.sql_id
, hsh.sample_time
, TRUNC(hsh.sample_time) ash_sample_date
, dhs.sql_text
, (CASE WHEN sd.sql_id IS NULL
THEN 'SERIAL JOB'
ELSE 'PARALLEL JOB'
END) job_type
, (CASE WHEN ps.saddr IS NULL
THEN 'NOPARALLEL'
ELSE 'PARALLEL'
END) ran_parallel_flag
, ps.rundate,ps.saddr,ps.sid
, ps.serial#
, ps.qcsid
, ps.qcserial#
, ps.degree
, ps.req_degree
FROM dba_hist_active_sess_history hsh
, dba_hist_sqltext dhs
, sql_dop sd
, mgmt_t_v$px_session_history ps
WHERE ps.qcsid(+) = hsh.session_id
AND ps.qcserial#(+) = hsh.session_serial#
AND hsh.sql_id = dhs.sql_id
AND hsh.sql_id = sd.sql_id(+)
-- we started collecting mgmt_t_v$px_session_history
-- at this point...
AND hsh.sample_time >=
TO_DATE('15-JUL-2006','DD-MON-YYYY')
/
...and then run this query:
SELECT a.sql_id
, a.sample_time
, a.ash_sample_date
, a.sql_text
, a.rundate
, a.job_type
, a.ran_parallel_flag
, a.rundate
, a.saddr
, a.sid
, a.serial#
, a.qcsid
, a.qcserial#
, a.degree
, a.req_degree
FROM mgmt_v_sql_dop_analysis a
where ash_sample_date = '21-JUL-2007'
and job_type='PARALLEL JOB'
and (ran_parallel_flag = 'NOPARALLEL'
or req_degree != degree)
/
which finds jobs which involve tables with a DOP set on them and therefore should be using parallel but which actually ran either without any parallel slaves [i.e. Downgraded to Serial]...and hence were probably really slow... or ran with less than the requested parallel slaves [i.e. Downgraded by some percentage (25/50/75)]...and hence were probably slower than we'd have liked.
This highlighted lots of jobs which were not getting any DOP (as per the high figures of "Parallel operations downgraded to serial") and a small number of jobs which were being downgraded somewhat - the same information as from V$SYSSTAT but more detailed in terms of which jobs had been affected and how.
So, we've identified which jobs were getting downgraded and it was pretty much in line with the elapsed times of those jobs - the more the downgrade the slower they went. Problem is that doesn't specifically identify why except we noticed that it looked like lots of the mappings were running with far higher DOP than we'd expected (24 v 8) and hence probably why we were exhausting the supply of slaves.
We couldn't work out why some of these mappings were getting DOP of 24 instead of the 8 we were expecting and had set on the tables. It pointed to the fact that the table level DOP was being ignored or overridden which in turn pointed the finger of suspicion on the mapping code.
So, back to the original question of "What's changed recently ?"...
Well, we'd undergone an upgrade of OWB from 10.1.0.2 to 10.1.0.4 very recently and to cut a long story short, it seems that the problem has occurred as a result of that. The reason being that 10.1.0.2 OWB corrupted the PARALLEL hints on the mapping and the DML SQL then resorted to using the table level degree which was 8. Unfortunately, the 10.1.0.4 version has fixed this hint corruption issue and some new mappings recently created in this version are now using the hints the mapping specifies which are of the form:
PARALLEL(table_alias DEFAULT)
...which on our box with CPU_COUNT = 24 and PARALLEL_THREADS_PER_CPU = 1 means those mappings now get DOP of 24 instead of the table level DOP 8.
The issue didn't affect the mappings which were originally created in 10.1.0.2 as they had the corrupted hints still in place but new mappings created in OWB 10.1.0.4 suffered from the problem.
What happens then is that those new mappings when they run in the schedule, ask for and get, far more of the available pool of px slaves than we anticipated and thus the pool of slaves is depleted and later running mappings have no slaves to use thereby running serially.
The solution was to go through all the mappings and remove the PARALLEL hints - instead we'll be working off table level parallel degree settings exclusively and our batch suite should settle down again.
Monday, March 20, 2006
Compression and ITL
To help me with the compression internals, Jonathan Lewis advised me a while ago to look at the website of Julian Dyke to go over his block dumping stuff which has proved very interesting and useful. I also found a presentation Julian did on compression based on Oracle 9iR2 which had some stuff I’d covered in my last presentation as well as plenty more detailed stuff as you’d expect from Julian.
Julian has a great picture of the structure and breakdown of the inside of a compressed block in his presentation which I’ve been trying to explore in more detail by testing with different block sizes and data. One of the things that has come to light is that there are quite a few factors involved in determining the compression that will be realized when using data segment compression. My original thinking was that the following factors would somehow play a part:
- Block size
- Number of rows
- Length of data values
- Number of repeats
- Ordering of data being pushed into the target data segment
But after looking at the picture of the block structure in Julians presentation it appears that the following could also play a part since they affect the amount of overhead in each block – which in turn affects the space left for data:
- ITL (Interested Transaction Lists – set by value of INITRANS on table create)
- Number of columns in the data segment
I’m currently developing some test scripts to go with the presentation which will show how each of these factors affects the level of compression achieved – might make their way into the presentation in the form of graphs just to illustrate the point.
I did have a TAR (sorry SR!) open with Oracle to see if they’d give me more details on the actual algorithm that is used during compression but after much delay and deliberation they (development) decided it was something they didn’t want to divulge.
Funniest thing I’ve found so far is that Julian shows the compressed block header for a block in a 9.2.0.1 database clearly showing “9ir2” literals – you’d think they’d change when you move to “10gr2” wouldn’t you ? Think again – it still shows “9ir2” in the 10gr2 block dump trace files!
Labels: tuning
Wednesday, March 01, 2006
DBMS_XPLAN restriction on OPERATION column length
There are some horrendous pieces of SQL in one of the databases I support and getting an execution plan via DBMS_XPLAN is often one of the things I’ll do in any tuning exercise. Unfortunately, there are some pieces of SQL which are so big and more importantly, have so many nested operations that the indentation of the OPERATION column means that the name is either cut off or obscured.
(According to my colleague Alun Fair, the limit for the OPERATION column in the DBMS_XPLAN output seems to be 60 characters…I’ve not checked it but it looks about right to me)
I had a quick nose about in the scripts in $ORACLE_HOME/rdbms/admin to try and work out how it worked and whether there was anything we could do about this and discovered that the entries for the DBMS_XPLAN plan are stored in a TABLE of TYPEs:
CREATE OR REPLACE
type dbms_xplan_type_table
as table of dbms_xplan_type
dbms_xplan_type is:
CREATE OR REPLACE
type dbms_xplan_type
as object (plan_table_output varchar2(200))
In 10gR2 there are some notes in a file called c1001000.sql (part of the 10gR1 to R2 upgrade scripts) which indicate that the length of the PLAN_TABLE_OUTPUT attribute in this DBMS_XPLAN_TYPE type get increased from 200 to 300 characters so I wondered if it would be as simple as implementing that modification from the 10gR2 upgrade script and the code would be clever enough to use more space for the OPERATION column…sadly it didn’t work. I could manage to change the length to 300 easily enough and recompiled the invalid objects without too much pain but it didn’t help the output on a problem piece of SQL, so I guess I’ll have to raise a Service Request with Oracle to see whether they can enhance this aspect of DBMS_XPLAN in a future release.
Addendum:
Alun Fair pointed me in the direction of the 10gR2 docs on DBMS_XPLAN which show that there are some new formatting options to the DISPLAY call of DBMS_XPLAN - sadly none of them force the package to use more width for the OPERATION column even when it's available.
Labels: tuning
Tuesday, February 14, 2006
Five Tuning Tips For Your Data Warehouse - A Presentation
If you don't know Mark Rittman the "humour" at the beginning might go over your head - I guess you just had to be there!
Comments/discussion welcome.
Saturday, February 11, 2006
Slow access to V$DATAFILE
I posted a while back on the benefits of having a higher granularity of partitions to tablespaces to datafiles…but this issue seems to fall under that of “Cons”.
The problem appears when doing a query like:
select * from v$datafile where file#=29
/
Which then, sometimes, gives the error:
ERROR:
ORA-00235: controlfile fixed table inconsistent due to concurrent update
I guess that's because with having so many files involved in the database the query takes quite a while to run and that during the process, the controlfile changes and therefore the query doesn't believe it will get a consistent set of results...so it bombs out with the error.
So, why does it take so long in the first place ? Even if we have 10,000 files that's still not an enormous number and surely shouldn't take 10's of seconds to query.
If we use explain plan we can see the expected execution plan looks like this:
--------------------------------------------------------
Id Operation Name
--------------------------------------------------------
0 SELECT STATEMENT
1 CONCATENATION
2 MERGE JOIN
3 MERGE JOIN
4 SORT JOIN
5 MERGE JOIN
6 SORT JOIN
7 FIXED TABLE FULL X$KCVFH
* 8 SORT JOIN
* 9 FIXED TABLE FULL X$KCCFN
* 10 SORT JOIN
* 11 FIXED TABLE FIXED INDEX X$KCCFE (ind:1)
* 12 SORT JOIN
* 13 FIXED TABLE FULL X$KCCFN
14 MERGE JOIN
15 SORT JOIN
16 MERGE JOIN
17 SORT JOIN
18 MERGE JOIN
19 SORT JOIN
20 FIXED TABLE FULL X$KCVFH
* 21 SORT JOIN
* 22 FIXED TABLE FULL X$KCCFN
* 23 SORT JOIN
* 24 FIXED TABLE FIXED INDEX X$KCCFE (ind:1)
* 25 FILTER
* 26 SORT JOIN
27 FIXED TABLE FULL X$KCCFN
--------------------------------------------------------
We can see that this plan involves numerous sort / merge joins but we can’t see anything indicating the Rows retrieved at each operation in the execution plan, nor the Bytes, Cost or Time values for the operations.
I’m a little confused at not being able to see Rows/Bytes/Cost/Time in the plan especially if I do another query against V$SESSION I can see such information:
---------------------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
---------------------------------------------------------------------------------------------
0 SELECT STATEMENT 1 1065 0 (0) 00:00:01
1 NESTED LOOPS 1 1065 0 (0) 00:00:01
2 FIXED TABLE FULL X$KSUSE 1 856 0 (0) 00:00:01
3 FIXED TABLE FIXED INDEX X$KSLED (ind:2) 1 209 0 (0) 00:00:01
I wondered whether it was because the stats hadn’t been gathered for some of the fixed objects (X$ tables) so after reading this (metalink login required) I issued the command :
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS
…expecting to see the stats appear – they didn’t…so I’m still confused as to why I don’t see those important stats in some circumstances.
Anyway, looking at my sample query above we can all see the wonderful example of “lazyitis” in my SQL can’t we ? – the use of SELECT * instead of the actual columns I want. Trying the same again but only selecting FILE# and NAME columns gives a much quicker execution.
If we examine the trace files to look at the waits involved we see that the slow running “SELECT *” execution incurs many “db file sequential read” waits whilst the “SELECT file#,name” execution incurs none. On my 10gr2 on XP environment there were 17165 occurrences in the trace file with approximately 18K wait events in total – so a very large proportion of the excess time is down to this db file sequential reading…i.e. the sorts are much more costly when we select, unnecessarily, the whole row instead of just the columns we require.
Why was I trying to use V$DATAFILE ? Well, I was reading the Oak Table Insights book the other night and the chapter by Gaja Krishna Vaidyanatha on Compulsive Tuning Disorder which suggested the use of V$SESSION_WAIT_HISTORY to track the wait history by session when diagnosing performance problems. From that I took the query Gaja used and extended it a little to try and pinpoint the data or temp file which related to the wait event – where that wait event was for a block in a file – just to give a more complete picture.
The final query is here if you’re interested – it works for serial or parallel queries. As ever, suggestions/corrections are welcome.
Now that I look at it, there is another interesting aspect to the query. When I tried to join in to V$TEMPFILE to pick up the name for the TEMP file being waited on it wasn't coming back with anything. Further investigation revealed that the file# on the wait event was some massive number over 32K in size when we only have 50 temp files. I raised it with support who eventually came back with the knowledge that it's not an absolute file number but a relative one which you need to subtract the value of the initialisation parameter "db_files" from. Once I factored that into the SQL it started to work fine.
I was suffering from “lazyitis” as above which was causing my query to take ages to return – changing it to only use the FILE# and NAME columns makes sure the query returns in around 30s even on a warehouse with over 10,000 files.
On another note, my colleague Anthony pointed this out to me the other day…made me laugh so thought I’d share it with you.
Labels: tuning
Tuesday, December 20, 2005
V$SQL_PLAN bug in 10gR1
(10.1.0.4.0) that I thought I'd share with you…
Essentially, I was trying to go into V$SQL_PLAN for the problem query, joining up the PARENT_ID and ID columns using CONNECT BY when I discovered that it wasn't joining things up correctly. Further investigation showed that some of the entries in the table had PARENT_ID values which did not relate to any ID values and could therefore not be CONNECTed BY any means...I've since raised a TAR - sorry, Service Request (SR) on the super new whizzy Metalink and the response from Oracle was that it was indeed a bug.
I did some searching on Metalink and the forums and found that there have been problems with V$SQL_PLAN before where the support recommendation was that the problem was fixed in later releases and that a workaround was to just use EXPLAIN PLAN or:
select * from table(dbms_xplan.display_cursor('&sql_id'))
/
For the given SQL_ID - if the plan is still in memory.
It's always wise before going to Support with a problem to come up with a test case in my view...they will end up asking for one anyway and it’s perfectly reasonable to do so. Here's the one that I logged with them:
drop table test1;
drop table test2;
create table test1(col1 number, col2 number,col3 number,col4 varchar2(20)); insert into test1 values(1000,2000,3000,'AAA'); insert into test1 values(1001,2000,3001,'BBB'); insert into test1 values(1002,2001,3000,'CCC'); create table test2(col5 number, col6 varchar2(20)); insert into test2 values(2000,'DDD'); insert into test2 values(2001,'EEE'); commit;
WITH i1 AS
(
select /*+ no_merge materialize */ t1.col1
,t1.col2
,t1.col3
,t1.col4
from test1 t1
, test2 t2
where t1.col2 = t2.col5
and t1.col3 = 3000
)
, i2 AS
(
select /*+ no_merge materialize */ t1.col1
,t1.col2
,t1.col3
,t1.col4
from test1 t1
, test2 t2
where t1.col2 = t2.col5
and t1.col3 = 3001
)
select i1.col1
, i1.col2
, i1.col3
, i1.col4
, i2.col4
from i1
, i2
where i1.col2 = i2.col2
/
select operation,id,parent_id,depth,position from v$sql_plan where sql_id='fj03rbvgpqrp6' order by id /
On 10gR1 on HP-UX I'm getting duff entries in the output:
OPERATION ID PARENT_ID DEPTH POSITION
SELECT STATEMENT 0 0 127
TEMP TABLE TRANSFORMATION 1 0 1 1
HASH JOIN 10 1 2 1
VIEW 11 10 3 1
TABLE ACCESS 12 11 4 1
VIEW 13 10 3 2
TABLE ACCESS 14 13 4 1
LOAD AS SELECT 15 1 2 2
HASH JOIN 16 2 3 1
TABLE ACCESS 17 3 4 1
TABLE ACCESS 18 3 4 2
LOAD AS SELECT 19 1 2 3
HASH JOIN 20 6 3 1
TABLE ACCESS 21 7 4 1
TABLE ACCESS 22 7 4 2
Notice that there are no ID's 2,3,6 or 7 and therefore the PARENT_ID's with those values are plain wrong.
Interestingly, on my 10gR2 on XP at home this worked fine so they probably fixed it in going to 10gR2 me thinks:
OPERATION ID PARENT_ID DEPTH POSITION
SELECT STATEMENT 0 0 14
TEMP TABLE TRANSFORMATION 1 0 1 1
LOAD AS SELECT 2 1 2 1
HASH JOIN 3 2 3 1
TABLE ACCESS 4 3 4 1
TABLE ACCESS 5 3 4 2
LOAD AS SELECT 6 1 2 2
HASH JOIN 7 6 3 1
TABLE ACCESS 8 7 4 1
TABLE ACCESS 9 7 4 2
HASH JOIN 10 1 2 3
VIEW 11 10 3 1
TABLE ACCESS 12 11 4 1
VIEW 13 10 3 2
TABLE ACCESS 14 13 4 1
The PARENT_ID and ID's CONNECT up no problem now.
This is a bit of a pain cos I need it to work really for my "where is my query at" efforts....grrrr! Given it only fails in this way infrequently and for certain types of query (in my example where there are numerous TEMP TABLE TRANSFORMATIONs) I think I’ll just live with it as a feature in my script.
Saturday, December 17, 2005
How do I find out where my query is at ? Part II
I’ve since written a better query which at least gives a better picture of where the query is at – don’t get me wrong, it’s still a long way from perfect, but at least it’s getting closer to where it needs to be…
The script is here.
Basically, I join v$session, v$open_cursor, v$sql_plan, v$sqlarea, v$sql_workarea, v$sql_workarea_active and v$sesion_longops to get a more accurate picture of where things are at.
It’s not perfect since there are still occasions when Oracle is performing a particular step in the execution path and I can’t determine where it is, e.g. the final transaction part when it’s a DML statement…but it’s better than anything I’ve managed before.
I still believe there is no simple, definitive way of determining this information as of 10g R2…please tell me if you’re better informed!
Your comments on improving it would be most welcome…particularly since I’m going to include something on this in the presentation I’m giving in January to the BIRT SIG – yes, I’m quite nervous at giving my first presentation in front of more than 3 people thank you very much!
I’ve managed to secure the assistance of a number of trusted friends and associates to review what I produce for the presentation…my thanks to them in advance.
On a different note, I’ve just started reading the Oak Table Insights book which is thoroughly entertaining not to mention full of insight.
Labels: tuning
Sunday, November 20, 2005
Table Compression - Order For Maximum Compression - Code Utility
Code is available here. Feel free to use and abuse it for your own environment – usual caveats apply, i.e. what works for me may not work for you and you should test before relying on it! This example runs in a user called AE_MGMT who has the appropriate privileges to access the DBA Views in PL/SQL and also the table being assessed…you might need to play with that in your environment a little.
Execution takes 4 parameters:
TABLE OWNER Defaults to current USER
TABLE NAME Must supply this
PARTITION NAME Defaults to NULL
SAMPLE SIZE Defaults to 1,000,000
Example output:
ae_mgmt[147/13]@HOPE> exec mgmt_p_get_max_compress_order('SYSTEM','J1',NULL,100000);
Running for TABLE: J1; SAMPLE_SIZE: 100000
Unique ID: 21112005095941
Creating MASTER Table...
Creating COLUMN Table 1:OWNER...
Creating COLUMN Table 2:OBJECT_NAME...
Creating COLUMN Table 3:SUBOBJECT_NAME...
Creating COLUMN Table 4:OBJECT_ID...
Creating COLUMN Table 5:DATA_OBJECT_ID...
Creating COLUMN Table 6:OBJECT_TYPE...
Creating COLUMN Table 7:CREATED...
Creating COLUMN Table 8:LAST_DDL_TIME...
Creating COLUMN Table 9:TIMESTAMP...
Creating COLUMN Table 10:STATUS...
Creating COLUMN Table 11:TEMPORARY...
Creating COLUMN Table 12:GENERATED...
Creating COLUMN Table 13:SECONDARY...
NAME COLUMN BLOCKS ROWS SPACE_GB
=========================== =============== ====== ===== ========
TEMP_COL_004_21112005095941 OBJECT_ID 322 99999 .0098
TEMP_COL_008_21112005095941 LAST_DDL_TIME 331 99999 .0101
TEMP_COL_007_21112005095941 CREATED 338 99999 .0103
TEMP_COL_009_21112005095941 TIMESTAMP 340 99999 .0104
TEMP_COL_002_21112005095941 OBJECT_NAME 410 99999 .0125
TEMP_COL_005_21112005095941 DATA_OBJECT_ID 427 99999 .013
TEMP_COL_011_21112005095941 TEMPORARY 433 99999 .0132
TEMP_COL_012_21112005095941 GENERATED 434 99999 .0132
TEMP_COL_010_21112005095941 STATUS 434 99999 .0132
TEMP_COL_013_21112005095941 SECONDARY 434 99999 .0132
TEMP_COL_003_21112005095941 SUBOBJECT_NAME 434 99999 .0132
TEMP_COL_001_21112005095941 OWNER 523 99999 .016
TEMP_COL_006_21112005095941 OBJECT_TYPE 532 99999 .0162
PL/SQL procedure successfully completed.
Elapsed: 00:01:17.07
So, in the above example, if I want to order the table for maximum compression I’d use OBJECT_ID to order the data by.
It only works out the compression for individual columns so combinations are not covered – you can test that individually afterwards once you’ve narrowed down the best columns – I might retest the above with ordering by combinations of the first 4 columns given the big jump in blocks from 340 on TIMESTAMP to 410 on OBJECT_NAME – starting with the first 2 columns, then adding column 3 etc.. until adding the columns has no marginal effect….pointless sorting by more and more columns that aren’t adding anything – will just take more time/resources for the sort.
Of course, as was mentioned on David’s post/link, you might want to order things to suit the access path rather than for maximum compression – your mileage may vary. I built it for work on a fact table where I wanted maximum compression and where the access path would generally be either a partition/full scan or one of several equally common access paths.
Labels: compression, tuning
Monday, October 03, 2005
SCD2's and their affect on the CBO
and [analysis_date] BETWEEN from_date and to_date
Now, how many rows will the optimiser think are going to be returned from the table ?
I figured that topic might have already been investigated by somebody so I did a search on google which was interesting:
What a pleasant surprise to find a reference from my own blog (via orafaq) showing as the number 1 hit...and Mark Rittman also shortly after! Unfortunately my own post was on a different matter and Mark's was too so I was still a bit stumped...
After a bit more research I found something from Wolfgang Breitling on this subject which confirmed my thoughts and discussed it very eloquently along with other fallacies of the CBO.
Now - Wolfgang tells us here what the problem is and that there isn't really a remedy other than using hints or stored outlines to guide the CBO...and who am I to argue!
I did think that maybe we could create some interface tables to hold all the possible range to date permutations and then when a user queries for a given analysis date they could use an equality predicate on the interface tables which would convert to pairs of from/to dates which then get equality matched to the target SCD2 - it kinda works but it means a lot of work to crunch through the interface tables just to avoid the problem of the CBO not being able to work out the selectivity/cardinality and potentially making a bad plan choice. The more the possible permutations the more work it becomes and in reality the number of permutations seems to be prohibitive so I've binned that idea for our environment. I might try to catch up with Wolfgang at the UKOUG to discuss this one further if I can grab his attention.
Addendum - One of Wolfgang's suggestions in his paper is to artificially set the stats on the table to some large number so that even when it factors the number down for the probability calculation it does then the number is still large and it will consequently choose hash/merge joins over a nested loop index lookups approach. I tried this by setting the table stats on the tables in my query to be large values using:
exec dbms_stats.set_table_stats(ownname => 'THE_SCHEMA',tabname => 'THE_TABLE',numrows => 3000000000,numblks => 24000000);
This seemed to work but I still wasn't overly keen on it since that means the optimizer is going to be coerced for any access to such tables - even if there isn't a join required.
My colleague Tank then came up with the idea that given most of our processes run off an "analysis date" which we store in a table, we could just create a materialized view of the contents of this table and set its stats to an artificially high value for numrows/blks and then given this table is used as the driver of most queries it would propagate through the plan and the optimiser, even applying heavy reductions for the probability would still realise that there were a lot of rows to process and choose plans accordingly....it worked a treat.
