Monday, October 31, 2005
UKOUG: JonathanLewis - How the CBO works
Excellent speaker...very funny.
Key points:
Optimiser goal - Give the CBO a chance by telling it what kind of environment you have - ALL_ROWS or FIRST_ROWS_n
Gather system stats - Gives CPU, Multi Block Read Count and IO figures to help CBO work out cost of a query - or estimated time for completion.
Use DBMS_STATS to gather stats - not ANALYZE - 10g automatically gathers these stats via a job created when the database is created. (you can still use ANALYZE for validating index structures and listing chained rows - just not stats stuff).
AVG_DATA_BLOCKS_PER_KEY on DBA_INDEXES is a human readable restatement of the CLUSTERING_FACTOR column - CBO uses the latter in calculating cost but easier to read the former. Higher values for both indicate higher cost...and hence why a given index may, correctly, not be chosen by CBO.
CBO counts physical IO's when determining cost - it assumes all IO's are physical which is not always true of course.
There is a scaling of the DB_MULTIBLOCK_READ_COUNT value - when you ask for 8, for example it will use a value of 6.588. This number is fixed - there is a table of fixed values per different value of DMRC.
OICA Threat - don't change OPTIMISER_INDEX_COST_ADJ to low values to persuade CBO to use indexes instead of full table scans - it makes all index access cheaper and may result in choosing a poor/inefficient index.
More interesting stuff in the paper if you're interested.
UKOUG: Tom kyte, SQL Optimisation
Excellent - obviously. Yes Tom, I did find it interesting and I did learn something.
3 key points...
Schema is important - sometimes it's not the SQL that needs tuning - look at the schema design - Tom gave an example of using an IOT to colocate data and improve performance of a specific simple query.
RTFM - features change - advice has a shelflife - Tom used the example of array processing being used automatically and transparently so that rewriting queries prior to 10g such that they use array processing is no longer necessary in 10g onwards.
Scalar subqueries - way cool feature for removing outer joins on fast return queries. Using SS to reduce query effort by PL/SQL function cacheing.
Ok - gotta do 4 - an honorable mention you might say...
Partitioned Outer Joins - nice new 10g feature but not in itself groundbreaking. The example Tom demonstrated was, however, very cool - given to him by Mikito Harakiri - a data generator which uses dual to get however many rows you want with - literally - no IO! Look out for the presentation - I couldn't type quick enough to get the SQL...
Actually there were more highlights but I guess you figured as much!
Not heard Tom present before - thoroughly enjoyable and humorous to boot!
UKOUG: Keynote & Fusion
Fusion seems to be the big thing - the strategy going forward for dealing with the Peoplesoft,
J D Edwards, Retek and Oracle Apps products and their customers.
Protect - each product has loyal and happy customers due to inherent strengths - it would be foolish to jeopardise this.
Extend - enhancements to the individual products and their integration with each other, e.g. Retek to financials bridge.
Evolve - doing things better - building sustainable competitive advantage by blended business insight and process automation throught adaptive processes.
A kind of reassuring session for the various product customers that Larry raided the piggy bank for this year!
Somebody forgot to pay their electricity bill!
Birmingham has been suffering powercuts this morning and the conference centre has been affected so registration is running late and the keynote isn't going to start for at least another 20 minutes!
Perhaps this is an example of the powercuts the utility companies are warning of in the near future in the UK ?
Met fellow blogger, Pete Scott this morning and had a coffee whilst we've been waiting - he only recognized me cos I was the wally wearing the name badge! He's a nice chap.
UKOUG Begins...
...well, I guess it could have started yesterday with the oak table event but I couldn't make it.
Anyway, i've done my usual trick of building in way too much time contingency in my travel schedule so i'm sitting in a pretty empty Birmingham International Convention Centre with time to kill...
My plan as far as blogging goes is to post something after each session via my Orange SPV M1000 PDA and a GPRS connectivity - but we'll have to see if I get time for it. A friend of mine who I met at university was of the opinion that you could potentially boil a presentation down to 3 key points - he's a sharp cookie so I might try and follow that approach in my blog posts over the next few days.
Now, where's the café...I need some caffeine!
Monday, October 24, 2005
ORION Usage And Futures
It seems that it’s not going to be available on the HP-UX which is unfortunate for my current project as that’s what we’re running on. Only Linux and Windows are available – it is beta, unsupported stuff after all though so fair play. I did hear something to the effect that it might be part of the RDBMS as of version 11 though so at that stage I guess it will cover all platforms – something to look forward to – perhaps it will be in Tom’s “Top 10” presentation next year ? ;-)
After finally managing to get my SUSE Linux 10 up and running in Microsoft Virtual PC – albeit with only 512M of RAM to play with - I could try ORION out on there as well as on Windows…some highlights of the process…
Download all the ORION products (Windows and Linux) from here.
Download SUSE 10 Linux and installed it into a Microsoft Virtual PC VM.
Installed ORION under both SUSE linux and Microsoft XP platforms.
Windows XP (service pack 2 and updates)
Ran the orion10.2_windows.msi file to install the Windows version of ORION accepting all the defaults during the installation.
I don’t have raw partitions or logical volumes available on my Windows box so I tested the Windows port using files.
I pretty much followed the instructions in the acrobat document that comes with the product download. The only gotcha I encountered was that in trying to use files I discovered that unless the files were PPT, DOC or XLS I couldn’t get ORION to run – it simply complained and threw up a Microsoft Debug message. As soon as I used one of the file types mentioned it worked fine. I did try renaming a text file to a .PPT but it didn’t work either so it really does need to be a proper MS Office type file it appears – either that’s very odd or I’m being thick and doing something extremely daft!
When I tried to use files on the SUSE Linux install it wouldn’t work with any type of files including these office ones so I’ve no idea how to use files on the Linux port – the raw lvm’s approach worked fine on there though.
Results of the simple run appeared in around 9 minutes showing:
Maximum Large MBPS=24.03 @ Small=0 and Large=2
Maximum Small IOPS=195 @ Small=4 and Large=0
Minimum Small Latency=5.42 @ Small=1 and Large=0
SUSE Linux 10
This was a bit more tricky as trying to run the ORION program immediately brought to my attention the fact that my linux install was missing a libaio library which I then had to research and install.
I downloaded the 0.3.0.version of libaio from technet here and tried to install it by running
./configure
...which then complained that it needed a version of libaio present to upgrade…so I downloaded one from Werner Puschitz here which helped me get libaio installed after a bit of effort.
OK after installing these files I had a working version of libaio and I was able to proceed with the use of ORION.
I set up 2 additional disks for the VM on different physical disks, then set up 2 volume groups (vg1 and vg2) with 1 logical volume per volume group (lv1 and lv2 respectively) so that I could test RAID v CONCAT. YAST was very easy to use in doing this I must say.
I created simple.lun with the following:
/dev/mapper/vg1-lv1
/dev/mapper/vg2-lv2
… and then ran the following command to get a simple test:
./orion10.2_linux –run simple –testname simple –num_disks 2
This gave the following results:
Maximum Large MBPS=8.73 @ Small=0 and Large=4
Maximum Small IOPS=80 @ Small=10 and Large=0
Maximum Small Latentcy=15.22 @ Small=1 and Large=0
I then created a file called dw_bench.lun with the following contents:
/dev/mapper/vg1-lv1
/dev/mapper/vg2-lv2
…and ran for datawarehouse advanced load using CONCAT using the following call:
./orion10.2_linux -run advanced -testname dw_bench -num_disks 2 -matrix point -num_small 0 -num_large 4 -type seq -num_streamIO 2 -simulate concat -cache_size 0 -verbose
…and got these results:
Maximum Large MBPS=9.61 @ Small=0 and Large=4
And then ran for RAID 0 with the following command:
./orion10.2_linux -run advanced -testname dw_bench -num_disks 2 -matrix point -num_small 0 -num_large 4 -type seq -num_streamIO 2 -simulate raid0 -cache_size 0 -verbose
And got these results:
Maximum Large MBPS=8.71 @ Small=0 and Large=4
Now, I tried without multiple sessions or parallel…
CONCAT 1 large, 1 parallel stream IO
Maximum Large MBPS=8.83 @ Small=0 and Large=1
RAID0 1 large, 1 parallel stream IO
Maximum Large MBPS=8.23 @ Small=0 and Large=1
That’s all the results I have for now – at least it’s working after much tinkering.
I was a little perplexed by the fact that the CONCAT performance seemed better than the RAID 0 – pretty similar figures and maybe it’s to do with running on a VM – the difference between the VM figures and the native XP run showed the VM was about 3 times as slow.
One day I might work out why this is so….but that would require me to have two spare minutes to rub together!
Monday, October 17, 2005
ANSI SQL99 Join Standard - yuk!
I don't want to start a SQL standards flamefest but I've recently been involved in a project where a number of the developers had written their code units using the ANSI SQL99 compliant join syntax where you put the joins in the FROM clause and the filter predicates in the WHERE clause, e.g.
REM Using the usual suspects for tables...
CREATE TABLE dept
(deptno NUMBER(2,0) NOT NULL,
dname VARCHAR2(14),
loc VARCHAR2(13));
ALTER TABLE dept
ADD CONSTRAINT pk_dept PRIMARY KEY (deptno)
USING INDEX;
CREATE TABLE emp
(empno NUMBER(4,0) NOT NULL,
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4,0),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2,0));
ALTER TABLE emp
ADD CONSTRAINT pk_emp PRIMARY KEY (empno)
USING INDEX;
ALTER TABLE emp
ADD CONSTRAINT fk_deptno FOREIGN KEY (deptno)
REFERENCES dept (deptno);
SELECT dept.dname,emp.ename,emp.sal
FROM dept JOIN emp USING(deptno)
WHERE emp.sal > 1000;
It's been possible to use this syntax since 9i but I've never used it, except where I need to do a FULL OUTER JOIN since you can't do that in the older syntax easily and when you do write it in the older syntax it's slightly worse performance...good coverage of this and more advanced SQL stuff here by Tim Quinlan from his presentation at OUG a few weeks back.
The problem I have is with some aspects of the SQL99 syntax, for example, if I add the join column to the select list like this:
SELECT dept.dname,emp.ename,emp.sal,emp.deptno
FROM dept JOIN emp USING(deptno)
WHERE emp.sal > 1000;
...I get an error:
ORA-25154: column part of USING clause cannot have qualifier
So I have to take off the alias like this:
SELECT dept.dname,emp.ename,emp.sal,deptno
FROM dept JOIN emp USING(deptno)
WHERE emp.sal > 1000;
It's a small point but it's a pain when you're writing a big piece of SQL and you have to remember whether a column you're selecting is one of the join columns or not...so much so that we decided that it would be better if we just used the ON clause like this:
SELECT dept.dname,emp.ename,emp.sal,emp.deptno
FROM dept JOIN emp ON emp.deptno = dept.deptno
WHERE emp.sal > 1000;
...which allows us to use the alias for the join column then - in fact if we didn't we'd then get this error:
ORA-00918: column ambiguously defined
The other issue I have with the USING syntax is that if you've got multiple tables involved in the query then how do you know which tables is the join column on ? e.g.
SELECT dept.dname,ename,emp.sal,deptno,bonus.comm
FROM dept JOIN emp USING(deptno)
LEFT JOIN bonus USING(ename)
WHERE emp.sal > 1000;
Which tables contain the column deptno ? and which have ename ? Making an educated guess and familiarity with these tables in this instance will give you the answers but what if the query contains 10 tables and it's a system you're not familiar with - it starts to get a little difficult to read (and therefore understand/maintain/tune) in my opinion - a point which is supposedly in favour of the SQL99 syntax. I don't think it's easy to read - in fact I think it's easier to read with the older syntax.
Seems to be a very distinct split in the development team here so I guess I'll just have to get more familiar with the new style.
Each to their own I guess.
Thursday, October 13, 2005
ORION - ORacle I/O Numbers calibration tool
The paper was an interesting read particularly when it covered a new tool available from Oracle called ORION - ORacle I/O Numbers calibration tool. Basically a useful (but unsupported) little tool which Oracle is making available to allow benchmarking of I/O subsystems by simulating various I/O loads. More information is available here (after agreeing to the usual disclaimer).
We'd been looking for something like this for a while in our warehouse environment and I thought I'd finally found something special...the unfortunate thing in our case is that it's only available on Windows and Linux ports... I've asked the question as to whether it can be made available on our HP-UX port and will post back when I get a response.
The instructions also only seem to cover linux so I've not been able to test it at home either since I can't get my Suse Linux 10 to install in a Microsoft Virtual PC VM currently as it keeps hanging during the install.
Addendum - Got my Suse Linux 10 installing now - seems that if you have the Microsoft Virtual PC VM set to have use more than 512M RAM it won't boot! Fixed now and it's installing so hopefully I can play with ORION after all!
Tuesday, October 04, 2005
Farewell Guvnor!
I grew up on a diet of Porridge, The Two Ronnies and Open All Hours and can't believe he's gone - he will be sadly missed by all who loved comedy.
Monday, October 03, 2005
SCD2's and their affect on the CBO
and [analysis_date] BETWEEN from_date and to_date
Now, how many rows will the optimiser think are going to be returned from the table ?
I figured that topic might have already been investigated by somebody so I did a search on google which was interesting:
What a pleasant surprise to find a reference from my own blog (via orafaq) showing as the number 1 hit...and Mark Rittman also shortly after! Unfortunately my own post was on a different matter and Mark's was too so I was still a bit stumped...
After a bit more research I found something from Wolfgang Breitling on this subject which confirmed my thoughts and discussed it very eloquently along with other fallacies of the CBO.
Now - Wolfgang tells us here what the problem is and that there isn't really a remedy other than using hints or stored outlines to guide the CBO...and who am I to argue!
I did think that maybe we could create some interface tables to hold all the possible range to date permutations and then when a user queries for a given analysis date they could use an equality predicate on the interface tables which would convert to pairs of from/to dates which then get equality matched to the target SCD2 - it kinda works but it means a lot of work to crunch through the interface tables just to avoid the problem of the CBO not being able to work out the selectivity/cardinality and potentially making a bad plan choice. The more the possible permutations the more work it becomes and in reality the number of permutations seems to be prohibitive so I've binned that idea for our environment. I might try to catch up with Wolfgang at the UKOUG to discuss this one further if I can grab his attention.
Addendum - One of Wolfgang's suggestions in his paper is to artificially set the stats on the table to some large number so that even when it factors the number down for the probability calculation it does then the number is still large and it will consequently choose hash/merge joins over a nested loop index lookups approach. I tried this by setting the table stats on the tables in my query to be large values using:
exec dbms_stats.set_table_stats(ownname => 'THE_SCHEMA',tabname => 'THE_TABLE',numrows => 3000000000,numblks => 24000000);
This seemed to work but I still wasn't overly keen on it since that means the optimizer is going to be coerced for any access to such tables - even if there isn't a join required.
My colleague Tank then came up with the idea that given most of our processes run off an "analysis date" which we store in a table, we could just create a materialized view of the contents of this table and set its stats to an artificially high value for numrows/blks and then given this table is used as the driver of most queries it would propagate through the plan and the optimiser, even applying heavy reductions for the probability would still realise that there were a lot of rows to process and choose plans accordingly....it worked a treat.
Subscribe to Posts [Atom]

