Friday, June 29, 2007

 

Using partitioning to colocate data for optimal multiple index access paths

I came across a situation the other day where a client was accessing rows from a table via one of several different indexes and getting poor performance on all of them.

The table in question had about five million rows and about four indexes. For the process in question, the table was hitting each of the indexes on this single table at one point or another.

I noticed that the clustering factor for each of the indexes was quite poor - a value closer to the number of rows in the table rather than the number of blocks in the table. This was leading to a high number of logical IOs to get the rows needed on each query in the process, because the rows pointed to by the index were all in different data blocks, scattered far and wide. As I recalle, Jonathan gives a great explanation of how the clustering factor can affect the performance of a query in Chapter five of his CBO Fundamentals book and basically this was relevant to this scenario.

A fix for a poor clustering factor, is to sort the data in the table, in the order which matches that of the index in question. So, if INDEX_A is on columns A then B, you should order the data in the table by column A, then B...perhaps by removing it and then reinserting it all using an ORDER BY on the insert.

This of course, only works for one order - you can't order data in multiple different ways within the same table, to suit all the indexes, so for one index you might be able to order the data perfectly but that might be terrible for another index which wants to order by say, column C, then D....or does it?

Well, in this particular case, the data required by each query hitting different indexes on this single table was all of a different type - by this I mean, the table itself was modelled from a Supertype Entity, implemented physically as a single table with a Supertype differentiator column (PRODUCT_TYPE in this case). Query one in the process wanted data for PRODUCT_TYPE "X", and wanted to get this via Index IDX1 on COL1. Query two would then want data of PRODUCT_TYPE "Y" and get it via index IDX2 on COL2 etc...

I figured that if we split the table into partitions based on the PRODUCT_TYPE then we'd be able to order the data within each partition by the appropriate order for the index which would commonly be used to access data of that PRODUCT_TYPE.

It will take them a while to test and implement the solution and gain some empirical results, however a simple test script I knocked up seemed to indicate that the solution could work:

The script...


CLEAR BREAKS
CLEAR COLUMNS
COLUMN index_name HEADING "IndexName" FORMAT A10
COLUMN partition_name HEADING "PartitionName" FORMAT A10
COLUMN leaf_blocks HEADING "LeafBlocks" FORMAT 999
COLUMN distinct_keys HEADING "DistinctKeys" FORMAT 999,999
COLUMN num_rows HEADING "NumRows" FORMAT 999,999
COLUMN clustering_factor HEADING "ClusteringFactor" FORMAT 999,999
COLUMN clfpct HEADING "Clu FactPercent" FORMAT 999

DROP TABLE jeff_unpartitioned PURGE
/
CREATE TABLE jeff_unpartitioned
AS
SELECT l pk_col
, MOD(l,4) partitioning_key_col
, (dbms_random.value * 1000) index_col1
, (dbms_random.value * 1000) index_col2
, (dbms_random.value * 1000) index_col3
, (dbms_random.value * 1000) index_col4
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 100001)
ORDER BY l
/
CREATE UNIQUE INDEX ju_pk ON jeff_unpartitioned(pk_col)
/
CREATE INDEX ju_idx1 ON jeff_unpartitioned(index_col1)
/
CREATE INDEX ju_idx2 ON jeff_unpartitioned(index_col2)
/
CREATE INDEX ju_idx3 ON jeff_unpartitioned(index_col3)
/
CREATE INDEX ju_idx4 ON jeff_unpartitioned(index_col4)
/
exec DBMS_STATS.GATHER_TABLE_STATS (ownname=>USER,tabname=>'JEFF_UNPARTITIONED');

SELECT i.index_name
, i.leaf_blocks
, i.distinct_keys
, i.clustering_factor
, (DECODE((t.num_rows - t.blocks),0,0,(i.clustering_factor) / (t.num_rows - t.blocks))) * 100 clfpct
FROM all_indexes i
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND t.table_name = 'JEFF_UNPARTITIONED'
AND t.owner = USER
/

DROP TABLE jeff_partitioned PURGE
/
CREATE TABLE jeff_partitioned
(pk_col number
,partitioning_key_col number
,index_col1 number
,index_col2 number
,index_col3 number
,index_col4 number
)
PARTITION BY LIST(partitioning_key_col)
(PARTITION p1 VALUES (1)
,PARTITION p2 VALUES (2)
,PARTITION p3 VALUES (3)
,PARTITION p4 VALUES (4)
)
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 1
, ROUND(l,100)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 3
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 2
, (dbms_random.value * 1000)
, ROUND(l,100)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 4
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 3
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, ROUND(l,100)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 5
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 4
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, ROUND(l,100)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 6
/
COMMIT
/
CREATE UNIQUE INDEX jp_pk ON jeff_partitioned(pk_col,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx1 ON jeff_partitioned(index_col1,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx2 ON jeff_partitioned(index_col2,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx3 ON jeff_partitioned(index_col3,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx4 ON jeff_partitioned(index_col4,partitioning_key_col) LOCAL
/
exec DBMS_STATS.GATHER_TABLE_STATS (ownname=>USER,tabname=>'JEFF_PARTITIONED');

SELECT i.index_name
, i.leaf_blocks
, i.distinct_keys
, i.clustering_factor
, (DECODE((t.num_rows - t.blocks),0,0,(i.clustering_factor) / (t.num_rows - t.blocks))) * 100 clfpct
FROM all_indexes i
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND t.table_name = 'JEFF_PARTITIONED'
AND t.owner = USER
ORDER BY i.index_name
/

SELECT i.index_name
, ip.partition_name
, ip.leaf_blocks
, ip.distinct_keys
, ip.clustering_factor
, (DECODE((t.num_rows - t.blocks),0,0,(ip.clustering_factor) / (t.num_rows - t.blocks))) * 100 clfpct
FROM all_indexes i
, all_ind_partitions ip
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND i.index_name = ip.index_name
AND i.owner = ip.index_owner
AND t.table_name = 'JEFF_PARTITIONED'
AND t.owner = USER
ORDER BY i.index_name
, ip.partition_name
/


And now the results...


Table dropped.


Table created.


Index created.


Index created.


Index created.


Index created.


Index created.


PL/SQL procedure successfully completed.


Index Leaf Distinct Clustering Clu Fact
Name Blocks Keys Factor Percent
---------- ------ -------- ---------- --------
JU_IDX4 226 100,000 99,830 101
JU_IDX3 226 100,000 99,880 101
JU_IDX2 226 100,000 99,872 101
JU_IDX1 226 100,000 99,833 101
JU_PK 103 100,000 685 1

5 rows selected.


Table dropped.


Table created.


25000 rows created.


Commit complete.


25000 rows created.


Commit complete.


25000 rows created.


Commit complete.


25000 rows created.


Commit complete.


Index created.


Index created.


Index created.


Index created.


Index created.


PL/SQL procedure successfully completed.


Index Leaf Distinct Clustering Clu Fact
Name Blocks Keys Factor Percent
---------- ------ -------- ---------- --------
JP_IDX1 218 100,000 74,593 75
JP_IDX2 218 100,000 74,624 75
JP_IDX3 218 100,000 74,609 75
JP_IDX4 218 100,000 74,627 75
JP_PK 124 100,000 568 1

5 rows selected.


Index Partition Leaf Distinct Clustering Clu Fact
Name Name Blocks Keys Factor Percent
---------- ---------- ------ -------- ---------- --------
JP_IDX1 P1 32 25,000 142 0
JP_IDX1 P2 62 25,000 24,826 25
JP_IDX1 P3 62 25,000 24,811 25
JP_IDX1 P4 62 25,000 24,814 25
JP_IDX2 P1 62 25,000 24,812 25
JP_IDX2 P2 32 25,000 142 0
JP_IDX2 P3 62 25,000 24,835 25
JP_IDX2 P4 62 25,000 24,835 25
JP_IDX3 P1 62 25,000 24,802 25
JP_IDX3 P2 62 25,000 24,826 25
JP_IDX3 P3 32 25,000 142 0
JP_IDX3 P4 62 25,000 24,839 25
JP_IDX4 P1 62 25,000 24,830 25
JP_IDX4 P2 62 25,000 24,832 25
JP_IDX4 P3 62 25,000 24,823 25
JP_IDX4 P4 32 25,000 142 0
JP_PK P1 31 25,000 142 0
JP_PK P2 31 25,000 142 0
JP_PK P3 31 25,000 142 0
JP_PK P4 31 25,000 142 0

20 rows selected.

From the results we can see that when the table is unpartitioned and, in this instance, ordered by the primary key, all four indexes have high clustering factors.

We could order the data by one of the columns in one of the indexes and make that index get a good (low) clustering factor, but the other indexes would still remain with high values - you can't make them all have low clustering factors whilst the table is one big amorphous mass of data.

In the partitioned table we are able to order the data in the most efficient way for each partition, insofar as each partition maps to an access path via a specific index.

So, partition P1, where users would commonly access the data via index JP_IDX1, has a low clustering factor because we ordered the data by INDEX_COL1 on insert.

Partition P2 has the lowest clustering factor for index JP_IDX2 and P3 for JP_IDX3 etc...

Some caveats to bear in mind for this approach:

1. You should ensure that your queries do use the partitioning key column when accessing the table, otherwise you'll end up scanning extra index partitions unnecessarily. You will also make life extremely difficult, if not impossible, for the optimizer because if we don't identify that the query will only use one partition then the CBO will be forced to use the global stats on the index/table rather than the specific ones for a single partition...the global ones as you can see from the results still show a high clustering factor because it is an aggregate which is hiding the detail that one specific partition - the one we really want - has a low clustering factor.

2. If you want LOCAL indexes, you will need to add the partitioning key column to that index - if it is not already present of course. As Jonathan points out in the first comment on this post...this only applies if the index in question is unique and supports a PK/unique constraint. If the index does not support a PK/unique constraint then it can be LOCAL without the need to have the partitioning key column(s) in it.

3. If you are using GLOBAL indexes on the partitioned table then they will still have reasonably high clustering factors due to the partitions where the data is not ordered by the columns of that index, but the data itself will be ordered within the partition that we access via the partioning key column so we should get good performance.

4. If you want to use ordered data in a table then you need to factor that into the way you populate the table - if it's a batch process then you can this as part of that process but if it's built up over time, in an OLTP fashion then you'll need periodic "rebuilds" of the table to take the data out and put it back in, reordered. This is something you would need to do whether it was partitioned of course.

I'm sure there are more caveats.

Your mileage may, of course, vary.

Labels:


Monday, June 18, 2007

 

Shameless advert...

OK - this posting has nothing to do with Oracle - rather it's a shameless plug for a new EBay Store that my wife has set up to sell acrylic photo frames...so if you're interested please read on, otherwise please forgive the intrusion and I'll post something Oracle related soon.

So, the story behind this little venture, was that shortly before Christmas last year, my wife, myself and our Jude went for a "free" photoshoot with a leading portrait company called Venture. We'd picked up a flyer from a local shopping centre, where they were doing a promotion and the deal offered, is that you get a one hour photoshoot and a free 6x4" framed photo of one shot from the session - no strings attached, if you'll pardon the pun. Of course, they have lots of other framing solutions to offer you, once you've had the photoshoot and they basically hope that you'll see the wonderful photos they've done and buy some of their products - pretty simple proposition really.

We did the shoot and went along to the "screening" a few weeks later where they show you all the photos they have taken, cleaned up and put into a snazzy looking powerpoint presentation which they show you in a nice little "movie theatre" setup with comfy sofas and a nice cappucino...kinda gets you in the mood and tugs at your heart strings and by the end of it all you're ready to buy just about anything they've got to offer...well some people are I guess otherwise they wouldn't have made such a successful franchise.

Now, where's the catch?

Well, in our opinion, the catch is twofold.

Firstly, that the products they offer are pretty expensive - everything is relative I guess, but some of their products such as the 4x4 ICE grid cost in excess of £2000 so unless you are Roman Abramovich you're probably going to need to sit down before you read the price list.

Secondly, once you've had your products you'd think that you get the images - in case you want a screensaver perhaps or you want to print it onto a cup for your morning coffee - you'd be wrong, as all image rights remain with Venture and unless they offer your image in one of their products and you can afford that product, then you're out of luck.

Sadly, for my wife, I didn't feel like selling a kidney to pay for some of their products - she did tell me that people can live perfectly well on just one kidney but I didn't want to risk it. We then figured that we must be able to source similar products from elsewhere on Ebay or other retailers...sadly, we couldn't find anything similar to the specific product we liked anywhere.

In the end we decided that we'd start a little business and get some products made by an acrylic manufacturer specifically for us. That's what we did and, for now at least, we'e selling a selection of these products on our Ebay Store called PhotoFrameZ.

For now we're only selling a limited amount of the items we want to offer as taking the photos and organising them onto the Ebay store is quite time consuming - we'll add more products over time.

The biggest problem we've had so far has been in taking decent photos of the products - and before anyone says, we're no photographers and yes, the images are not exactly great...but hopefully they are good enough to convince you that they are a quality product at an affordable price in comparison to the alternatives.

If anyone has any tips on how to take a good photo of a shiny acrylic object and get good detail without any reflection then I'd be interested in hearing from them please.

We've sold quite a few of the items directly through friends and family and so far feedback has been great but the Ebay store has yet to take off...mainly as we've only just set it up and also we've not really tried to advertise it much - until now!

We're only shipping to the UK currently but if it takes off then we'll consider shipping abroad as well...one step at a time. I guess, given the international nature of blogging, if you're really interested in getting one of these products you can contact me directly and I'll sort something out for you.

If you're interested in a quality photo framing solution which allows you to display your own pictures in a contemporary style and in an easy to change format, then please take a look at the store.

If you have any comments then please do get in touch with us.

Thanks for your time.

Labels:


Wednesday, June 06, 2007

 

Spot the deliberate mistake

My wife's car went into the dealership last week to have a faulty oil sensor changed...spot the deliberate mistake the engineer made...



Turned out to be quite costly...for them.

After I'd stopped ranting I must admit it did make me laugh.

Labels:


This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]