Category: TPC-H

TPC-H Query 20 and optimizer_dynamic_sampling

I was working with Jason Garforth today on creating a TPC-H benchmark script which we can run on our warehouse to get an initial performance baseline and then rerun from time to time to check that performance remains comparable.

This activity was on our new warehouse platform: an IBM POWER6 p570 with 8 dual-core 4.7GHz processors, 128GB RAM and a 1.6GB/sec SAN.

Jason created a script to run the QGEN utility to generate the twenty-two queries that make up the TPC-H benchmark, and also a “run script” to run those queries against the target schema I had created using the load scripts I talked about previously.

The whole process seemed to be running smoothly, with queries completing in a matter of seconds, until query twenty went off the scale. Excluding that query, the whole suite went through in about three to four minutes, but query twenty ran on for hours with no sign of completing.

We grabbed the actual execution plan and noticed that all the tables involved had no stats gathered. In such circumstances, Oracle (10.2.0.4 in this instance) uses dynamic sampling to take a quick sample of the table in order to come up with an optimal plan for each query executed.

The database was running with the default value of 2 for optimizer_dynamic_sampling.
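If you want to see this in action, you can check the current setting and confirm from the plan output whether dynamic sampling kicked in (DBMS_XPLAN.DISPLAY_CURSOR adds a “dynamic sampling used for this statement” note when it does). A minimal sketch, run from SQL*Plus after executing one of the queries:

-- what level are we running with?
show parameter optimizer_dynamic_sampling

-- plan of the last statement executed in this session;
-- look for the dynamic sampling note at the bottom
select * from table(dbms_xplan.display_cursor);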

The TPC-H specification doesn’t say whether stats should or should not be gathered, but obviously gathering them has a cost and, depending on the gathering method and the volume of the database, that cost could be considerable. It would be interesting to hear from someone who actually runs audited TPC-H benchmarks as to whether they gather table stats or rely on dynamic sampling…

We decided we would gather the stats, just to see if the plan changed and the query executed any faster…it did, on both counts, with the query finishing very quickly, in line with the other twenty-one queries in the suite.
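The gather itself needn’t be anything clever; something along these lines does the job (a sketch only, assuming the tables live in a schema called TPCH and that a degree of 8 suits the box; tune both to your own environment):

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'TPCH',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    degree           => 8,
    cascade          => TRUE);
END;
/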

So, our options then appeared to include, amongst other things:


  1. Gather the table stats. We’d proved this worked.
  2. Change the optimizer_dynamic_sampling level to a higher value and see if it made a difference.
  3. Manually work out why the plan for the query was wrong, by analysing the individual plan steps in more detail, and then use hints or profiles to force the optimizer to “do the right thing”.

We decided to read a Full Disclosure Report (FDR) of a TPC-H benchmark for a similar system to see what they did. The FDR included a full listing of the init.ora of the database in that test, which showed that the system in question had set optimizer_dynamic_sampling to 3 instead of the default 2…we decided to try that approach and it worked perfectly.
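If you want to try the same thing, the parameter can be changed at session level first and then, if you’re happy with it, made the instance default (the system-level change assumes you’re running from an spfile):

-- try it out for the current session only
alter session set optimizer_dynamic_sampling = 3;

-- make it the default for the instance (assumes an spfile)
alter system set optimizer_dynamic_sampling = 3 scope = both;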

In the end, given we’re not producing actual audited benchmarks, we’re free to take the time to gather optimizer stats, so we’ll go with that method. It was interesting, though, to see that option 2 above worked as well; it illustrates that there is a lot of useful information to be gleaned from reading the FDRs of audited benchmarks, whilst, of course, taking them with a pinch of salt, since they are not trying to run your system.

Another thing of interest: in order to get the DBGEN utility to work on AIX 6.1 using the gcc compiler, we had to set an environment variable as follows, otherwise we got an error when running DBGEN (this also applies to QGEN):

Set this:

export LDR_CNTRL=MAXDATA=0x80000000@LARGE_PAGE_DATA=Y

otherwise you may get this:

exec(): 0509-036 Cannot load program dbgen because of the following errors:
0509-026 System error: There is not enough memory available now.

Scripts for loading up DBGEN TPC-H data

If you’re interested in creating a TPC-H schema for testing purposes, then the following scripts may be of use to you:

Unix Scripts:
Multi Load TPCH
Load TPCH Stream

SQL*Loader Control files:
REGION Table
NATION Table
SUPPLIER Table
CUSTOMER Table
PART Table
PARTSUPP Table
ORDERS Table
LINEITEM Table

You may wish to read and check them before you use them – they’re not exactly rocket science but they seem to do the job.
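To give a flavour of what’s in them, each control file is just a pipe-delimited load into the matching table; the REGION one looks something like this (a sketch rather than the exact file; DBGEN writes pipe-delimited .tbl files with a trailing pipe on each record, and the column names follow the TPC-H data model):

LOAD DATA
INFILE 'region.tbl'
APPEND
INTO TABLE region
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(r_regionkey, r_name, r_comment)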

I have all the files in the same directory for simplicity’s sake.

I then use them to create a scale factor 1 target TPC-H schema using the following calls:

# "l" loads the REGION and NATION tables
./multi_load_tpch.sh 1 l "tpch/tpch@test" 1

# "s" loads the SUPPLIER table
./multi_load_tpch.sh 1 s "tpch/tpch@test" 10

# "c" loads the CUSTOMER table
./multi_load_tpch.sh 1 c "tpch/tpch@test" 10

# "p" loads the PART and PARTSUPP tables
./multi_load_tpch.sh 1 p "tpch/tpch@test" 10

# "o" loads the ORDERS and LINEITEM tables
./multi_load_tpch.sh 1 o "tpch/tpch@test" 10

Obviously, you need to change the connection string to match your environment.

Caveats:

  1. Obviously, they are supplied as is – use at your own discretion and risk.
  2. You need to have created the target schema tables (see the example DDL after this list) and made sure they are empty, as the SQL*Loader control files use APPEND.
  3. Bear in mind that choosing too high a number of parallel streams (the last parameter in the calls) will overload your machine, so try to balance it against the available system resources.
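As an example of the DDL caveat 2 refers to, the REGION table that the control file sketch above loads can be created along these lines (column names and lengths are from the data model in the TPC-H specification; the NUMBER precision is just a sensible mapping for the identifier column, and you can add storage clauses to suit):

CREATE TABLE region
( r_regionkey  NUMBER(10)    NOT NULL,
  r_name       CHAR(25)      NOT NULL,
  r_comment    VARCHAR2(152)
);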

Bugs, issues or questions, please get in touch…enjoy.

Creating a TPC-H schema with DBGEN on HP-UX

I wanted to try out this HammerOra product from Steve Shaw, both at work and on my box at home…but after playing with it at home, I realised that it takes quite some time to build even a small (scale factor 1) TPC-H schema…I know it runs serially, but I’m still not quite sure why it’s that slow (on my system that is), but Steve does say it can take a while and that you might wish to consider using the TPC utility DBGEN to generate and load the schema quicker…particularly if you use some manual parallelisation.

Given that I also need to use this tool to help with some benchmarking at work, I decided to try to get DBGEN to run on an HP-UX box today and had one or two problems which I managed to sort out. The machine in question is an RP8420 running HP-UX B.11.11.

DBGEN is a utility that allows you to create a series of flat files which contain the data for a TPC-H schema. You can then use SQL*Loader to load these into appropriately constructed tables in an Oracle database – any database actually…but I only care about Oracle of course 😉

The utility can be called with various parameters including making the target datasets in smaller “child” files which can be created in a manually parallelised fashion to speed the whole process up. You have to download the DBGEN reference data set from the TPC website (lower right hand side).

This reference data set contains the ANSI C source code which makes the DBGEN executable (and QGEN also…but more on that another day)…unfortunately it’s just the source code, so that means you need to compile it yourself…which of course leads to the first problem…that I know diddly squat about C…yeah I know, not very manly! Luckily I can sometimes follow instructions (which come with the reference data set)…

1. Copy makefile.suite to makefile
2. Edit makefile and set the compiler and the DATABASE, MACHINE and WORKLOAD values as follows:

 

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= ORACLE
MACHINE = HP
WORKLOAD = TPCH

That’s it for the makefile.

Now, as I mentioned, ORACLE is not a listed database variant in the DBGEN C code – it’s got all the other popular RDBMSs, which I find quite bizarre…I’m sure there’s a reason, but I can’t think of one. To get around this, as per Chapter 5 of “Pro Oracle Database 10g RAC on Linux” by Steve Shaw and Julian Dyke, I added a section to tpcd.h for the ORACLE database:

 
#ifdef ORACLE
#define GEN_QUERY_PLAN ""
#define START_TRAN ""
#define END_TRAN ""
#define SET_OUTPUT ""
#define SET_ROWCOUNT ""
#define SET_DBASE ""
#endif

I thought that was it – but it still wouldn’t compile, giving the error:

config.h:213:2: #error Support for a 64-bit datatype is required in this release

Looking at config.h – and bearing in mind I’m no C programmer – it struck me as odd that every machine section bar the HP one had stuff about DSS_HUGE and 64 bits…so I took a punt and copied some lines (the DSS_HUGE, HUGE_FORMAT, HUGE_DATE_FORMAT and RNG_* lines) from the IBM section into the HP one to see if it worked…and it did. The HP section now looks like this:

 

#ifdef HP
#define _INCLUDE_POSIX_SOURCE
#define STDLIB_HAS_GETOPT
#define DSS_HUGE long long
#define HUGE_FORMAT "%lld"
#define HUGE_DATE_FORMAT "%02lld"
#define RNG_A 6364136223846793005ull
#define RNG_C 1ull
#endif /* HP */

Typing make at the command prompt then compiles the code and produces the dbgen executable…which I then spent a few hours playing with to create a scale factor 1 TPC-H set of files.
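For what it’s worth, the manual parallelisation mentioned earlier comes down to the chunking flags; going by the usage notes that ship with DBGEN (check dbgen -h on your build), a scale factor 1 set split into four chunks, each generated by its own background process, looks something like this:

# -s scale factor, -C number of chunks, -S which chunk to build, -f overwrite existing files
./dbgen -s 1 -C 4 -S 1 -f &
./dbgen -s 1 -C 4 -S 2 -f &
./dbgen -s 1 -C 4 -S 3 -f &
./dbgen -s 1 -C 4 -S 4 -f &
wait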

 

My next problem was one of my own making really, in that I copied the CREATE TABLE statements for the TPC-H target tables from HammerOra’s TCL script for TPC-H creation. Unfortunately, the column ordering in those DDL statements differs slightly from the DBGEN output files, so while the tables created fine, I had also copied that column ordering into my SQL*Loader control files and got errors loading some of the files where the order didn’t match.

 

I then downloaded the TPC-H specification document, which includes, amongst other things, the data model. I cross-checked the column ordering of the data model against the columns in the output files and then reran the loads without any further issues.

 

Tomorrow I’ll try running HammerOra against the target TPC-H schema and make some shell scripts to try to automate most of the process, so we can build schemas at different scale factors in a manually parallelised fashion – scripts the amiable Scotsman created for his parallel testing a while back should give me a good start with that.