Creating a TPC-H schema with DBGEN on HP-UX

Jul 21, 2008 Benchmarking, TPC-H

I wanted to try out this HammerOra product from Steve Shaw, both at work and on my box at home…but after playing with it at home, I realised that it takes quite some time to build even a small (scale factor 1) TPC-H schema…I know it runs serially, but I’m still not quite sure why it’s that slow (on my system that is), but Steve does say it can take a while and that you might wish to consider using the TPC utility DBGEN to generate and load the schema quicker…particularly if you use some manual parallelisation.

Given that I also need to use this tool to help with some benchmarking at work, I decided to try to get DBGEN to run on a HP-UX box today and had one or two problems which I managed to sort out. The machine in question is an RP8420 running HP-UX B.11.11.

DBGEN is a utility that allows you to create a series of flat files which contain the data for a TPC-H schema. You can then use SQL*Loader to load these into appropriately constructed tables in an Oracle database – any database actually…but I only care about Oracle of course 😉

The utility can be called with various parameters including making the target datasets in smaller “child” files which can be created in a manually parallelised fashion to speed the whole process up. You have to download the DBGEN reference data set from the TPC website (lower right hand side).

This reference data set contains the ANSI C source code which makes the DBGEN executable (and QGEN also…but more on that another day)…unfortunately it’s just the source code, so that means you need to compile it yourself…which of course leads to the first problem…that I know diddly squat about C…yeah I know, not very manly! Luckily I can sometimes follow instructions (which come with the reference data set)…

1. Copy makefile.suite to makefile
2. Edit makefile and make the following amendments (in red):

 

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= ORACLE
MACHINE = HP
WORKLOAD = TPCH

That’s it for the makefile.

Now, as I mentioned, the ORACLE database is not a listed database variant in the DBGEN C code – it’s got all the other popular RDBMS which I find quite bizarre…I’m sure there’s a reason, but I can’t think of one. To get around this, as per Chapter 5 in “Pro Oracle Database 10g RAC on Linux” by Steve Shaw and Julian Dyke, I added a section to the tpcd.h for the ORACLE database:

 
#ifdef ORACLE
#define GEN_QUERY_PLAN “”
#define START_TRAN “”
#define END_TRAN “”
The pill is formulated using ingredients that are clinically tested to ensure they contain the same as the branded order viagra product used to treat erectile dysfunction or male impotence. Why the ED sufferers cost of viagra pills aspire to have sildenafil jelly rather than tablets? Let’s have a look at the major reasons to prescribe kamagra medicines to the ED patients. Erection difficulties buy levitra australia are also caused by the affect of surgery, including spinal, neurological or pelvic surgeries. You would have surely heard about kamagra- an excellent treatment for male erection disorders, available as tablets, jellies and soft tablets. bulk cialis #define SET_OUTPUT “”
#define SET_ROWCOUNT “”
#define SET_DBASE “”
#endif

I thought that was it – but it still would’t compile, giving the error:

config.h:213:2: #error Support for a 64-bit datatype is required in this release

Looking at the config.h – and bearing in mind I’m no C programmer – it struck me as odd that all bar the HP machine section, had stuff about DSS_HUGE and 64 bits…so I took a punt and copied some lines (in red) from the IBM section into the HP one to see if it worked…and it did. The HP section now looks like this:

 

#ifdef HP
#define _INCLUDE_POSIX_SOURCE
#define STDLIB_HAS_GETOPT
#define DSS_HUGE long long
#define HUGE_FORMAT “%lld”
#define HUGE_DATE_FORMAT “%02lld”
#define RNG_A 6364136223846793005ull
#define RNG_C 1ull
#endif /* HP */

Typing make at the command prompt then compiles the code and produces the dbgen executable…which I then spent a few hours playing with to create a scale factor 1 TPC-H set of files.

 

My next problem was one of my own making really in that I copied the CREATE TABLE statements for the TPC-H target tables from HammerOra’ TCL script for TPC-H creation, but unfortunately, the column ordering is slightly different in those DDL statements as compared to the DBGEN output files…which meant that I created the tables OK, but since I’d copied the column ordering to make the SQL*Loader control files, I got errors when I tried to load some of the files as the column order is different in one or two cases.

 

I then downloaded the TPC-H specification document which has, amongst other things, the data model, from which I cross checked the column ordering of the data model against the columns in the output files and then managed to rerun the data in without any further issues.

 

Tomorrow I’ll try running HammerOra against the target TPC-H schema and make some shell scripts to try and automate most of the process so we can build different scale factor schemae and do so in a manually parallelised fashion – scripts the amiable Scotsman created for his parallel testing a while back should give me a good start with that.

 

By Jeff

One thought on “Creating a TPC-H schema with DBGEN on HP-UX”
  1. I’m trying to produce flat files on linux using dbgen tool but it is giving me an error i.e dbgen: command not found. I’m not getting why the error is coming. Can you please assist me?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.