I wanted to try out this HammerOra product from Steve Shaw, both at work and on my box at home…but after playing with it at home, I realised that it takes quite some time to build even a small (scale factor 1) TPC-H schema…I know it runs serially, but I’m still not quite sure why it’s that slow (on my system that is), but Steve does say it can take a while and that you might wish to consider using the TPC utility DBGEN to generate and load the schema quicker…particularly if you use some manual parallelisation.
Given that I also need to use this tool to help with some benchmarking at work, I decided to try to get DBGEN to run on a HP-UX box today and had one or two problems which I managed to sort out. The machine in question is an RP8420 running HP-UX B.11.11.
DBGEN is a utility that allows you to create a series of flat files which contain the data for a TPC-H schema. You can then use SQL*Loader to load these into appropriately constructed tables in an Oracle database – any database actually…but I only care about Oracle of course 😉
The utility can be called with various parameters including making the target datasets in smaller “child” files which can be created in a manually parallelised fashion to speed the whole process up. You have to download the DBGEN reference data set from the TPC website (lower right hand side).
This reference data set contains the ANSI C source code which makes the DBGEN executable (and QGEN also…but more on that another day)…unfortunately it’s just the source code, so that means you need to compile it yourself…which of course leads to the first problem…that I know diddly squat about C…yeah I know, not very manly! Luckily I can sometimes follow instructions (which come with the reference data set)…
1. Copy makefile.suite to makefile
2. Edit makefile and make the following amendments (in red):
## CHANGE NAME OF ANSI COMPILER HERE
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
MACHINE = HP
WORKLOAD = TPCH
That’s it for the makefile.
Now, as I mentioned, the ORACLE database is not a listed database variant in the DBGEN C code – it’s got all the other popular RDBMS which I find quite bizarre…I’m sure there’s a reason, but I can’t think of one. To get around this, as per Chapter 5 in “Pro Oracle Database 10g RAC on Linux” by Steve Shaw and Julian Dyke, I added a section to the tpcd.h for the ORACLE database:
#define GEN_QUERY_PLAN “”
#define START_TRAN “”
#define END_TRAN “”
#define SET_OUTPUT “”
#define SET_ROWCOUNT “”
#define SET_DBASE “”
I thought that was it – but it still would’t compile, giving the error:
config.h:213:2: #error Support for a 64-bit datatype is required in this release
Looking at the config.h – and bearing in mind I’m no C programmer – it struck me as odd that all bar the HP machine section, had stuff about DSS_HUGE and 64 bits…so I took a punt and copied some lines (in red) from the IBM section into the HP one to see if it worked…and it did. The HP section now looks like this:
#define DSS_HUGE long long
#define HUGE_FORMAT “%lld”
#define HUGE_DATE_FORMAT “%02lld”
#define RNG_A 6364136223846793005ull
#define RNG_C 1ull
#endif /* HP */
Typing make at the command prompt then compiles the code and produces the dbgen executable…which I then spent a few hours playing with to create a scale factor 1 TPC-H set of files.
My next problem was one of my own making really in that I copied the CREATE TABLE statements for the TPC-H target tables from HammerOra’ TCL script for TPC-H creation, but unfortunately, the column ordering is slightly different in those DDL statements as compared to the DBGEN output files…which meant that I created the tables OK, but since I’d copied the column ordering to make the SQL*Loader control files, I got errors when I tried to load some of the files as the column order is different in one or two cases.
I then downloaded the TPC-H specification document which has, amongst other things, the data model, from which I cross checked the column ordering of the data model against the columns in the output files and then managed to rerun the data in without any further issues.
Tomorrow I’ll try running HammerOra against the target TPC-H schema and make some shell scripts to try and automate most of the process so we can build different scale factor schemae and do so in a manually parallelised fashion – scripts the amiable Scotsman created for his parallel testing a while back should give me a good start with that.