K-Tables and Indexes

K tables and indexes are a concept we use with our Windows/SQL Server product. The concept does not apply to the i-Series/OS400 products. SQL Server has limits on the number of columns that can be included in an index, so it is not always possible to create an index over all the dimensions in a DSL. That's where the K concept comes into play.

The K table combined multiple dimension values into a single string within a single column; with all the values in one column, it was no problem building an index over it. We improved on this concept with the introduction of the "gidropky" program. Now, instead of having a table of concatenated values, we build a specific index over the header table (a "K" index). The K index dramatically improves performance over the old K table method. The old method concatenated all defined dimensions into a single column, resulting in a large table, and because of that size only a limited number of K table records could be actively held in memory. The K index method, by contrast, uses only select dimensions (i.e., those dimensions that uniquely identify the data) to define the index. The resulting index is considerably smaller than the K table, so a larger number of K index records can be held in the same amount of memory, thereby improving SQL performance.
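
To make the difference concrete, here is a rough sketch in Python. This is not product code; the dimension names and rows are invented for illustration. The old K table approach keys on one wide concatenated string, while the K index approach keys only on the dimensions that uniquely identify the data:

# Illustrative sketch only -- not product code; dimensions and rows are made up.
rows = [
    {"zip": "60148", "city": "Lombard", "state": "IL", "product": "A100"},
    {"zip": "60148", "city": "Lombard", "state": "IL", "product": "B200"},
]

# Old K table method: concatenate every defined dimension into one wide key string.
k_table_keys = ["|".join([r["zip"], r["city"], r["state"], r["product"]]) for r in rows]

# K index method: key only the dimensions that uniquely identify the data.
# Here zip code implies city and state, so those are left out of the key.
k_index_keys = [(r["zip"], r["product"]) for r in rows]

print(k_table_keys)  # long concatenated strings
print(k_index_keys)  # smaller composite keys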

We therefore recommend that all of our Windows and UNIX customers run "gidropky" to create K indexes and drop the K tables prior to loading data. The "gidropky" program only needs to be run once, or again after any dimension changes have occurred. If K indexes are not unique, loads and ADDDSL processes will run more slowly.

Note: The ADDDSL process (gislsdrv) will create K indexes if you submit ADDDSL from the server. Processing ADDDSL from the client Administration will not create K indexes. To run gislsdrv from the server, enter the following command from a DOS prompt:

GISLSDRV /DS=<datasource name> /LT=5 /YEAR=<year> /AA=<DSL name>

For more information, please review the server readme document regarding the gislsdrv command.

The most important step in building the K index is choosing which dimensions to include. Because indexes have size limits, you cannot simply include them all: you must determine which dimensions uniquely define the data and include ONLY those dimensions in the K index. The dimensions must be those defined within the structure code. For example, if three of our dimensions are zip code, city, and state, you can probably get away with building the index over just zip code, since knowing the zip code lets you derive the city and state. Note that we said you can PROBABLY get away with this: if the data is not consistently populated with zip codes, then zip code may not uniquely define city and state. That determination can only be made with real knowledge of one's data. This list of dimensions is then fed into the "gidropky" program as a unique dimension string (the /UD parameter).
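
One way to test that kind of assumption is to check it against the data itself. The following Python sketch is illustrative only (the file name and column names are assumptions, not part of the product); it simply verifies whether every zip code in a data file maps to exactly one city/state combination:

import csv
from collections import defaultdict

# Illustrative check only -- "customer_dimensions.csv" and its column names are assumptions.
# Does every zip code map to exactly one (city, state) pair in the data?
seen = defaultdict(set)
with open("customer_dimensions.csv", newline="") as f:
    for row in csv.DictReader(f):
        seen[row["zip"]].add((row["city"], row["state"]))

conflicts = {z: combos for z, combos in seen.items() if len(combos) > 1}
if conflicts:
    print("Zip code does NOT uniquely define city/state for:", sorted(conflicts))
else:
    print("Zip code uniquely defines city and state in this data.")

If the check reports conflicts, zip code alone cannot uniquely define city and state, and those dimensions would need to be handled differently in the unique dimension string.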

A sample gidropky command might look like this:

Gidropky /DS=DSAA /LF=/silvon/dtdata/dropky.log /UD="6(1,2,3,4,5,7),8,9,13(10,11(12),14,15),16"

In this example, we're running gidropky against the "DSAA" structure code and creating a log file called dropky.log. The unique dimension string is in the /UD parameter. Here, dimension 6 uniquely defines dimensions 1, 2, 3, 4, 5, and 7. Dimensions 8 and 9 don't uniquely define any other dimensions, but they do help uniquely define the record. Dimension 13 uniquely defines dimensions 10, 11, 12, 14, and 15. Notice that you can have nested levels of "uniqueness": in this example, dimension 13 uniquely defines dimension 11, which in turn uniquely defines dimension 12. The K index is built over just the most granular dimensions, so in this example the K index would be built over dimensions 6, 8, 9, 13, and 16.
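
If it helps to see the nesting rules spelled out, the short Python sketch below is not the actual gidropky parser, just an illustration of how the most granular (top-level) dimensions can be pulled out of a /UD-style string; those are the dimensions the K index is built over:

# Illustrative sketch only -- not the actual gidropky parser.
# Returns the top-level dimensions of a /UD-style string, ignoring anything
# nested inside parentheses (the dimensions that are uniquely defined by others).
def top_level_dimensions(ud: str) -> list[int]:
    dims, depth, current = [], 0, ""
    for ch in ud:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0:
            if ch == ",":
                if current:
                    dims.append(int(current))
                current = ""
            else:
                current += ch
    if current:
        dims.append(int(current))
    return dims

print(top_level_dimensions("6(1,2,3,4,5,7),8,9,13(10,11(12),14,15),16"))
# Prints [6, 8, 9, 13, 16] -- the dimensions the K index would be built over.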

Please note that choosing a good unique dimension string for your K index requires that you actually know and understand your data.