# SPL Practice: integerization during data dump

Using SPL for performance optimization, such as converting data types like string to integer during data dump, can reduce storage space and improve computing performance. This article will present how to implement integerization through a practical example.

## Problem description

The following table is the data structure of a certain space-time collision problem:

After understanding the business, we know that all the values of ‘no’ are composed of numbers and can be directly converted to Long integer; ‘lac’ and ‘ci’ always appear in pairs and can be merged as one field called ‘loc’, and the count range of ‘loc’ after de-duplication is approximately 270,000, making it possible to use ‘Int’ integer.

Since the field ‘no’ only involves comparison in subsequent calculations and does not participate in positioning calculation, it does not need to be sequence-numberized, and we just need to convert its value to integer. On the contrary, the field ‘loc’ will involve positioning calculation in subsequent calculations, so it needs to be sequence-numberized, that is, converting its values to natural numbers starting from 1.

## Data integerization and dump

The data structure after dumping is as follows:

Compared with the original data structure, the following two changes are made during data dump:

1. merge the ‘lac’ and ‘ci’ fields as one ‘loc’ field and convert their values to Int sequence number. The original ‘lac’ and ‘ci’ fields, as dimension table, are stored separately;

2. convert the data type (number string) of the ‘no’ field to Long integer.

SPL code:

A1: connect to the database; the @l option represents that the returned field name and table name are in lowercase;

A2: fetch the de-duplicated space flags from the database and create an index;

A3: save the dimension table information with space flags to a bin file;

A4: while fetching data from the database, convert the values of ‘no’ to long integer, and utilize the index to find out the sequence number corresponding to ‘loc’ and ‘ci’ from loc_list;

A5 - A9: save the ordered results of ‘no’ and ‘ct’ to a composite table.

## Restore through sequence number

Since the dumped fact table loses the information of original ‘lac’ and ‘ci’ fields, we need to restore such information by utilizing the above-mentioned bin file saved when dumping. For example, the following code is to restore the ‘lac’ and ‘ci’ values of a certain calculation result - table sequence T(no, ct, loc):