Comparison of SQL & SPL: Dynamic Transposition

Dynamic row-to-column transposition cannot specify to-be-generated fields in advance but can only define them on an ad hoc basis according to the original fields.

1. Automatically generate a column according to field values

【Example 1】Based on the following employee table, calculate the average salary in different regions for each department. Below is part of the source table:

IDNAMESURNAMESTATEDEPTSALARY
1RebeccaMooreCaliforniaR&D7000
2AshleyWilsonNew YorkFinance11000
3RachelJohnsonNew MexicoSales9000
4EmilySmithTexasHR7000
5AshleySmithTexasR&D16000

Below is the result of the expected layout:

DEPTCaliforniaColoradoFlorida
Administration9333.333
Finance8000500010000
HR100007000

SQL solution:

Field names in the result set are extracted from STATE field values in the original table. As SQL does not allow using a non-constant expression as the PIVOT/UNPIVOT value, PIVOT cannot be used to handle the dynamic row-to-column transposition. SQL also does not support returning a dynamic data structure for achieving the dynamic row-to-column transposition.

To deal with this task using the database, we can turn to the stored procedure to create a dynamic SQL statement. Since what we focus on is the comparison of SQL and SPL, we just skip the stored procedure solution.

SPL solution:

This row-to-column transposition gets its target fields from the original table. SPL offers A.pivot() function to do this. When no target fields are specified, the function will automatically extract names for the target fields.

A
1=T("Employee.csv")
2=A1.groups(STATE,DEPT;avg(SALARY):AVG_SALARY)
3=A2.pivot(DEPT; STATE, AVG_SALARY)

A1: Import the Employee table.
A2: Group the table by STATE and DEPT to calculate the average salary in each state for each department.
A3: A.pivot() function is used to perform the row-to-column transposition. It will automatically extract names for the target fields if they are not specified.

2. Dynamically generate column names through calculation

【Example 2】Based on the following income details table, get the income information of each employee with categories generated automatically. Below is part of the source table:

NAMESOURCEINCOME
DavidSalary8000
DavidBonus15000
DanielSalary9000
AndrewShares26000
AndrewSales23000

Below is the result of the expected layout:

NAMESOURCE1INCOME1SOURCE2INCOME2
AndrewShares26000Sales23000
DanielSalary9000
DavidSalary8000Bonus15000
RobertBonus13000

SQLsolution:

In this case, column names need to be dynamically generated through calculation according to the original field values. The number of columns, even the column names, cannot be defined in advance. SQL cannot implement this type of dynamic row-to-column transposition.

SPLsolution:

The target fields are not extracted from a specific field dynamically but need to be calculated dynamically. Here A.pivot() function is not suitable. We can first generate the target data structure according to the group with the most income categories and then populate data into it.

A
1=T("Income.txt").group(NAME)
2=A1.max(~.len())
3=create(NAME, ${A2.("SOURCE"/~/", INCOME"/~).concat@c()})
4>A1.(A3.record(~.NAME | ~.conj([SOURCE, INCOME])))

A1: Import the Income table and group it by name.
A2: Get the largest number of members among the groups, which is the largest number of income categories.
A3: Create an empty table by generating column names dynamically according to A3’s result.
A4: Loop through each group to populate the names, income sources, and amounts to A3’s empty table.

3. Dynamic row-to-column transposition through table join

【Example 3】Based on the following ORDERS table, ORDER_DETAIL table, and PRODUCT table, generate a table recording information of product purchased each day by each customer in the year 2014. The relationship between ORDERS table and ORDER_DETAIL table is one to many. Each order corresponds to multiple order detail records. The relationship between ORDER_DETAIL table and PRODUCT table is many to one. The ID field of ORDER_DETAIL table points to the ID field of PRODUCT table. Below are parts of the three tables:

ORDERS:

IDCUSTOMERIDEMPLOYEEIDORDER_DATEARRIVAL_DATE
10248VINET52012/07/042012/08/01
10249TOMSP62012/07/052012/08/16
10250HANAR42012/07/082012/08/05
10251VICTE32012/07/082012/08/05
10252SUPRD42012/07/092012/08/06

ORDER_DETAIL:

IDORDER_NUMBERPRODUCTIDPRICECOUNTDISCOUNT
10814148102.080.15
10814248102.080.15
10814348306.0240.15
10814448102.080.15
10814548204.0160.15

PRODUCT:

IDNAMESUPPLIERIDCATEGORY
1Apple Juice21
2Milk11
3Tomato sauce12
4Salt22
5Sesame oil22

Below is result of the expected layout

ORDER_DATECUSTOMERIDPRODUCT1COUNT1PRODUCT2COUNT2
2014/1/5VICTECorn flakes8
2014/1/23CONSHChicken3Cake9

SQLsolution:

SQL cannot implement this type of dynamic row-to-column transposition, too.

SPL solution:

Join up the three source tables through their associated relationships and do the subsequent calculations using the methods for handling the previous two examples.

A
1=T("Orders.txt").select(year(ORDER_DATE)==2014)
2=T("Product.txt")
3=T("OrderDetail.txt").switch(PRODUCTID,A2:ID).group(ID)
4=join(A1:ORDERS,ID;A3:DETAIL,ID)
5=create(DATE,CUSTOMERID,${A4.max(DETAIL.len()).("PRODUCT"/~/","/"COUNT"/~).concat@c()})
6>A4.run(A5.record([ORDERS.ORDER_DATE,ORDERS.CUSTOMERID]|DETAIL.([PRODUCTID.NAME,COUNT]).conj()))

A1: Import Orders table and select records of the year 2014.
A2: Import Product table.
A3: Import OrderDetail table and join it with Product table through PRODUCTID field; replace PRODCUTID field values with corresponding Product records; and then group the joining result by order ID.
A4: Join up Orders table with OrderDetail table through order ID.
A5: Generate target data structure according to the group containing the most OrderDetail records and create an empty table.
A6: Loop through A4’s table to populate data into A5’s empty table in turn.

Actually, this case is essentially the same as the previous two cases. It just has one more step to join up the three tables together. The steps after that are the same.

Summary

Through the three essays about the comparison of SQL and SPL in handling transposition, we can see that SQL’s static transposition methods PIVOT and UNPIVOT have limited applications and are supported only by certain database products. The SQL query will often be too complicated when it tries to handle certain complex static transposition scenarios. Moreover, SQL lacks a standard method of dealing with them. For dynamic transpositions, SQL just cannot achieve them. Usually, it turns to the more roundabout way, like the stored procedure, to generate the dynamic SQL query.

SPL provides flexible and adaptable method for handling various complicated transposition tasks. More importantly, the language has a clear and stable logic. It will first create the target data structure and then populated the calculated result to the table.

When query is complex, SQL statement becomes even more complicated. Temporary tables and nested queries are methods it often uses. This makes it hard to write and maintain the SQL statement. SPL, by contrast, can write the logic according to the natural way of thinking and generate concise code step by step.

The SPL-driven esProc is a professional data computation engine. It is based on ordered sets and provides a complete set of set-oriented operations, making it the combination of JAVA advantages and SQL merits. Transpositions become easy and simple with SPL. 

Leave a Reply