Comparison of SQL & SPL: Static Transposition

A static transposition can define the data structure after data is transposed beforehand and the structure does not change accordingly as data is changed.

1. Row-to-column transposition

As the name shows, row-to-column transposition aims to convert row values into column names to transfer rows to columns. In practice, PIVOT is often used after the grouping and aggregation, which gets unique values from the row values under the to-be-transposed columns and then arrange the row values into column names. The role of PIVOT is to group and put the aggregates in a specific column into multiple columns for an intuitive representation.

【Example 1】Based on the following scores table, find the highest score of each subject in each class and present the results in columns. Below is part of the source table:

CLASSSTUDENTIDSUBJECTSCORE
11English84
11Math77
11PE69
12English81
12Math80

Below is the result of expected layout:

CLASSMAX_MATHMAX_ENGLISHMAX_PE
1979697
2979697

SQL solution:

SQL PIVOT is used to achieve row-to-column transposition and column-to-row transposition. But only the relatively new versions of certain database products support this method. ORACLE, for instance, supports it from the 11g version, and some databases, such as MYSQL, still uses the subquery to do the grouping and aggregation and then a left join to achieve the transposition and have not given any support to it. In this case, we use ORACLE 11g to write the SQL query:

SELECT *
   FROM (
      SELECT
         CLASS, SUBJECT, SCORE
      FROM SCORES
   )
   PIVOT (
      MAX(SCORE) FOR SUBJECT
      IN (
         'Math' AS MAX_MATH,
         'English' AS MAX_ENGLISH,
         'PE' AS MAX_PE
      )
   )

SPL solution:

SPL offers A.pivot() function to perform transposition, row-to-column by default.

A
1=T("Scores.csv")
2=A1.groups(CLASS,SUBJECT; max(SCORE):MAX_SCORE)
3=A2.pivot(CLASS; SUBJECT, MAX_SCORE; "Math":"MAX_MATH", "English":"MAX_ENGLISH", "PE":"MAX_PE")

A1: Import Scores table from the source file.
A2: Group A1’s table by class and subject and calculate the highest score of each subject in each class.
A3: A.pivot() function transposes the row-wise highest scores into column names.

SPL supports retrieving a data table from the database, too. A1 in the above script can be modified as:

A
1=connect("db").query("SELECT * FROM SCORES")

2. Column-to-row transposition

Contrary to row-to-column transposition, the column-to-row transposition converts each field into a new row, where the new field value comes from the original field name or alias, and transforms the original field values into field values of a new field.

【Example 2】Based on the following Olympic medal table, generate a new table recording information for each type of medal. Below is part of the source table:

GameNationGoldSilverCopper
30USA462929
30China382723
30UK291719
30Russia242632
30Korea1387

Below is the result of expected layout:

GAMENATIONMEDAL_TYPEMEDALS
30USAGOLD46
30USASILVER29
30USACOPPER29
30ChinaGOLD38
30ChinaSILVER27

SQLsolution:

SPL uses UNPIVOT function to perform a column-to-row transposition:

   SELECT *
   FROM OLYMPIC
   UNPIVOT (
      MEDALS FOR MEDAL_TYPE IN (
          GOLD,SILVER,COPPER
      )
   )

SPLsolution:

SPL A.pivot() function works with @r option to perform a column-to-row transposition:

A
1=T("Olympic.txt")
2=A1.pivot@r(GAME,NATION; MEDAL_TYPE, MEDALS; GOLD, SILVER, COPPER)

A1: Import Olympic medal table.
A2: A.pivot@r() function transposes columns GOLD, SILVER and COPPER into new rows.

3. Scenarios containing both types of transpositions

【Example 3】Based on the following channel-based sales table, generate a new table storing information by date. Below is part of the source table:

YEARMONTHONLINESTORE
2020124403746.2
202021863.4448.0
202031813.0624.8
20204670.82464.8
202053730.0724.5

Below is the result of expected layout:

CATEGORY123
ONLINE24401863.41813.0
STORE3746.2448.0624.8

SQL solution:

We need to perform both row-to-column transposition and column-to-row transposition to get this done. First, we perform column-to-row transposition to transform channel types into values under CATEGORY field:

YEARMONTHCATEGORYAMOUNT
20201ONLINE2440
20201STORE3746.2
20202ONLINE1863.4
20202STORE448.0

Then we perform row-to-column transposition to convert MONTH values into column names. The complete SQL query is as follows:

SELECT *
   FROM (
      SELECT *
      FROM MONTH_SALES
      UNPIVOT (
         AMOUNT FOR CATEGORY IN (
            "ONLINE",STORE
         )
      )
      WHERE YEAR=2020
   )
   PIVOT (
      MAX(AMOUNT) FOR MONTH
      IN (
         1 AS "1",2 AS "2",2 AS "3",
         4 AS "4",5 AS "5",6 AS "6",
         7 AS "7",8 AS "8",9 AS "9",
         10 AS "10",11 AS "11",12 AS "12"
      )
   )

SPL does not support using a non-constant expression as PIVOT/UNPIVOT value, so all months need to be enumerated for the row-to-column transposition.

SPL solution:

According to the natural logic, SPL handles the task using A.pivot@r() and A.pivot() respectively for column-to-row transposition and row-to-column transposition:

A
1=T("MonthSales.csv").select(YEAR:2020)
2=A1.pivot@r(YEAR,MONTH; CATEGORY, AMOUNT)
3=A2.pivot(CATEGORY; MONTH, AMOUNT)

A1: Import MonthSales table and select records of the year 2020.
A2: A.pivot@r() performs column-to-row transposition to convert channel types into values of CATEGORY field.
A3: A.pivot () performs row-to-column transposition to transform MONTH field values into column names.

Both SQL and SPL handle simple transposition scenarios well. The issue is that the real-world situations cannot always dealt with using the mode of grouping & aggregation plus PIVOT. In the subsequent essays in our transposition series, we will introduce how the two languages handle complicated static transpositions and dynamic transpositions.

Leave a Reply