Comparison of SQL & SPL: Join Operations (Ⅱ)

In this essay, we focus on join operations with many-to-one relationship.

The many-to-one relationship exists between tables where any number of records in the first table correspond to one record in the other table. The two tables usually have a foreign key association. A certain field in table A are associated with the primary key of table B. The field in table A that links to table B’s primary key is called the foreign key pointing to table B. Table B is called table A’s foreign key table.

Suppose there are employee table and department table. The department ID field in employee table points to department table’s ID field. One employee corresponds to one department, but one department could have multiple employees. The relationship of employee table and department table is many-to-one.

The one-to-many relationship and the many-to-one relationship are different, but SQL does not distinguish them. It provides no methods to handle them separately.

The method of handling a join operation with many-to-one relationship is completely different from that of dealing with one with one-to-many relationship. We convert foreign key values into corresponding records in the other table, or attach the target field value into the table at the “many” end, to generate a combined table on which queries can be performed. SQL does not offer a particular method to handle joins with the many-to-one relationship, it uses the ordinary JOIN to do the work. SPL provides A.switch()function and other functions to specifically optimize and speed up foreign-key-type joins (many-to-one association).

【Example 1】Get names of all employees and the department they work in based on EMPLOYEE table and DEPARTMENT table. Below is part of the source data:

EMPLOYEE:

IDNAMEBIRTHDAYDEPARTMENTIDSALARY
1Rebecca1974/11/2067000
2Ashley1980/07/19211000
3Rachel1970/12/1779000
4Emily1985/03/0737000
5Ashley1975/05/13616000

DEPARTMENT:

IDNAMEMANAGER
1Administration18
2Finance2
3HR4
4Marketing6
5Production7

SQL solution:

SQL uses JOIN statement to join two tables with the many-to-one relationship according to the condition that a foreign key field value is equivalent to the corresponding primary key value in the foreign key table. Here we use the left join to search for all employees. Below are SQL statements:

SELECT
      E.NAME, D.NAME DEPT_NAME
   FROM EMPLOYEE E
   LEFT JOIN
   DEPARTMENT D
   ON E.DEPARTMENTID=D.ID

SPL solution:

SPL A.switch() function converts values of DEPARTMENTID values into corresponding records in DEPARTMENT table.

A
1=T("Employee.txt")
2=T("Department.txt")
3=A1.switch(DEPARTMENTID, A2:ID)
4=A3.new(NAME, DEPARTMENTID.NAME:DEPT_NAME)

A1: Import Employee table.
A2: Import Department table.
A3: A.switch() function objectifies foreign key values, by replacing DEPARTMENTID values with corresponding DEPARTMENT records.
A4: Return employee names and department names. The latter is obtained from NAME field of DEPARTMENT records.

The case is simple. Both SQL and SPL can handle it well. Someone may ask why we objectify the foreign key instead of referencing the foreign key field during the join? Hope you can understand the reason through the following more complicated example.

【Example 2】Find the American employees whose managers are Chinese based on EMPLOYEE table and DEPARTMENT table. Below is part of the source data:

EMPLOYEE:

IDNAMEBIRTHDAYDEPARTMENTIDSALARY
1Rebecca1974/11/2067000
2Ashley1980/07/19211000
3Rachel1970/12/1779000
4Emily1985/03/0737000
5Ashley1975/05/13616000

DEPARTMENT:

IDNAMEMANAGER
1Administration18
2Finance2
3HR4
4Marketing6
5Production7

SQL solution:

Just adding the desired foreign key field to EMPLOYEE table is not right. We need two joins instead. The first is the join between DEPARTMENT table and EMPLOYEE table through ID field for getting nationalities of managers. The second join is between EMPLOYEE table and DEPARTMENT table through DEPARTMENTID field for getting the employees’ departments. Below are SQL statements:

SELECT *
   FROM EMPLOYEE E2
   LEFT JOIN
      (SELECT D1.ID,D1.MANAGER,E1.NATION MANAGER_NATION
      FROM DEPARTMENT D1
      LEFT JOIN EMPLOYEE E1
      ON D1.MANAGER=E1.ID
   ) D2
   ON E2.DEPARTMENTID=D2.ID
   WHERE D2.MANAGER_NATION='Chinese' AND E2.NATION='American'

This block of SQL code is rather complicated. Each join requires a layer of nested query and you need to know the target fields in the foreign key table in advance.

SPL solution:

SPL handles this task also through A.switch() function by objectifying foreign key values.

A
1=T("Employee.txt").keys(ID)
2=T("Department.txt").keys(ID)
3=A2.switch(MANAGER, A1)
4=A1.switch(DEPARTMENTID, A2)
5=A4.select(NATION=="American" && DEPARTMENTID.MANAGER.NATION=="Chinese")

A1: Import Employee table and set ID as the primary key.
A2: Import Department table and set ID as the primary key.
A3: Use A.switch() function to objectify foreign key field by converting Department table’s MANAGER field into corresponding Employee records.
A4: Use A.switch() function to objectify foreign key field by converting Employee table’s DEPARTMETNID field into corresponding Department records.
A5: Get records of American employees whose managers are Chinese.

The SPL script is not so complicated as SQL statements. SPL just performs one more foreign key objectification according to the natural logic. You do not need to know which fields you want in advance for a foreign key field join because SPL allows you to get them directly from the record object. To get managers who are Chinese , SPL uses the expression DEPARTMENTID.MANAGER.NATION, which is natural.

【Example 3】Find the number of students who select “Matlab” course based on COURSE table and SELECT_COURSE table. Below is part of the source data:

COURSE:

IDNAMETEACHERID
1Environmental protection and sustainable development5
2Mental health of College Students1
3Matlab8
4Electromechanical basic practice7
5Introduction to modern life science3

SELECT_COURSE:

IDCOURSEIDSTUDENTID
1659
2643
3552
4544
5537

SQL solution:

We can use inner join to delete non-matching records (where the selected courses are not Matlab). Below are SQL statements:

SELECT
      COUNT(*) COUNT
   FROM SELECT_COURSE SC
   INNER JOIN
      COURSE C
   ON SC.COURSEID=C.ID
   WHERE NAME='Matlab'

SPL solution:

A.switch() function works with @i option to delete non-matching records during the join operation.

A
1=T("Course.csv")
2=T("SelectCourse.csv")
3=A1.select(NAME:"Matlab")
4=A2.switch@i(COURSEID, A3:ID).count()

A1: Import Course table.
A2: Import SelectCourse table.
A3: Select records containing Matlab from Course table.
A4: A.switch() works with @i option to remove non-matching record at the join and then count students who select Matlab course.

【Example 4】Get sales information of new customers in the year 2014, that is, records of SALES table where CUSTOMERIDs are not included in CUSTOMER table, based on SALES table and CUSTOMER table. Below is part of the source data:

SALES:

IDCUSTOMERIDORDERDATESELLERIDPRODUCTIDAMOUNT
10248VINET2013/7/45592440
10249TOMSP2013/7/56381863.4
10250HANAR2013/7/84651813
10251VICTE2013/7/8366670.8
10252SUPRD2013/7/94463730

CUSTOMER:

IDNAMECITYPOSTCODETEL
ALFKISanchuan Industrial Co., LtdTianjin343567(030) 30074321
ANATRSoutheast industriesTianjin234575(030) 35554729
ANTONTanson tradeShijiazhuang985060(0321) 5553932
AROUTGuoding Co., LtdShenzhen890879(0571) 45557788
BERGSTongheng machineryNanjing798089(0921) 9123465

SQL solution:

SQL can use NOT IN or NOT EXISTS to retain only the non-matching records, which are SALES records whose CUSTOMERIDs are not included in CUSTOMER table at the join. Below are SQL statements:

SELECT *
   FROM SALES S
   WHERE
      EXTRACT (YEAR FROM ORDERDATE)=2014
      AND
      CUSTOMERID NOT IN
         (SELECT DISTINCT ID
         FROM CUSTOMER)

or:

SELECT *
   FROM SALES S
   WHERE
      EXTRACT (YEAR FROM ORDERDATE)=2014
      AND
      NOT EXISTS
         (SELECT *
         FROM CUSTOMER C
         WHERE S.CUSTOMERID=C.ID)

SPL solution:

A.switch() function works with @d option to keep only the non-matching records.

A
1=T("Sales.csv")
2=T("Customer.txt")
3=A1.select(year(ORDERDATE)==2014)
4=A3.switch@d(CUSTOMERID, A2:ID)

A1: Import Sales table.
A2: Import Customer table.
A3: Select records of 2014 from Sales table.
A4: A.switch() works with @d option to keep non-matching records only, which are the sales records of new customers in the year 2014.

【Example 5】Find the superior organization for each section based on organization structure table. Below is part of the source data:

IDORG_NAMEPARENT_ID
1Head Office0
2Beijing Branch Office1
3Shanghai Branch Office1
4Chengdu Branch Office1
5Beijing R&D Center2

SQL solution:

In the table, PARENT_ID points to a value in the same table. This is a case of self-join. We query the table twice, which is equivalent to treating the source data as two tables, and then perform the join. Below are SQL statements:

SELECT
      ORG1.ID,ORG1.ORG_NAME,ORG2.ORG_NAME PARENT_NAME
   FROM
      ORGANIZATION ORG1
   LEFT JOIN
   ORGANIZATION ORG2
   ON ORG1.PARENT_ID=ORG2.ID
   ORDER BY ID

SPL solution:

SPL uses A.switch() function to objectify foreign key values, where the foreign key table is the table itself.

A
1=T("Organization.txt")
2=A1.switch(PARENT_ID, A1:ID)
3=A2.new(ID, ORG_NAME, PARENT_ID.ORG_NAME:PARENT_NAME)

A1: Import Organization table.
A2: A.switch() function objectify foreign key values by replacing PARENT_ID values with corresponding parent organization records.
A3: Return names of all organizations and their parent organizations.

【Example 6】Get customers and their order amounts in the year 2015. Below are part of data in ORDERS table, ORDER_DETAIL table and CUSTOMER table:

ORDERS:

IDCUSTOMERIDEMPLOYEEIDORDER_DATEARRIVAL_DATE
10248VINET52012/07/042012/08/01
10249TOMSP62012/07/052012/08/16
10250HANAR42012/07/082012/08/05
10251VICTE32012/07/082012/08/05
10252SUPRD42012/07/092012/08/06

ORDER_DETAIL:

IDORDER_NUMBERPRODUCTIDPRICECOUNTDISCOUNT
10814148102.080.15
10814248102.080.15
10814348306.0240.15
10814448102.080.15
10814548204.0160.15

CUSTOMER:

IDNAMECITYPOSTCODETEL
ALFKISanchuan Industrial Co., LtdTianjin343567(030) 30074321
ANATRSoutheast industriesTianjin234575(030) 35554729
ANTONTanson tradeShijiazhuang985060(0321) 5553932
AROUTGuoding Co., LtdShenzhen890879(0571) 45557788
BERGSTongheng machineryNanjing798089(0921) 9123465

SQL solution:

This task involves both one-to-many relationship (between ORDERS table and ORDER_DETAIL table) and many-to-one relationship (between ORDERS table and CUSTOEMR table). SQL does not define the two relationship separately, but it is necessary to differentiate them. First, we handle foreign key table (the many-to-one relationship) by attaching the foreign key values or desired field values to the table at the “many” end, and then we deal with the primary-and-sub tables (the one-to-many relationship). Below are SQL statements:

SELECT
      CUSTOMER_NAME, SUM(AMOUNT) AMOUNT
   FROM (
      SELECT
         Orders1.ID, CUSTOMER_NAME, Detail.PRICE*Detail.COUNT AMOUNT
      FROM (
         SELECT
            Orders.ID,Customer.NAME CUSTOMER_NAME
         FROM ORDERS Orders
         LEFT JOIN
         CUSTOMER Customer
         ON Orders.CUSTOMERID=Customer.ID
         WHERE EXTRACT (YEAR FROM ORDER_DATE)=2015
      ) Orders1
      INNER JOIN
      ORDER_DETAIL Detail
      ON Orders1.ID=Detail.ID
   )
   GROUP BY CUSTOMER_NAME
   ORDER BY CUSTOMER_NAME

SPL solution:

SPL performs foreign key objectification to replace ORDERS table’s CUSTOMERID field values with corresponding CUSTOMER records and then we have only the primary table and its sub table, which are joined using join() function.

A
1=T("Orders.txt")
2=T("Customer.txt")
3=A1.select(year(ORDER_DATE)==2015).switch(CUSTOMERID, A2:ID)
4=T("OrderDetail.txt").group(ID)
5=join(A3:Orders, ID;A4:Detail, ID)
6=A5.groups(Orders.CUSTOMERID.NAME; Detail.sum(PRICE*COUNT):AMOUNT)

A1: Import Orders table.
A2: Import Customers table.
A3: Select Orders records of the year 2015, and use A.switch() function to objectify the foreign key CUSOTMERID by converting its values to corresponding CUSTOMER records.
A4: Import OrderDetail table and group it by ID.
A5: The join() function joins Orders table and OrderDetail table by ID fields.
A6: Group A5’s result set and sum sales amounts for each customer.

【Example 7】Get order information of the year 2014 (order IDs, product names and total amounts) where the product name contains “water” and order amount is greater than 200, and that do not pay in installment and get 5-star evaluation. Below is part of the source data and the relationships between tables:

ORDERS:

IDCUSTOMERIDEMPLOYEEIDORDER_DATEARRIVAL_DATE
10248VINET52012/07/042012/08/01
10249TOMSP62012/07/052012/08/16
10250HANAR42012/07/082012/08/05
10251VICTE32012/07/082012/08/05
10252SUPRD42012/07/092012/08/06

ORDER_DETAIL:

IDORDER_NUMBERPRODUCTIDPRICECOUNTDISCOUNT
10814148102.080.15
10814248102.080.15
10814348306.0240.15
10814448102.080.15
10814548204.0160.15

ORDER_PAYMENT:

IDPAY_DATEAMOUNTCHANNELINSTALMENTS
108142014/01/05816.030
108482014/01/23800.2521
108482014/01/23800.2500
108482014/01/23800.2531
109662014/03/20572.021

EVALUATION:

IDSCOREDATECOMMENT
1024842012/07/12
1024912012/07/06
1025042012/07/10
1025122012/07/11
1025232012/07/16

PRODUCT:

IDNAMESUPPLIERIDCATEGORY
1Apple Juice21
2Milk11
3Tomato sauce12
4Salt22
5Sesame oil22

SQL solution:

This task involves one-to-many relationship, many-to-one relationship and one-to-one relationship. It is wrong to join them all with the JOIN operation because that will result in many-to-many relationship. The right way is to handle the many-to-one relationship (foreign key table) first by attaching foreign key values or desired field values to the table at the “many” end, and we have one-to-one relationship and one-to-many relationship only. Then we group the sub table by the primary table’s primary key (order ID), which makes the key the sub table’s actual primary key. Finally, we join the four tables through order ID. Below are SQL statements:

SELECT
      Orders.ID,Detail1.NAME, Detail1.AMOUNT
   FROM (
      SELECT ID
      FROM ORDERS
      WHERE
         EXTRACT (YEAR FROM Orders.ORDER_DATE)=2014
   ) Orders
   INNER JOIN (
      SELECT ID,NAME, SUM(AMOUNT) AMOUNT
      FROM (
         SELECT
            Detail.ID,Product.NAME,Detail.PRICE*Detail.COUNT AMOUNT
         FROM ORDER_DETAIL Detail
         INNER JOIN
         PRODUCT Product
         ON Detail.PRODUCTID=Product.ID
         WHERE NAME LIKE '%water%'
      )
      GROUP BY ID,NAME
   ) Detail1
   ON Orders.ID=Detail1.ID
   INNER JOIN(
      SELECT
         DISTINCT ID
      FROM ORDER_PAYMENT
      WHERE INSTALMENTS=0
   ) Payment
   ON Orders.ID = Payment.ID
   INNER JOIN(
      SELECT ID
      FROM EVALUATION
      WHERE SCORE=5
   ) Evaluation
   ON Orders.ID = Evaluation.ID

The SQL statements are difficult to write, hard to understand and maintain. More importantly, it is inconvenient to check whether the statements are correctly written since there are too many joins and nested queries.

SPL solution:

A
1=T("Orders.txt").select(year(ORDER_DATE)==2014)
2=T("Product.txt").select(like(NAME, "*water*"))
3=T("OrderDetail.txt").switch@i(PRODUCTID, A2:ID)
4=A3.group(ID).select(sum(PRICE*COUNT)>200)
5=T("OrderPayment.txt").select(INSTALMENTS==0).group(ID)
6=T("Evaluation.txt").select(SCORE==5)
7=join(A1:Orders,ID;A4:Detail,ID;A5:Payment,ID;A6:Evaluation,ID)
8=A7.new(Orders.ID,Detail.PRODUCTID.NAME,Detail.sum(PRICE*COUNT):AMOUNT)

A1: Import Orders table and select records of the year 2014.
A2: Import Product table and select records that contain water.
A3: Import OrderDetail table and objectify the foreign key PRODUCTID by replacing its values with corresponding records in Product table.
A4: Group OrderDetail table by ID field and select records where the amount is above 200.
A5: Import OrderPayment table and select records that do not have installment information.
A6: Import Evaluation table and select records containing 5-star evaluations.
A7: The join() function joins Orders table, OrderDetail table, OrderPayment table, and Evaluation table according to ID fields.
A8: Return the eligible order IDs, product names and order amounts.

The SPL script has two more lines of code. The import, select, and group operations on each table are separately performed, and only one line (A7) is for the join. The logic is natural and clear.

As seen from the above example, many-to-one relationship and one-to-many relationship is rather different. It is important for you to distinguish them even though SQL treat them in the same way. SPL, however, treats them in separate ways by supplying different functions for handling them. Now let’s take a look at the SPL solution to multi-table joins:

  1. With the many-to-one relationship (foreign key table), attach foreign key values or desired field values to the table at the “many” end.
  2. With the one -to-many relationship (primary and sub tables), group the sub table by the primary table’s primary key and the key becomes the sub table’s actual primary key.
  3. Join multiple tables through the primary keys (or the actual primary keys).

Leave a Reply