Comparison of SQL & SPL: Join Operations (Ⅰ)

The independent relationship between data of tables is called table association. There are four types of table relationships – one-to-one, many-to-one, one-to-many and many-to-many. Two or more tables can be joined through the associative relationship for performing an associative query on them.

Ⅰ. One-to-one relationship

The one-to-one relationship exists between two tables where a record in a table corresponds to only one record in another table; and vice versa, and usually, that use the same primary key fields.

Suppose there are a student table and student_contacts table. Both use student ID as the primary key. The former stores student names, genders, birthdays, departments, and classes. The latter records contacts of students and their addresses. Each student record corresponds to a piece of contact information, and each contact corresponds to one student.

【Example 1】Based on STUDENT table and STUDENT_CONTACTS table, get names of students and their contacts and contacts’ addresses for students who have the contact information. Below is part of the source data:

STUDENT:

IDNAMEDEPARTMENTIDCLASSIDGENDERBIRTHDAY
1Rebecca11F2010/09/08
2Ashley11F2010/10/09
3Rachel11F2011/04/29
4Emily11F2010/11/24
5Ashley11F2011/03/03

STUDENT_CONTACTS:

IDCONTACTSADDRESS
1Mrs. Moore124 Guangming North Road
2Mrs. Wilson116 Baishi Road
3Mr. JohnsonNo.8, Mingcheng Road, Haidian District
4Mr. Smith12 Fuxing Road
5Mr. Smith462 Shijingshan Road

SQL solution:

There could be some students who do not have contacts information. To find names and contact information of target students, we use the inner join to select records from STUDENT table where IDs exist in both tables. An inner join is often called a join simply. It will delete all rows that cannot match the other table from the result table. SQL INNER JOIN inner joins the first and the second table and ON statement defines the joining condition. Below is SQL statements:

SELECT
      S.NAME,C.CONTACTS,C.ADDRESS
   FROM
      STUDENT S
   INNER JOIN
   STUDENT_CONTACTS C
   ON S.ID=C.ID

SPL solution:

SPL calls two or more tables having one-to-one relationship between them the homo-dimension tables. One is the homo-dimension table of the other, and vice versa. SPL join() function is used to perform a join operation, an inner join operation, by default.

A
1=T("Student.txt")
2=T("StudentContacts.txt")
3=join(A1:S,ID;A2:C,ID)
4=A3.new(S.NAME,C.CONTACTS,C.ADDRESS)

A1: Import Student table from the source file.
A2: Import StudentContacts table from the source file.
A3: Inner joins the two tables through their ID field.
A4: Create a new table sequence made up of fields storing student names, their contacts and addresses.

SPL supports retrieving a data table from the database. Suppose the data comes from database "db"’s "STUDENT" table, A1 in the above SPL script can be rewritten as follows:

A
1=connect("db").query("select * from STUDENT")

【Example 2】Based on EMPLOYEE table and MANAGER table, get salaries, including the allowance, of all employees, including managers. Below is part of the source data:

EMPLOYEE:

IDNAMEBIRTHDAYSTATEDEPTSALARY
1Rebecca1974/11/20CaliforniaR&D7000
2Ashley1980/07/19New YorkFinance11000
3Rachel1970/12/17New MexicoSales9000
4Emily1985/03/07TexasHR7000
5Ashley1975/05/13TexasR&D16000

MANAGER:

IDALLOWANCE
187000
211000
47000
610000
79000

SQL solution:

Managers are employees, too, but the MANAGER table stores allowance information. To query salaries of all employees, the left join is needed to get records of all employees, including managers. The left join is also called left outer join. It joins two tables based on the left table and lists all records in the left table and records in the right table that match the left table according to the specified condition in the result table. SQL LEFT JOIN is used to left join tables. Below is SQL statements:

SELECT
      E.ID,E.NAME,E.SALARY+NVL(M.ALLOWANCE,0) INCOME
   FROM EMPLOYEE E
   LEFT JOIN
   MANAGER M
   ON E.ID=M.ID

SPL solution:

SPL join() function is used to perform the join operation. @1 option enables a left join.

A
1=T("Employee.csv")
2=T("Manager.txt")
3=join@1(A1:E, ID; A2:M, ID)
4=A3.new(E.ID, E.NAME, E.SALARY+M.ALLOWANCE:INCOME)

A1: Import Employee table from the source file.
A2: Import Manager table from the source file.
A3: Left join the two tables through ID field based on the first table, the employee table.
A4: Create a new table sequence made up of fields storing employee IDs, names, and salaries.

The one-to-one relationship is the simplest among all relationships. Two tables are joined directly by matching their primary keys. Both SQL and SPL can handle this type of associative relationship effectively.

Ⅱ. One-to-many relationship

The one-to-many relationship exists between two tables where a record of one table corresponds to any one or more records in the other table. The table at the “one” end is called the primary table and the one at the “many” end is called the subtable (or subordinate table). Suppose there are orders table and order detail table, each order has the only ID but each order ID may correspond to multiple order detail records. We call the orders table the primary table and the order detail table the subtable.

【Example 3】Based on ORDERS table and ORDER_DETAIL table, calculate the total amount in each order. Below is part of the source data:

ORDERS:

IDCUSTOMERIDEMPLOYEEIDORDER_DATEARRIVAL_DATE
10248VINET52012/07/042012/08/01
10249TOMSP62012/07/052012/08/16
10250HANAR42012/07/082012/08/05
10251VICTE32012/07/082012/08/05
10252SUPRD42012/07/092012/08/06

ORDER_DETAIL:

IDORDER_NUMBERPRODUCTIDPRICECOUNTDISCOUNT
10814148102.080.15
10814248102.080.15
10814348306.0240.15
10814448102.080.15
10814548204.0160.15

SQL solution:

SQL uses JOIN statement to handle the one-to-many relationship. Below is SQL statements:

SELECT
      ID, SUM(PRICE*COUNT) AMOUNT
   FROM (
      SELECT
         Orders.ID, Detail.PRICE, Detail.COUNT
      FROM ORDERS Orders
      INNER JOIN
      ORDER_DETAIL Detail
      ON Orders.ID=Detail.ID
   )
   GROUP BY ID
   ORDER BY ID

SPL solution:

SPL uses join() function to join the primary table and the subtable through the subtable’s one key field and the primary table’s key when there is only one subtable.

A
1=T("Orders.txt")
2=T("OrderDetail.txt")
3=join(A1:Orders,ID; A2:Detail,ID)
4=A3.groups(Orders.ID; sum(Detail.PRICE*Detail.COUNT):AMOUNT)

A1: Import Orders table.
A2: Import OrderDetail table.
A3: Join the two tables through their ID fields.
A4: Group A3’s joining result table and calculate the total amount for each order.

【Example 4】Suppose the ORDERS table has another subtable for recording the payment information. We want to find the orders for which not all payment has been received, that is, those where the accumulative payment is less than the total order amount. Below is part of the source data:

ORDERS:

IDCUSTOMERIDEMPLOYEEIDORDER_DATEARRIVAL_DATE
10248VINET52012/07/042012/08/01
10249TOMSP62012/07/052012/08/16
10250HANAR42012/07/082012/08/05
10251VICTE32012/07/082012/08/05
10252SUPRD42012/07/092012/08/06

ORDER_DETAIL:

IDORDER_NUMBERPRODUCTIDPRICECOUNTDISCOUNT
10814148102.080.15
10814248102.080.15
10814348306.0240.15
10814448102.080.15
10814548204.0160.15

ORDER_PAYMENT:

IDPAY_DATEAMOUNTCHANNELINSTALMENTS
108142014/01/05816.030
108482014/01/23800.2521
108482014/01/23800.2500
108482014/01/23800.2531
109662014/03/20572.021

SQL solution:

Just JOINing the three tables is not the right way of doing this because there could be many-to-many relationship between ORDER_DETAIL table and ORDER_PAYMENT table. When the subtables are grouped by order ID, the field will become their unique, actual primary keys. Then the three tables can be joined through the ID field.

SELECT
      Orders.ID,Detail.AMOUNT,Payment.PAY_AMOUNT
   FROM ORDERS Orders
   INNER JOIN
   (
      SELECT ID, SUM(PRICE*COUNT) AMOUNT
      FROM ORDER_DETAIL
      GROUP BY ID
   ) Detail
   ON Orders.ID=Detail.ID
   INNER JOIN
   (
      SELECT ID, SUM(AMOUNT) PAY_AMOUNT
      FROM ORDER_PAYMENT
      GROUP BY ID
   ) Payment
   ON Orders.ID=Payment.ID
   WHERE PAY_AMOUNT<Detail.AMOUNT
   ORDER BY ID

SPL solution:

When the subtables are grouped by ID, the field will become the real primary key. Now we can treat them as tables with the one-to-one relationship in between (or the homo-dimension tables).

A
1=T("Orders.txt")
2=T("OrderDetail.txt")
3=T("OrderPayment.txt")
4=A2.groups(ID; sum(PRICE*COUNT):AMOUNT)
5=A3.groups(ID; sum(AMOUNT):PAY_AMOUNT)
6=join(A1:Orders,ID; A4:Detail,ID; A5:Payment,ID)
7=A6.new(Orders.ID, Detail.AMOUNT, Payment.PAY_AMOUNT)
8=A7.select(PAY_AMOUNT<AMOUNT)

A1: Import Orders table.
A2: Import OrderDetail table.
A3: Import OrderPayment table.
A4: Group OrderDetail table and calculate the total amount in each order.
A5: Group OrderPayment table and calculate the total payment amount in each order.
A6: The join() function joins Orders table and the grouped & summarized OrderDetail table and OrderPayment table through their ID fields.
A7: Create a new table sequence consisting of fields of order ID, order amount and order payment amount.
A8: Select records where the payment amount is less than the order amount, that is, those that have not received all payment.

Without the support of stepwise coding, SQL will write the whole procedure in a single statement, which is sure to be complicated. SPL displays clear logic by using the step-by-step coding mode. There are two steps for performing a join operation. The first is to group every subtable by one key field (which is the primary table’s primary key) to make them have the same actual primary key as the primary table. The second is to join the two or more tables through their primary keys (or the actual primary keys). In essence, SQL and SPL implement the join operations in the same way, but SPL’s stepwise design makes easy coding.

Leave a Reply