Comparison of SQL & SPL: Recursion Operation

Ⅰ. Search for all references recursively
【Example 1】According to a company’s organizational structure below, get the hierarchical level of each department (headquarters are level 1, branches are level 2, and so on).
ID | ORG_NAME | PARENT_ID |
1 | Head Office | 0 |
2 | Beijing Branch Office | 1 |
3 | Shanghai Branch Office | 1 |
4 | Chengdu Branch Office | 1 |
5 | Beijing R&D Center | 2 |
… | … | … |
SQL solution:
To get this done, we need to loop through each record to find all superiors for each organization recursively. According to the natural way of thinking, the expected recursive process is that, for each record of each organization, we can locate the parent record through its superior’s ID and then the grandparent record through the corresponding ID, and so on. SQL does not have the concepts of explicit record and reference, so we can only get records having the superior-subordinate relationship recursively and sort them in a specific order, as shown below:

SQL offers WITH statement to do the recursion query, as shown below:
WITH CTE1 (ID,ORG_NAME,PARENT_ID,O_ID,O_ORG_NAME) AS( SELECT ID,ORG_NAME,PARENT_ID,ID O_ID,ORG_NAME O_ORG_NAME FROM ORGANIZATION UNION ALL SELECT ORG.ID,ORG.ORG_NAME,ORG.PARENT_ID,CTE1.O_ID,CTE1.O_ORG_NAME FROM ORGANIZATION ORG INNER JOIN CTE1 ON ORG.ID=CTE1.PARENT_ID ) SELECT O_ID ID,MIN(O_ORG_NAME) ORG_NAME,COUNT(*) "LEVEL" FROM ( SELECT ID,ORG_NAME,PARENT_ID,O_ID,O_ORG_NAME FROM( SELECT * FROM CTE1 ORDER BY O_ID,ID ) ) GROUP BY O_ID ORDER BY ID
With Oracle, you can also use START WITH … CONNECT BY … PRIOR … to do the recursive query, as shown below:
SELECT MAX(ID) ID, MAX(ORG_NAME) ORG_NAME, COUNT(*) "LEVEL" FROM ( SELECT ID,ORG_NAME,SUM(ID) OVER (ORDER BY RN) GROUP_ID FROM ( SELECT CASE WHEN LAG(PARENT_ID,1) OVER(ORDER BY RN ASC)=0 OR LAG(PARENT_ID,1) OVER(ORDER BY RN ASC) IS NULL THEN ID ELSE NULL END ID, CASE WHEN LAG(PARENT_ID,1) OVER(ORDER BY RN ASC)=0 OR LAG(PARENT_ID,1) OVER(ORDER BY RN ASC) IS NULL THEN ORG_NAME ELSE NULL END ORG_NAME, RN FROM ( SELECT ID,ORG_NAME,PARENT_ID,ROWNUM RN FROM ORGANIZATION ORG START WITH ORG.PARENT_ID>=0 CONNECT BY ORG.ID=PRIOR ORG.PARENT_ID ) ) ) GROUP BY GROUP_ID ORDER BY GROUP_ID
The START WITH … statement does not generate simpler code. Though the program returns records of all superiors in turn for every organization to the result set, we need to number records in each group and compare each record with its neighboring one to get the initial position of a group because a SQL set is unordered. As SQL GROUP BY only supports the equi-grouping, we need to add group marks for a further grouping. It would be much simpler if the language supported conditional grouping (according to PARENT_ID=0, for instance) as SPL does.
With databases that neither support WITH nor START WITH …, SQL is unable to achieve recursive queries directly but it has to use the stored procedure to write an recursion function (The details will be skipped here).
SPL solution:
SPL provides A.prior(F) function to perform a recursive query to find all references by default.
A | |
1 | =T("Organization.txt") |
2 | >A1.switch(PARENT_ID,A1:ID) |
3 | =A1.new(ID,ORG_NAME,~.prior(PARENT_ID).len():LEVEL) |
A1: Import Organization table from the source file.
A2: Objectify the ID foreign key of the superior organization to convert it into the corresponding record and achieve self-join.
A3: Create a new table made up of ID, organization name and level. A.prior() function calculates the department level by getting the number of levels of records that a record references.
The SPL script of handling an recursion problem is concise because SPL’s recursive logic is natural. A.prior() function returns a set of all records of the parent organization for the current organization. Then it’s easy to perform count or any other operation, even a complicated one.
SPL supports retrieving a data table from the database, too. We can change A1 in the above script as follows:
A | |
1 | =connect("db").query("SELECT * FROM ORGANIZATION") |
Ⅱ. Search for all references recursively until a specific value appears
【Example 2】According to a company’s organizational structure below, get Beijing Branch Office’s subordinate organizations and its superiors’ names (separate multilevel organizations by comma).
ID | ORG_NAME | PARENT_ID |
1 | Head Office | 0 |
2 | Beijing Branch Office | 1 |
3 | Shanghai Branch Office | 1 |
4 | Chengdu Branch Office | 1 |
5 | Beijing R&D Center | 2 |
… | … | … |
SQL solution:
After the analysis of example 1, the first idea for doing this is that, during the recursive process for searching for the superiors for each organization, stop the recursion if the specified value (the Beijing Branch Office) is found, keep its records and filter away records of the organizations that cannot be found. It is difficult for SQL to hit the target with one round of recursion. So we divide the task into two steps. First, find all subordinate organizations of Beijing Branch Office; second, as the solution to the above problem does, find all superior organizations for each of those organizations until Beijing Branch Office appears. Below is the SQL queries:
WITH CTE1 (ID,ORG_NAME,PARENT_ID) AS( SELECT ID,ORG_NAME,PARENT_ID FROM ORGANIZATION WHERE ORG_NAME='Beijing Branch Office' UNION ALL SELECT ORG.ID,ORG.ORG_NAME,ORG.PARENT_ID FROM ORGANIZATION ORG INNER JOIN CTE1 ON ORG.PARENT_ID=CTE1.ID ) ,CTE2 (ID,ORG_NAME,PARENT_ID,O_ID,GROUP_NUM) AS( SELECT ID,ORG_NAME,PARENT_ID,ID O_ID,1 GROUP_NUM FROM CTE1 UNION ALL SELECT ORG.ID,ORG.ORG_NAME,ORG.PARENT_ID, CTE2.O_ID,CTE2.GROUP_NUM+1 GROUP_NUM FROM ORGANIZATION ORG INNER JOIN CTE2 ON ORG.ID=CTE2.PARENT_ID AND CTE2.ORG_NAME<>'Beijing Branch Office' ) SELECT MAX(O_ID) ID, MAX(O_ORG_NAME) ORG_NAME, MAX(PARENT_NAME) PARENT_NAME FROM( SELECT O_ID, O_ORG_NAME, WM_CONCAT(ORG_NAME) OVER (PARTITION BY O_ID ORDER BY O_ID,GROUP_NUM) PARENT_NAME FROM( SELECT ID,PARENT_ID,O_ID,GROUP_NUM, CASE WHEN GROUP_NUM=1 THEN NULL ELSE ORG_NAME END ORG_NAME, CASE WHEN GROUP_NUM=1 THEN ORG_NAME ELSE NULL END O_ORG_NAME FROM ( SELECT ID,ORG_NAME,PARENT_ID,O_ID,GROUP_NUM,ROWNUM RN FROM CTE2 ORDER BY O_ID,GROUP_NUM ) ) ) GROUP BY O_ID ORDER BY O_ID
With Oracle, you can also use START WITH … CONNECT BY … PRIOR … to do the recursive query, as shown below:
WITH CTE1 AS ( SELECT ID,ORG_NAME,PARENT_ID,ROWNUM RN FROM ORGANIZATION ORG START WITH ORG.ORG_NAME='Beijing Branch Office' CONNECT BY ORG.PARENT_ID=PRIOR ORG.ID ) ,CTE2 AS ( SELECT ID,ORG_NAME,PARENT_ID,RN, CASE WHEN LAG(ORG_NAME,1) OVER(ORDER BY RN ASC)= 'Beijing Branch Office' OR LAG(ORG_NAME,1) OVER(ORDER BY RN ASC) IS NULL THEN 1 ELSE 0 END FLAG FROM( SELECT ID,ORG_NAME,PARENT_ID,ROWNUM RN FROM CTE1 START WITH 1=1 CONNECT BY CTE1.ID=PRIOR CTE1.PARENT_ID ) ) SELECT MAX(ID) ID, MAX(O_ORG_NAME) ORG_NAME, MAX(PARENT_NAME) PARENT_NAME FROM( SELECT ID,O_ORG_NAME,GROUP_ID, WM_CONCAT(ORG_NAME) OVER (PARTITION BY GROUP_ID ORDER BY RN) PARENT_NAME FROM( SELECT ID, ORG_NAME, O_ORG_NAME,RN, SUM(ID) OVER (ORDER BY RN) GROUP_ID FROM( SELECT PARENT_ID,RN, CASE WHEN FLAG=1 THEN NULL ELSE ORG_NAME END ORG_NAME, CASE WHEN FLAG=1 THEN ID ELSE NULL END ID, CASE WHEN FLAG=1 THEN ORG_NAME ELSE NULL END O_ORG_NAME FROM CTE2 ) ) ) GROUP BY GROUP_ID ORDER BY GROUP_ID
SPL solution:
SPL offers A.prior(F,r) function to do the recursive query to find the refences until a specific value appears:
A | |
1 | =T("Organization.txt") |
2 | >A1.switch(PARENT_ID,A1:ID) |
3 | =A1.select@1(ORG_NAME=="Beijing Branch Office") |
4 | =A1.new(ID,ORG_NAME,~.prior(PARENT_ID,A3) :PARENT_NAME) |
5 | =A4.select(PARENT_NAME!=null) |
6 | =A5.run(PARENT_NAME=PARENT_NAME.(PARENT_ID.ORG_NAME).concat@c()) |
A1: Import the Organization table.
A2: Objectify the ID foreign key of the superior organization to convert it into the corresponding record and achieve foreign key objectification.
A3: Get records of Beijing Branch Office.
A4: Create a new table made up of ID, organization name and the set of records of all superior organizations.
A5: Get records that has at least one superior organization, that is, those of Beijing Branch Office.
A6: Join up names of superior organizations into a comma-separated string in a circular way.
The SPL solution is logically clear and concise. It uses records of Beijing Branch Office as parameters to do the recursive search of references.
Ⅲ. Search for leaf records recursively
【Example 3】According to the following Chinese administrative divisions table, find regions and counties under Hebei province. Below is part of the source table:
ID | NAME | PARENT_ID |
1 | China | 0 |
11 | Beijing | 1 |
12 | Tianjin | 1 |
13 | Hebei | 1 |
… | … | … |
1301 | Shijiazhuang | 13 |
1302 | Tangshan | 13 |
… | … | … |
SQL solution:
First, we find all records of Hebei province and then find records that are not superior regions of other regions. Below are the SQL queries:
WITH CTE1 (ID,NAME,PARENT_ID,PARENT_NAME) AS( SELECT ID,NAME,PARENT_ID,NULL PARENT_NAME FROM CHINA_REGION WHERE NAME='Hebei' UNION ALL SELECT T.ID,T.NAME,T.PARENT_ID,CTE1.NAME PARENT_NAME FROM CHINA_REGION T INNER JOIN CTE1 ON T.PARENT_ID=CTE1.ID ) SELECT ID,NAME,PARENT_NAME FROM CTE1 T1 WHERE NOT EXISTS( SELECT 1 FROM CTE1 T2 WHERE T1.ID=T2.PARENT_ID ) ORDER BY ID
SPL solution:
SPL has P.nodes@d(F,r) function to find all leaves recursively:
A | |
1 | =T("ChinaRegion.csv") |
2 | >A1.switch(PARENT_ID,A1:ID) |
3 | =A1.select@1(NAME=="Hebei") |
4 | =A1.nodes@d(PARENT_ID,A3) |
5 | =A4.new(ID,NAME,PARENT_ID.NAME:PARENT_NAME) |
A1: Import ChinaRegion table.
A2: Objectify the ID foreign key of the superior region to convert it into the corresponding record and achieve foreign key objectification.
A3: Get records of Hebei province.
A4: Use P.nodes() function to perform the recursive search; @d option is used to recursively find all leaves.
A5: Create a new table made up of ID, region names and names of superior regions.
IV. Traverse all files under a directory
【Example 4】According to the following terminal device use for online teaching in a primary school, find the proportion of each type of terminal. Below is the directory structure:

Files under the directory is of Excel format. Below is part of the source data:
ID | STUDENT_NAME | TERMINAL |
1 | Rebecca Moore | Phone |
2 | Ashley Wilson | Phone,PC,Pad |
3 | Rachel Johnson | Phone,PC,Pad |
4 | Emily Smith | PC,Pad |
5 | Ashley Smith | PC |
6 | Matthew Johnson | Phone |
7 | Alexis Smith | Phone,PC |
8 | Megan Wilson | Phone,PC,Pad |
… | … | … |
Since SQL cannot retrieve a directory and the files under it, a SQL solution is thus impossible. Let’s look at how SPL handles the recursive traversal of directories and files.
SPL solution:
SPL offers directory@s() function to search for names of all files under all subdirectories in an recursive way:
A | |
1 | =directory@ps("C:/Primary School") |
2 | >A1.run(t=T(~),@+=t.len(),~=t.conj(TERMINAL.split@c())) |
3 | =A1.conj().groups(~:TERMINAL;count(~)/A2:PERCENTAGE) |
A1:Use directory() function to import the file directory. @s option is used to get file names recursively under all subdirectories.
A2:Read in files circularly, split each string of terminals by comma, and concatenate all unique terminals.
A3:Group records by terminal and calculate the proportion for each.
V. Concatenate field values in a JSON file recursively
【Example 5】Below is the JSON data of worldwide new COVID-19 confirmed cases in a certain time point. The task is to get the total confirmed cases across the world. Below is part of the source data:
[ {Region:"USA",Confirmed:[ {Region:"California",Confirmed:3671902}, {Region:"New York",Confirmed:1139740}, {Region:"Virginia",Confirmed:620801}, {Region:"Florida",Confirmed:2064525}, … ]}, {Region:"Brazil",Confirmed:[…]}, {Region:"India",Confirmed: […]}, {Region:"France",Confirmed: […]} … ]
As SQL cannot retrieve a JSON file, we will skip the SQL solution. Below is the SPL way of handling the recursive traversal of a JSON file.
SPL solution:
SPL offers A.field@r()function to get field values recursively, and A.conj@r() function to concatenate members of a sequence recursively. Below is the SPL script:
A | |
1 | =json(file("COVID-19.json").read()) |
2 | =A1.field@r("Confirmed") |
3 | =A2.conj@r().sum() |
A1: Read in the JSON file.
A2: GET ALL Confirmed field values recursively.
A3: Concatenate all numbers of confirmed cases and sum them.
Summary
SQL has its method of achieving recursion problems, but the method not convenient to use. This is because SQL does not have the concepts of explicit records and reference, and row numbers should be used in the result of an recursive operation to mark the superior-subordinate relationship. Besides, SQL is based on unordered sets, which makes it troublesome to perform subsequent operations after recursions.
SPL supports the data structure of different data types and offers the concept of explicit records. More importantly, SPL’s logic in handle recursion problems is clear. It uses an recursion function after the self-join and then a set operation to finish the job. SPL supplies flexible and adaptable recursion features to handle common recursion scenarios, such as directory recursion and file content recursion.
With a complex query, the complexity of SQL solution increases sharply by resorting to temporary table and nested queries. This makes it hard to write and maintain the SQL statements. SPL, however, organizes an algorithm in a natural way of thinking and generates concise code.
The SPL-driven and ordered-set-based esProc is the professional data computation engine that provides a complete set of set-oriented operations. The tool combines merits of both Java and SQL and makes it easy to handle recursion scenarios.