Handling Alignment Grouping

Handling Alignment Grouping

An alignment grouping operation compares values of a field or an expression of members of the to-be-grouped set with members of a specified base set, and puts members matching a same member in the base set to same group. The result set will have same number of groups as the number of members in the base set. The alignment grouping operation may produce empty group(s) or leave one or more unmatching members.

1. Basic alignment grouping

1.1 Keep one matching members at most per group

Group records in a table by the order of a certain field and perform summarization by each group.

【Example 1】 Based on the associated SelectCourse table and Course table, find the unselected courses according to the order in Course table.

undefined

【SPL script】

AB
1=connect("db")/Connect to database
2=A1.query("select * from SelectCourse")/ Query SelectCourse table
3=A1.query("select * from Course")/Query Course table
4=A2.align(A3:ID,CourseID)/ A.align() function groups records of SelectCourse table by aligning them to ID field of Course table, and get one matching member for each group
5=A3(A4.pos@a(null))/ Get records of unselected courses from Course table (whose corresponding values in the grouping result set are null)

A5’s result:

IDNAMETeacherID
1Environmental protection and sustainable development5
10Music appreciation18

1.2 Keep all matching members per group

Group records in a table by the order of a certain field and perform aggregate on each group.

【Example 2】 Based on the associated EMPLOYEE table and DEPARTMENT table, calculate the number of employees in each department according to the order in DEPARTMENT table.

undefined

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from EMPLOYEE")/ Query EMPLOYEE table
3=A1.query("select * from DEPARTMENT")/ Query DEPARTMENT table
4=A2.align@a(A3:ID, DEPARTMENT)/ A.align@a() function groups records of EMPLOYEE table by aligning them to ID field of DEPARTMENT table, and get all matching members for each group; @a option enables returning all matching members for each group
5=A4.new(DEPT, ~.count():COUNT)/ Count employees in each department

A5’s result:

DEPTCOUNT
Admin4
R&D29
Sales187

1.3 Create a new group for unmatching members

Group records in a table by the order of a certain field and put unmatching records into a new group.

【Example 3】 Based on the SALARY table (Below is a part of it), calculate the average salary for the states of [California, Texas, New York, Florida] and for other states as a whole, which are classified as “Other”.

IDNAMESTATESALARY
1RebeccaCalifornia7000
2AshleyNew York11000
3RachelNew Mexico9000
4EmilyTexas7000
5AshleyTexas16000

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from SALARY")/ Query SALARY table
3[California,Texas,New York,Florida]/ Define a sequence of states
4=A2.align@an(A3,STATE)/ align@an function groups records of SALARY table by the specified states; @a option enables returning all matching records for each group, and @n option creates a new group to hold the unmatching records
5=A4.new(if (#>A3.len(),"Other",STATE):STATE,~.avg(SALARY):AvgSalary)/ Calculate the average salary in each group and generate a new table sequence; change the last group name to “Other”, otherwise it is the state in the first record of the current group

A5’s result:

STATESALARY
California7700.0
Texas7592.59
New York7677.77
Florida7145.16
Other7308.1

2. Alignment grouping by sequence numbers

The sequence-number-based alignment grouping operation groups members in a set according to specified sequence numbers. It puts members of same sequence numbers in same group.

2.1 Keep one matching member at most per group

Find the records that are not referenced based on two associated table.

【Example 4】 Based on the associated Sales table and Customer table, list customers that have no orders in 2014.

undefined

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from Sales")/ Query Sales table
3=A1.query("select * from Customer")/ Query Customer table
4=A3.(ID)/ Get customer IDs from Customer table
5=A2.align(A4.len(), A4.pos(CustomerID))/ A.align(n,y) function groups Sales table by aligning its records to customer IDs
6=A3(A5.pos@a(null))/ Get customer records having no orders from Customer table (where the order values are null)

A6’s result:

IDNameState
ALFKICMA-CGMTexas
CENTCNedlloydFlorida

2.2 Keep all matching records per group

Group records in a table by sequence numbers and perform aggregate on each group.

【Example 5】 Based on the following orders table (only a part of the data is shown), list the number of orders in each month of the year 2013.

IDCustomerIDOrderDateAmount
10248VINET2012/07/04428.0
10249TOMSP2012/07/051842.0
10250HANAR2012/07/081523.5
10251VICTE2012/07/08624.95
10252SUPRD2012/07/093559.5

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from Orders where year(OrderDate)=2013")/ Get orders records of 2013
3=A2.align@a(12,month(OrderDate))/ A.align@a() function divides the orders records of 2013 into 12 groups according to the 12 months; @a option enables returning all matching records for each group
4=A3.new(#:Month,~.count():OrderCount)/ Count orders in each month

A4’s result:

MonthOrderCount
133
229
330
431
532
630
733
833
937
1038
1134
1248

2.3 Put a record in multiple groups

Get a sequence of sequence numbers and divide records by aligning records to it. A record could be put into more than one group during the process.

【Example 6】 Based on the post records table, group posts by tags and calculate the frequency of each tag. Below is part of the source table:

IDTitleAuthorLabel
1Easy analysis of Excel2Excel,ETL,Import,Export
2Early commute: Easy to pivot excel3Excel,Pivot,Python
3Initial experience of SPL1Basics,Introduction
4Talking about set and reference4Set,Reference,Dispersed,SQL
5Early commute: Better weapon than Python4Python,Contrast,Install

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from PostRecord")/ Query PostRecord table
3=A2.conj(Label.split(",")).id()/ Split each Label value by comma and concatenate all labels into one sequence and get all unique labels
4=A2.align@ar(A3.len(),A3.pos(Label.split(",")))/ align function works with @r option to put each post record into a corresponding group according to the sequence number of each of its label in the label list
5=A4.new(A3(#):Label,~.count():Count).sort@z(Count)/ Count the posts under each label and sort the result set in descending order

A5’s result:

LabelCount
SPL7
SQL6
Basics5

2.4 Group records by ranges

Divide records in a table into multiple segments according to ranges of values in a specified field, and perform aggregate on each group.

【Example 7】 Based on the salary table, group records according to salary ranges <8000, ≤8000 &≥12000, >12000, and calculate the number of employees in each group. Below is part of the source table:

IDNAMEBIRTHDAYSALARY
1Rebecca1974-11-207000
2Ashley1980-07-1911000
3Rachel1970-12-179000
4Emily1985-03-077000
5Ashley1975-05-1316000

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from EMPLOYEE")/ Query EMPLOYEE table
3[0,8000,12000]/ Define salary ranges
4=A2.align@a(A3.len(),A3.pseg(SALARY))/ A.pseg(x) function gets the range for salary of each record
5=A4.new(A3 (#):SALARY,~.count():COUNT)/ Count the employees in each group

A5’s result:

SALARYCOUNT
0308
8000153
1200039

The following task requires grouping records by specified ranges according to the expression result, and calculates average.

【Example 8】Based on the employee table, group records by ranges of hire durations, which are <10 years, ≥10 years &≤ 20 years, and ≥20 years, and calculate average salary in each group. Below is part of the source table:

IDNAMEBIRTHDAYSALARY
1Rebecca1974-11-207000
2Ashley1980-07-1911000
3Rachel1970-12-179000
4Emily1985-03-077000
5Ashley1975-05-1316000

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from EMPLOYEE")/ Query EMPLOYEE table
3[0,10,20]/ Define hire duration ranges
4=now()/ Get the current date and time
5=A2.align@a(A3.len(),A3.pseg(elapse@y(A4,-~), HIREDATE))/ A.pseg(x,y) function get the range where the hire date in each record falls
6=A5.new(A3(#):EntryYears,~.avg(SALARY):AvgSalary)/ Calculate the average salary

A6’s result:

EntryYearsAvgSalary
06777.78
107445.53
206928.57

3. Enumeration grouping

Enumeration grouping defines a set of enumerated conditions, calculates the conditions using members of the to-be-grouped set as parameters, and puts members making same condition true into one subset. There is a one-to-one relationship between the subsets in the result set and the enumerated conditions. 

3.1 Put each member to the first matching group

Group records in a table according to the enumerated conditional expressions and put each record only in the first matching group.

【Example 9】 Based on the table recording population information in China’s major cities, group the cities by population. Below is a part of the source table:

IDCityPopulationProvince
1Shanghai12286274Shanghai
2Beijing9931140Beijing
3Chongqing7421420Chongqing
4Guangzhou7240465Guangdong
5Hong Kong7010000Hong Kong Special Administrative Region

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from UrbanPopulation")/ Query UrbanPopulation table
3[?>2000000,?>1000000,?>500000,?<=500000]/ Define populations ranges: Megacities: >2 million, Super cities: >1 million & <2 million, Large cities: >0.5 million & <1 million, and Other cities
4=A2.enum(A3,Population)/ A.enum()function groups records in UrbanPopulation table according to the enumerated conditions defined in A3

A4’s result:

undefined

3.2 Create a new group for unmatching members

Group records in a table according to the specified enumerated conditional expressions, and put unmatching records in a new group.

【Example 10】 Based on the employee table, group records by age groups: < 35 years and < 45 years (put unmatching ones to a new group), and calculate average salary in each group. Below is part of the source table:

IDNAMEBIRTHDAYSALARY
1Rebecca1974-11-207000
2Ashley1980-07-1911000
3Rachel1970-12-179000
4Emily1985-03-077000
5Ashley1975-05-1316000

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from EMPLOYEE")/ Query EMPLOYEE table
3[?<35,?<45]/ Define two age groups: < 35 years and < 45 years
4=A2.enum@n(A3, age(BIRTHDAY))/ A.enum@n() function groups employee records by the enumerated age groups; @n option enables creating a new group for unmatching records
5=A4.new(if (#>A3.len(), "Other",A3(#)):AGE,~.avg(SALARY):AvgSalary)/ Set name of the last group as Other and calculate average salary in each group

A5’s result:

AGEAvgSalary
?<357118.18
?<457448.16
Other7395.06

3.3 Put a record in multiple groups

Group records in a table according to enumerated sequences and perform calculations on each group. A record could be put into more than one group during the process.

【Example 11】 Based on the GDP table, calculate the GDP per capita for direct-controlled municipalities, first-tier cities and second-tier cities respectively. Below is part of the source table:

IDCityGDPPopulation
1Shanghai326792418
2Beijing303202171
3Shenzhen246911253
4Guangzhou230001450
5Chongqing203633372

【SPL script】

AB
1=connect("db")/ Connect to database
2=A1.query("select * from GDP")/ Query GDP table
3[["Beijing","Shanghai","Tianjin","Chongqing"].pos(?)>0,
["Beijing","Shanghai","Guangzhou","Shenzhen"].pos(?)>0,
["Chengdu","Hangzhou","Chongqing","Wuhan","Xi’an","Suzhou",
"Tianjin","Nanjing","Changsha","Zhengzhou","Dongguan","Qingdao",
"Shenyang","Ningbo","Kunming"].pos(?)>0]
/ Enumerate direct-controlled cities, first-tier cities and second-tier cities respectively
4=A2.enum@r(A3,City)/ A.enum@r() function records in GDP table according to the enumerated sequences of cities; @r option allows putting a record to more than one groups
5=A4.new(A3(#):Area,~.sum(GDP)/~.sum(Population)*10000:CapitaGDP)/ Calcualte GDP per capita in each group

A5’s result:

AreaCapitaGDP
["Beijing","Shanghai","Tianjin","Chongqing"].pos(?)>0107345.03
["Beijing","Shanghai","Guangzhou","Shenzhen"].pos(?)>0151796.49
["Chengdu","Hangzhou","Chongqing","Wuhan","Xi’an","Suzhou","Tianjin","Nanjing",
"Changsha","Zhengzhou","Dongguan","Qingdao","Shenyang","Ningbo","Kunming"].pos(?)>0
106040.57

Find more examples in SPL CookBook.

Leave a Reply