The Select Operation on Structured Data

The Select Operation on Structured Data

1. Get members meeting the specific condition

【Example 1】 List European and African cities with a population of more than 2 million and their populations in column groups (each column group is ordered by population in descending order). Below is part of the world’s urban population table:

ContinentCountryCityPopulation
AfricaEgyptCairo6789479
AsiaChinaShanghai24240000
EuropeBritainLondon7285000

The expected result:

Europe CityPopulationAfrica CityPopulation
Moscow8389200Cairo6789479
London7285000Kinshasa5064000
St Petersburg4694000Alexandria3328196

【SPL script】

AB
1=connect("db").query("select * from World where Continent in('Europe','Africa') and Population >= 2000000")/ Connect to database and get records of European and African cities having a population of over 2 million
2=A1.select(Continent:"Europe")/ A.select()function gets records of Europe
3=A1.select(Continent:"Africa")/A.select()function gets records of Africa
4=create('Europe City',Population,'Africa City', Population)/ Create an empty table according to the target structure
5=A4.paste(A2.(City),A2.(Population),A3.(City),A3.(Population))/ A.paste()function pastes values to corresponding columns

A4’s result:

Europe CityPopulationAfrica CityPopulation
Moscow8389200Cairo6789479
London7285000Kinshasa5064000
St Petersburg4694000Alexandria3328196

【Example 2】 Get full names and salaries of employees in R&D department in the New York state. Below is part of the source data:

IDNAMESURNAMESTATEDEPTSALARY
1RebeccaMooreCaliforniaR&D7000
2AshleyWilsonNew YorkFinance11000
3RachelJohnsonNew MexicoSales9000
4EmilySmithTexasHR7000
5AshleySmithTexasR&D16000

【SPL script】

AB
1=connect("db")/ Connect to the data source
2=A1.query("select * from Employee")/ Import Employee table
3=A2.select(STATE=="New York"&&DEPT=="R&D")/ A.select()function gets records of employees of R&D department in New York; the double ampersand sign ("&&") is used to connect two conditions that need to be met at the same time
4=A3.new(NAME+""+SURNAME:FULLNAME, SALARY)/ Use the plus sign (+) to connect the name string and surname string into a full name

A4’s result:

FULLNAMESALARY
Matthew Johnson6000
Lauren Thomas12000
Brooke Williams12000

【Example 3】 Find the number of employees who are below 30 or over 50 in each department. Below is part of the source data:

IDNAMEBIRTHDAYSTATEDEPTSALARY
1Rebecca1974/11/20CaliforniaR&D7000
2Ashley1980/07/19New YorkFinance11000
3Rachel1970/12/17New MexicoSales9000
4Emily1985/03/07TexasHR7000
5Ashley1975/05/13TexasR&D16000

【SPL script】

AB
1=connect("db")/ Connect to the data source
2=A1.query("select * from Employee")/ Import Employee table
3=A2.select((age=age(BIRTHDAY), age<30 || age>50))/ A.select()function gets records of employees below 30 or over 50; use the double vertical bar ("||") to connect two conditions that are met if either of them are met
4=A3.groups(DEPT; count(~):Count)/ Group the selected records and count the employees

A4’s result:

DEPTCount
49
55
63
76
81

2. Get the record(s) holding the maximum/minimum value

【Example 4】 Based on the scores table, get the student ID with the lowest math score. Below is part of the source table:

CLASSSTUDENTIDSUBJECTSCORE
Class one1English84
Class one1Math77
Class one1PE69
Class one2English81
Class one2Math80

【SPL script】

AB
1=connect("db")/ Connect to the database
2=A1.query("select * from Scores where SUBJECT='Math'and CLASS='Class one'")/ Get math records for class one
3=A2.minp(SCORE)/A.minp() function gets the record having the lowest score
4=A3.STUDENTID/ Get the student ID from A3’s record

There could be more than one record that contains the minimum value. To return all eligible records, you can use @a option in A.minp() function:

AB
3=A2.minp@a(SCORE)/ A.minp() function works with @a option to get all records containing the lowest score
4=A3.(STUDENTID)/ Get student IDs from A3’s records

A3’s result:

CLASSSTUDENTIDSUBJECTSCORE
Class one5Math60
Class one14Math60

A4’s result:

Member
5
14

【Example 5】 Based on the Olympics medal table, get all information of the country that holds the top place for the longest time in terms of total medals. Below is part of the source table:

GameNationGoldSilverCopper
30USA462929
30China382723
30UK291719
30Russia242632
30Korea1387

【SPL script】

AB
1=file("Olympic.csv").import@cqt()/ Import the Olympic medal tally file
2=A1.sort@z(Game, 1000000*Gold+1000*Silver+Copper)/ Sort records by games and the totals
3=A2.group@o1(Game)/ Get the one record for each game, which is the first record because records are already sorted in descending order
4=A3.group@o(Nation)/ Group the selected records by nations in their original order
5=A4.maxp(~.len())/ A.maxp() selects the group having the most records, which are the country that holds the top place in a row for the longest time

A5’s result:

GameNationGoldSilverCopper
10USA413230
9USA221816
8USA452727
7USA412728

3. Get data by ranges

Sometimes we need to first find the sequence number of the range where a value belongs to and then get its corresponding members in a set. Examples include returning the assessment (Excellent, Good, Average, Failed) according to a student’s score, and finding a household’s income range (Below or near poverty level, Low income, middle class, Upper middle class) according to its annually income.

【Example 6】 Based on the score table, find the number of students in Excellent, Pass, and Failed ranges respectively for the English subject. Below is part of the source data:

CLASSSTUDENTIDSUBJECTSCORE
Class one1English84
Class one1Math77
Class one1PE69
Class one2English81
Class one2Math80

【SPL script】

AB
1=connect("db").query("select * from Scores where SUBJECT='English'")/ Connect to the database and get records of English subject
2=create(Assessment,Score).record(["fail",0,"pass",60,"excellent",90])/ Create a lookup table of score ranges and assessments
3=A1.derive(A2.segp(Score,SCORE).Assessment:Assessment)/ A.segp() function gets assessment for each score according to the sequence number of the range where the score falls in
4=A3.groups(Assessment;count(1):Count)/ Group A3’s records by assessment and count students in each group

A4’s result:

AssessmentCount
excellent6
Fail4
Pass18

4. Get the top/bottom N values

【Example 7】 Based on the score table, for each subject in every class, find the top 2 students’ IDs in terms of score. Below is part of the source data:

CLASSSTUDENTIDSUBJECTSCORE
Class one1English84
Class one1Math77
Class one1PE69
Class one2English81
Class one2Math80

【SPL script】

AB
1=connect("db")/ Connect to the database
2=A1.query("select * from Scores")/ Query Scores table
3=A2.group(CLASS,SUBJECT;~.top(-2;SCORE):TOP2)/ A.top() function gets the records of top 2 students for each subject in each class; -2 means getting two records in descending order
4=A3.conj(TOP2)/ Concatenate all top 2 records together

A4’s result:

CLASSSTUDENTIDSUBJECTSCORE
Class one4English96
Class one9English93
Class one13Math97
Class one10Math97

5. Query records by primary key values

It’s common to locate records according to primary key values, such as finding employee records by employee IDs, querying detailed orders data by orders IDs, etc.

【Example 8】 Based on the associated Course table and SelectCourse table, list a table of course selection information where each course occupies a column. Below is part of the Course table:

IDNAMETEACHERID
1Environmental protection and …5
2Mental health of College Students1
3Computer language Matlab8

Here is part of the SelectCourse table:

IDSTUDENT_NAMECOURSE
1Rebecca2,7
2Ashley1,8
3Rachel2,7,10

The expected result:

IDSTUDENT_NAMECOURSE1COURSE2COURSE3
1RebeccaMental health of College StudentsInto Shakespeare
2AshleyEnvironmental protection and …Modern economics
3RachelMental health of College StudentsInto ShakespeareMusic appreciation

【SPL script】

AB
1=connect("db")/Connect to the database
2=A1.query("select * from Course").keys(ID)/ Query Course table and set ID as the primary key
3=A1.query("select * from SelectCourse")/ Query SelectCourse table
4=A3.run(COURSE=COURSE.split@cp())/ Split each Course value in SelectCourse table by comma and reassign it to the field
5=A4.max(COURSE.len())/ Get the largest number of selected courses
6=create(ID,STUDENT_NAME, ${A5.("COURSE"+string(~)).concat@c()})/ Create an empty table where the number of course columns is the number A5 gets
7>A4.run(A6.record([ID,STUDENT_NAME]|COURSE.(A2.find(~).Name)))/ Loop through the SelectCourse table to join and append student IDs, names and the courses obtained through A.find() function to A6’s table

A6’s result:

IDSTUDENT_NAMECOURSE1COURSE2COURSE3
1RebeccaMental health of College StudentsInto Shakespeare
2AshleyEnvironmental protection and …Modern economics
3RachelMental health of College StudentsInto ShakespeareMusic appreciation

Find more examples in SPL CookBook.

Leave a Reply