Membership Test for Structured Data

Membership Test for Structured Data

1. Set membership test

Get records from a table according to whether a value is included in a specified set (where the number of members is within 10).

【Example 1】 Calculate the average salary in each department in first-tier cities based on the employee table. Below is part of the source data:

IDNAMECITYSALARY
1RebeccaTianjin7000
2AshleyTianjin11000
3RachelShijiazhuang9000
4EmilyShenzhen7000
5AshleyNanjing16000

【Analysis】

When getting data from the employee table, we need to check whether the city where an employee settles is included in the constant set of [Beijing, Shanghai, Guangzhou, Shenzhen]. When the number of members in the set is within 10, we can use A.contain() function to get the eligible records.

【SPL script】

AB
1=connect("db").query("select * from Employee")/Connect to database and query Employeetable
2[Beijing, Shanghai, Guangzhou, Shenzhen]/Define a constant set of first-tier cities
3=A1.select(A2.contain(CITY))/Use A.contain()function to get records where the CITY value is contained in A2’s constant set
4=A3.groups(DEPT; avg(SALARY):SALARY)/Group selected records by DEPT and calculate average salary in each group

A4’s result:

DEPTSALARY
Finance7833.33
HR7187.5
Marketing7977.27

Get records from a table according to whether a value is included in a specified set (that has members over 10).  

【Example 2】 Calculate the sales amount of every big customer in each month of the year 2014 based on the sales table. Below is part of the source data:

IDCustomerSellerIdDateAmount
10400EASTC12014/01/013063.0
10401HANAR12014/01/013868.6
10402ERNSH82014/01/022713.5
10403ERNSH42014/01/031005.9
10404MAGAA22014/01/031675.0

【Analysis】

Similarly, we need to check whether the Customer value in each sales record is included in the constant set of big customers. In this case the number of members in the set is relatively large (above 10). We first sort the constant set, and then use @b option with A.contain() function to perform a binary search.

【SPL script】

AB
1=connect("db").query("select * from Sales")/Connect to database and query Salestable
2=["SAVEA","QUICK","ERNSH","HUN","RATTC","HANAR","FOLKO","QUEEN,MEREP","WHITC","FRANK","KOENE"].sort()/Define a constant set of big customers and sort it
3=A1.select(year(Date)==2014 && A2.contain@b(Customer))/Get records of big customers in 2014. As set A is alreadyordered, use @b option with A.contain() function to perform a binary search
4=A3.groups(month(Date):Month; sum(Amount):Amount)/Group the selected records by month and sum the sales amounts in each month

A4’s result:

STATESALARY
California7700.0
Texas7592.59
New York7677.77
Florida7145.16
Other7308.1

2. Foreign key mapping membership test

Get records from a table according to whether a record is included in the foreign key references in its associated table.

【Example 3】 Find how many students are there in each class who select the Matlab course. Below are SelectCourse table and Course table:

undefined

【Analysis】

We need to check whether the CourseID in a record is “Matlab” when getting records from the SelectCourse table. Before that we get a set of course records from the Course table where the course name is “Matlab”, and then get records from the SelectCourse table where the CourseID is contained in the set.

【SPL script】

AB
1=connect("db")/Connect to database
2=A1.query("select * from Course")/Query Course table
3=A1.query("select * from SelectCourse")/Query SelectCourse table
4=A2.select(Name=="Matlab")/GetCourse table records where the course name is “Matlab”
5=A3.join@i(CourseID, A4:ID)/Use @i option with A.join()function to perform a join filtering
6=A5.groups(Class; count(1):SelectCount)/Group the selected records and count the students who select the specified course in each class

A6’s result:

ClassSelectCount
Class 13
Class 25

3. Non-foreign key mapping membership test

Get records from a table according to whether a record is included in the non-foreign key references in its associated table.

【Example 4】 Find the number of students in each class who have a score of above 80 for any subject. Below are Score table and Student table:

undefined

【Analysis】

When getting records from the Student table, we need to check whether the current student has a subject whose score is above 80. First we select all Score table records where the Score value is greater than 80 and perform distinct by StudentID to generate a set of records of unique student IDs whose corresponding scores are above 80. Then we get Student table records whose IDs are included in that set of records.

【SPL script】

AB
1=connect("db")/Connect to database
2=A1.query("select * from Student")/Query Student table
3=A1.query("select * from Score")/Query Score table
4=A3.select(Score>80)/Get Score table records where score value is greater than 80
5=A4.id(StudentID)/id function performs distinct operation by student ID
6=A2.join@i(ID, A5)/Use A.join@i()functionto perform join filtering
7=A6.groups(Class; count(1):StudentCount)/Group the joining records and count the eligible students in each class

A7’s result:

ClassStudentCount
Class 19
Class 211

In an optimized and efficient way, test and get records from a table according to whether a record is included in the non-foreign key references in its associated table.

【Example 5】Get the number of customers in each city who placed an order in 2014. Below are Sales table and Customer table:

undefined

【Analysis】

When getting records from the Sales table, we need to check whether the current customer placed an order in 2014. First we select sales table records of 2014 and perform distinct by customer ID to generate a set of records of unique customer IDs who have sales records in 2014. Then we get Costomer table records whose IDs are included in that set of records.

【SPL script】

AB
1=connect("db")/Connect to database
2=A1.query("select * from Customer")/Query Customer table
3=A1.query("select * from Sales where year(Date)=2014 order by CustomerID")/Query sales records of 2014 and sort them by customer ID
4=A3.groups@o(ID)/As the selected records are already ordered, groups() function works with @o option to perform distinct by customerID
5=A2.join@i(ID, A4:CustomerID)/UseA.join@i()function to perform a join filtering
6=A5.groups(City; count(1):CustomerCount)/Group the joining records and count the eligible customers in each city

A6’s result:

CityCustomerCount
Dongying6
Tangshan7

4. Foreign key mapping non-membership test

Get records from a table according to whether a record is not included in the non-foreign key references in its associated table.

【Example 6】 Find the total sales amount of each new customer in the year of 2014. Below are Sales table and Customer table:

undefined

【Analysis】

When getting records from the Sales table, we need to make sure the current customer didn’t place an order in 2014. First we select sales table records of 2014 and then we get those whose customer IDs are not included in the Customer table.

【SPL script】

AB
1=connect("db")/Connect to database
2=A1.query("select * from Sales where year(OrderDate)=2014")/Query sales records of 2014
3=A1.query("select * from Customer")/Query Customer table
4=A2.join@d(CustomerID ,A3:ID)/Use A.join@d()function to get sales records whose customer IDs are not included in Customer table
5=A4.groups(CustomerID; sum(Amount):Amount)/Group the selected records and sum sales amounts of each customer

A5’s result:

CustomerIDAmount
DOS11830.1
HUN57317.39

Find more examples in SPL CookBook

Leave a Reply