# “Top N” Queries on Structured Data

## 1. The maximum/minimum

Getting the maximum or minimum value can be regarded as a special top N scenario where N is 1. It’s so common that I single it out for special illustration. There are many such computing tasks, such as getting the highest math score in class one, finding the age of the youngest employee, and so on. On certain occasions, instead of the specific values we are more concerned with their positions. Often this happens in inter-row calculations. One example is to find how much the sales amount in the company’s best month goes up compared with the previous month. To get this done we need to first get the sequence number of the record holding the month with the highest sales amount, then the sales amount in the previous month, and compare them. Other times we want to get certain information in the record containing the maximum or minimum value. Examples include getting the name of the student who has the highest math score in class one, finding which department the youngest employee belongs to in a company, etc.

Now we’ll look at how to deal with the three scenarios of “getting maximum/minimum”. The following table records information of NASDAQ Composite:

### 1.1 Get the maximum/minimum value

【Example 1】 Get the highest closing price in the NASDAQ in the year of 2019.

【SPL script】

To get the lowest closing price in the NASDAQ in the year of 2019:

### 1.2 Get the sequence number(s) of record(s) holding the maximum/minimum value

【Example 2】 Calculate the growth rate of the trading date with the highest NASDAQ closing price in 2019 compared with the closing price in the previous day.

【SPL script】

There could be more than one record having the maximum value. To return the sequence numbers of all eligible records, you can use @a option in A.pmax() function:

You can use @z option in A.pmax()function to locate the record(s) from backwards to forwards:

### 1.3 Get the record(s) holding the maximum/minimum value

【Example 3】 Get the date in the record holding the highest NASDAQ closing price in the year of 2019.

【SPL script】

A.minp()function is used to get the record(s) containing the minimum value:

Both A.maxp()and A.minp() functions can work with @a option and @z option to achieve specific computing goals. Here we won’t give examples.

## 2. The top/bottom N

There are same three scenarios of getting the top/bottom N. Use the same NASDAQ composite data to illustrate them:

### 2.1 Get the top/bottom N values

【Example 4】 Get the top 3 NASDAQ volumes in the year of 2019.

【SPL script】

To get the bottom 4 NASDAQ volumes in the year of 2019:

### 2.2 Get the sequence numbers of the top/bottom N values

【Example 5】 Calculate the growth rate of each of the trading dates having the 3 highest NASDAQ closing prices in 2019 compared with their closing price in the prior day.

【SPL script】

### 2.3 Get the records holding the top/bottom N values

【Example 6】 Get the records of trading dates with 5 lowest volumes in NASDAQ 2019.

【SPL script】

## 3. Top N queries on post-grouping subsets

Getting the maximum/minimum value and the top/bottom N values from each subset after a data set is grouped are also common computing goals, like finding the 5 top-selling items for each month and getting the customers whose total order amounts rank in top 3 for each year. Now let’s look at how to handle top N queries on subsets.

### 3.1 Get the maximum value in each group

【Example 7】 Based on the score table, get the highest math score in each class. Below is part of the source table:

【SPL script】

### 3.2 Get top N values/records in each group

A top N query can be regarded as a kind of aggregate operation on each subset after a data set is grouped. Here we look at how to get top N values and records respectively.

【Example 8】 Query the highest two math scores in each class. Below is part of the score table:

【SPL script】

【Example 9】 For each subject in each class, query the information of students whose scores rank in top 3. Below is part of the score table:

【SPL script】

### 3.3 Get top N values/records cumulatively

etting top N values or records in a cumulative way won’t generate the subsets after the data set is grouped. This method is used to handle scenarios when data volume is huge. There are same two scenarios here – top N values and top N records.

【Example 10】 Get the hire dates of the two employees in each department who have the longest hire durations. Below is part of the employee table:

【SPL script】

【Example 11】 Get the information of employees whose salaries rank in top 3 in each department. Below is part of the employee table:

【SPL script】

Find more examples in SPL CookBook.