Are You Trying to Replace SQL with Kotlin?

I believe a lot of us have the experience of facing structured data computing situations in which databases cannot be conveniently utilized. Before Java 8 is released, we resort to hardcoding to resolve such a situation. Java 8’s introduction of a lazy evaluation Stream that facilitates set-oriented operations has made the situations better handled, but the class library also has its own weaknesses (Find more about this in Are You Trying to Replace SQL with Java 8 Stream?). Kotlin is a programming language that interoperates fully with Java and supports JavaScript. It is developed based on Stream, but it is improved to try to cast off disadvantages. The most important improvement is the simplification of Lambda syntax. Other improvements include the addition of eager-evaluation-driven set-oriented operations and a series of related functions.

The advances are effective. Yet still Kotlin does not supply specialized data objects for structured data computations. Having the same root problem, Kotlin is as unable as Stream to replace SQL while showing the same weaknesses. By implementing the example tasks in the previous essay about Stream with Kotlin, we can learn about their similarities and differences more clearly.

When members of a set are of simple data types, both Stream and Kotlin are convenient to use for implementing set-oriented operations. Below is an example of integer set filtering:

var numbers=listOf(3,11,21,27,9)  
var r1=numbers.filter{it>=10 && it<20}


var r2=numbers.sorted() 


var others=listOf(2,11,21)

var result=numbers intersect  others

All the three pieces of code employ eager set List<T>, which is more convenient to use than lazy set Stream<T> for import and export, reuse, and type conversion. It is particularly suitable for handling scenarios that involve a relatively small volume of data and do not demand high performance. The set-oriented intersection function (the infix format) is a Kotlin feature that Stream does not have. Kotlin boasts a series of set-oriented operations to help generate short and easy-to-understand code. Kotlin and SQL have some functions in common, such as distinct, sum, and count, etc.

When members of a set are not of simple data types but records (usually data class), Kotlin becomes as not convenient as Stream. To sort Orders table’s Client field in the reverse Order and its Amount field in the normal Order, for instance:

//Orders is of List< Order> type; Order is defined as follows:

//data class Order(var OrderID: Int,var Client: String,var SellerId: Int, var Amount: Double, var OrderDate: Date)

var resutl=Orders.sortedBy{it.Amount}.sortedByDescending{it.Client}

Probably because it is stream-based, Kotlin also writes the sorting fields in the reverse Order and uses a different sorting function from that used with simple data types. The code becomes much more complicated.

Another example about grouping. To group Orders table by SellerId and sum Amount field, for instance:

var result =Orders.groupingBy(Order::SellerId).fold(0.0){acc,elem->(acc+elem.Amount)}
 result.forEach{println("${it.key}:\t ${it.value}")}

Kotlin provides fold function to encapsulate a code snippet consisting of Stream sum-like functions, collect + Collectors + summarizingDouble + DoubleSummaryStatistics, or others, to make a clear structure.

Yet the grouping & aggregation result remains the same, which is Map type rather than the regular records. The inconsistence between data types of the source data and the result data in Kotlin (SQL is consistent) requires type conversion in subsequent computations. Kotlin also has reduce, aggregate and other functions to achieve the effect of aggregation. Each function has its own uses and application scenarios. They usually need a lot of effort to learn and thus are not as convenient to use as the SQL counterpart.

Let’s look at an example of grouping and summarizing data by a set of fields. To group Orders table by Year and Client and sum and count Amount field, for instance:

data class Grp(var OrderYear:Int,var SellerId:Int)
 data class Agg(var sumAmount: Double,var rowCount:Int)
 var result=Orders.groupingBy{Grp(it.OrderDate.year+1900,it.SellerId)}
  acc, elem ->  Agg(acc.sumAmount + elem.Amount,acc.rowCount+1)

.toSortedMap(compareBy<Grp> {  it. OrderYear}.thenBy {it. SellerId})
 result.forEach{println("group fields:${it.key.OrderYear}\t${it.key.SellerId}\t  aggregate fields:${it.value.sumAmount}\t${it.value.rowCount}") }

or a grouping & aggregation operation, we can store the single grouping field or a value in a key in Kotlin Map, but we cannot store a set of grouping fields because a key can only contain one field. To solve the issue, we can define a structured data object Grp, piece together the multiple grouping fields into it, and store the data object in a key. As Kotlin is Stream-based, it does not support the dynamic data structure, too. We need to define the result’s structure beforehand. It is hard for SQL programmers to adapt to such a rigid rule, as SQL is an interpreted language that does not need to define a data structure in advance because dynamic data structures are one of its basic features.

The above solution can be conveniently or complicated implemented. In the essay on Stream, we piece multiple fields together into one with the separator to convert the multi-field grouping & aggregation into a single-field one, and finally split the result strings by separator.

A sorting operation (with the aim of being consistent with SQL’s computing result but not mandatory) often follows after the grouping & aggregation operation. But the sorting function used on Map is different from that used on records. There are many more differences between Kotlin and SQL. This makes Kotlin extremely hard to learn.

Let’s move on to take a look at join operations. To perform an inner join between Orders table and Employee table, and group the result by Employee.Dept and sum and count Orders.Amount, for instance:

// Employees is of List<Employee> type. We define Employee as  follows:

// data class Employee(var EId:Int, var State:String, var   Dept:String , var Name:String ,var Gender:String ,var Salary:Int,var  Birthday:Date)

data class OrderNew(var OrderID:Int ,var Client:String, var SellerId:Employee,var Amount:Double ,var OrderDate:Date)  
  val result = {o->var emp=Employees.firstOrNull{it.EId==o.SellerId}  
     .filter  {o->o!=null}  
  data class Grp(var Dept:String,var Gender:String)  
  data class Agg(var sumAmount: Double,var rowCount:Int)  
  var  result1=result.groupingBy{Grp(it!!.SellerId.Dept,it.SellerId.Gender)}  
    acc, elem ->  Agg(acc.sumAmount + elem!!.Amount,acc.rowCount+1)  
   }).toSortedMap(compareBy<Grp>  {  it.Dept}.thenBy {it.Gender})

As Stream, Kotlin does not support joins directly and resorts to hardcoding to achieve them. In the same way of thinking, Kotlin replaces SellerId, Orders table’s foreign key, with the corresponding records in Employee table. Since data structure is changed after the replacement (SellerId field type becomes different) and both Kotlin and Stream do not support dynamic data structures, we need to define a new data structure. Left joins and outer joins also need hardcoding to achieve but with the different logic for writing the core code. The inconsistencies cause a series of inconveniences. I’m sure you have noted that, in Kotlin, the code for performing a post-join grouping & aggregation is different from that of performing a simple grouping & aggregation, while in Stream, the code is the same. This is because Kotlin is designed to force null safety and certain measures are needed to ensure that. Stream, which is in essence Java-based, does not have that mandatory requirement.

Joins are an important part of structured data computations. Compared with SQL, Kotlin is as weak and unprofessional as Stream in supporting the operations.

According to all the examples, Kotlin has merits that Stream does not have and generates shorter code. On the other hand, the two have many common demerits. As compiled languages, both lack professional data objects for structured data computations, cannot support dynamic data structures, and are impossible to truly simplify the Lambda syntax and reference fields directly. They are not qualified to replace SQL (which is explained in detail in the essay on Stream).

At least for now, it is esProc SPL that is reliable when databases are not conveniently employed for performing structured data computations.

Leave a Reply