Besides DBLink, Is There Any Better Choice to Achieve Cross-data Computations?

The same type of database components, besides DBLink, also includes Federated Database and Linked Server. Their configurations are complicated. And for most computations, data needs to be loaded to the local computer, which wastes the computing abilities of the remote database. Calcite is a Java class library that also supports cross-database computations. It is simple to configure and convenient to integrate, and is offered under an open-source license. Yet it is underdeveloped, without support of many functions. Scala, the language intended to compute big data, supplies abilities to perform cross-database computations, too. It is more mature and has rather good performance, but it has complex and heavy framework, produces complicated and lengthy code, and is error-prone when data cannot fit into the memory.

SPL is an ideal choice to implement cross-database computations.

The Java-based class library is open-source, light-weighted and simple to configure, offers a wealth of functions and easy to integrate JDBC interface, and is able to make full use of the database’s computing abilities and compute data that cannot fit into the memory stably.

SPL can exploit the most of database’s computing capabilities. The remote database will first execute SQL or the stored procedure, then SPL retrieves the smaller result set to the local and performs cross-database computation. To perform an inner join between MySQL and Oracle, for instance:

1=mysql1.query("select SellerId, sum(Amount) subtotal from Orders group by SellerId")
2=orcl.query("select EId,Name from employees")
3=join(A1:O,SellerId; A2:E,EId).new(O.Name:name, O.Dept:dept, E.subtotal:amt)

SPL offers a rich library of functions to achieve a computation with simple and intuitive code after the cross-database computation. To get records according to page number after a cross-database computation, for instance, SPL has the following code, where p_Size and p_No are parameters:

3…//Cross-database computation*(p_No-1),p_Size*p_No)

SPL offers JDBC driver to be conveniently invoked by a Java program. The above algorithm, for instance, can be stored as a SPL script file and invoked in a Java program in the way a stored procedure is called:

Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement statement = conn.prepareCall("{call pageQuery(?, ?)}");
statement.setObject(1, 10);
statement.setObject(2, 3);

More examples:

3>1000 && like(dept,\"*S*\"))//Search
6=A2 .id(dept)//Distinct
7=A2.groups(dept;sum(amt))//Grouping & aggregation

SPL provides cursor mechanism to handle data whose size exceeds the available memory space with simple and stable code. For instance, Employees and Orders are large tables and stored in different databases. We first perform a cross-database join and then grouping & aggregation:

1=orcl.cursor("select EId, Dept from Employees order by EId")
2=mysql1.cursor("select SellerId, Amount from Orders order by SellerId")
3=joinx(A2:O,SellerId; A1:E,EId)

The algorithm performs the join using the order-based merge to obtain better performance.

With computations between sub-databases of same structure, SPL provides parallel processing of simple and concise syntax:

1=[connect("mysql1"),connect("mysql2"),connect("mysql3")]/Connect to multiple mysql databases
2select * from orders where amount>=10000/SQL
3fork A1=A3.query@x(A2)/Execute SQL with parallel processing
4=A3.conj()/Concatenate result sets

In addition to databases, SPL also supports file sources, such as CSV and XLS, and various NoSQL sources, as well as mixed computations between different types of data sources or databases.

Leave a Reply