Category: SPL application scenarios

Is There a Database Technique That Enables Fast JOIN?

JOIN is a long-standing challenge in optimizing database performance. The more large tables a JOIN operation involves, the worse it performs.

The key to JOIN optimization is to classify the operations properly, so that the right optimization method can be designed and chosen according to the characteristics of each category. SQL defines JOIN in one single way that is so general and simple that it leaves little room for performance optimization. This is why it is hard to speed up JOIN operations in relational databases. continue reading →

Is There Any Alternative Technology to Spark?

HANA is a popular in-memory database that could, in theory, replace Spark, but it is not open-source, which keeps many users away. SQLite, which can also run in memory, is open-source, but it can only be used through embedded calls, which severely limits data size and computing performance. Redis is open-source, fast, and able to handle huge volumes of data, but it is very weak at computation and needs a great deal of hardcoding to perform in-memory computations.

The best Spark alternative is esProc SPL. continue reading →

Is There Any Open-source Library That Can Achieve Cross-database Computations?

Some databases have built-in components for cross-database computations, such as DBLink and Linked Server, but these are not open-source and are complicated to configure. On most occasions, data must be loaded to the local machine for computation, which results in low performance. Among the open-source technologies that support cross-database computations, Scala makes the most of the databases’ computing abilities and offers excellent performance, but it is heavyweight, produces lengthy code, and is error-prone when data cannot fit into memory. Both Calcite and Tablesaw are lightweight and simple to configure, but they are immature and support too few functions.

SPL, a Java-based open-source library that also supports cross-database computations, is the best alternative. continue reading →

Besides DBLink, Is There Any Better Choice to Achieve Cross-database Computations?

Besides DBLink, database components of the same type include Federated Database and Linked Server. All of them are complicated to configure, and for most computations data has to be loaded to the local computer, which wastes the computing abilities of the remote database. Calcite is a Java class library that also supports cross-database computations. It is simple to configure, convenient to integrate, and offered under an open-source license, yet it is immature and lacks support for many functions. Scala, a language intended for big data computing, can perform cross-database computations, too. It is more mature and performs rather well, but it has a complex and heavy framework, produces complicated and lengthy code, and is error-prone when data cannot fit into memory.
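As a rough illustration of the trade-off described above, the sketch below uses two in-memory SQLite connections as stand-ins for two separate databases. The table names, columns, and sample rows are all hypothetical. It pushes the aggregation down to the database that owns the detail rows, so that only a small result is loaded locally, and then joins that result with a lookup table fetched from the second database. It is a minimal sketch of the idea, not how any of the tools named here work internally.

```python
import sqlite3

# Two independent in-memory databases stand in for two remote systems
# (hypothetical schema and data, for illustration only).
sales_db = sqlite3.connect(":memory:")
crm_db = sqlite3.connect(":memory:")

sales_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)
crm_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)


def totals_by_customer_name():
    # Push the aggregation down to the database that owns the orders,
    # so only one row per customer has to be transferred.
    totals = sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    # Fetch the small lookup table from the other database.
    names = dict(crm_db.execute("SELECT id, name FROM customers").fetchall())
    # Join the two partial results locally.
    return {names[customer_id]: total for customer_id, total in totals}


result = totals_by_customer_name()
```

Loading every order row first and summing locally would give the same answer, but it would move far more data and leave the source database's computing ability unused.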

SPL is an ideal choice to implement cross-database computations. continue reading →

Is There Any Alternative to Stored Procedures?

The weaknesses of stored procedures have long been debated. Let’s look at their disadvantages once more.

Non-migratable continue reading →

Is There Any Simple and Lightweight In-memory Database Technology?

HANA, Spark and Redis are commonly used in-memory technologies, but they all have complex and heavy frameworks, which narrows their range of applications. The popular simple and lightweight in-memory technology is SQLite, which is nimble, has a simple framework, and can be embedded directly in a Java program. Its disadvantages include the lack of an independent service and of stored procedure support, an unstable environment, slow execution, weak computing abilities, and a complicated loading process for external data.

A better and more powerful technology of this kind is esProc SPL. continue reading →

Is There Any Lightweight Big Data Computing Technology?

All popular big data computing technologies, including Hadoop, Storm, Hive and Spark, are built around large clusters, which suits large enterprises with massive-scale data. In fact, these technologies originated at IT giants. Yet in many real-world scenarios the so-called “big data” is far smaller than what those giants handle, so a small cluster, or even no cluster at all, is enough. Smaller companies also lack abundant hardware and large maintenance teams. For them, a lightweight big data computing technology is exactly what is needed.

Among the few such technologies, esProc SPL is the flagship. continue reading →

Is ClickHouse as Powerful as We Thought?

The open-source analytical database ClickHouse (CH) has become popular recently and is said to be exceptionally fast at OLAP analysis. Many users struggling with performance problems are eager to try it.

Does the DBMS really meet our expectations? The following performance test will tell us. continue reading →

Is Elasticsearch Best Suited to High-Concurrency Queries?

Compared with SQL databases and data warehouses, the search engine Elasticsearch (ES) is better suited to high-concurrency queries, such as an account detail query that retrieves from a few to several thousand detail records out of tens of millions, or even a billion, rows of historical data. Such queries feature huge amounts of data, high concurrency, and a demand for sub-second response times. SQL-based frameworks, including relational databases and Hadoop data warehouses, can hardly meet these requirements with the resources available. The popular practice is to export the data to Elasticsearch and use its search technology to achieve good performance at high concurrency. From this perspective, ES is a good choice for high-concurrency queries.

Unfortunately, ES does not support JOIN operations, which is rather inconvenient. Take the account detail query as an example, and suppose we want a result set that includes fields such as store name, address, and phone number. These fields are generally stored in the store table, and a join between it and the detail table is needed to get them. As ES does not support JOINs, we have to merge the store data into the detail data to get a wide table, as shown below: continue reading →
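To make the wide-table idea concrete, here is a minimal sketch in plain Python. The field names (`store_id`, `store_name`, `address`, `phone`) and the sample rows are hypothetical, chosen only to mirror the store and detail tables mentioned above. Each detail record is extended with a copy of its store's fields before it would be exported to a system that cannot join at query time.

```python
# Hypothetical store table, keyed by store id.
stores = {
    "S1": {"store_name": "Downtown", "address": "1 Main St", "phone": "555-0100"},
    "S2": {"store_name": "Airport", "address": "9 Sky Rd", "phone": "555-0199"},
}

# Hypothetical account detail records that reference a store by id.
details = [
    {"detail_id": 1, "account": "A100", "store_id": "S1", "amount": 25.0},
    {"detail_id": 2, "account": "A100", "store_id": "S2", "amount": 40.0},
    {"detail_id": 3, "account": "A200", "store_id": "S1", "amount": 12.5},
]


def to_wide_rows(detail_rows, store_table):
    """Copy the store fields into every detail row (denormalization)."""
    wide = []
    for row in detail_rows:
        store = store_table[row["store_id"]]
        # The merged dict holds the detail fields plus the store fields.
        wide.append({**row, **store})
    return wide


wide_rows = to_wide_rows(details, stores)
```

Note that the store fields are repeated in every detail row that refers to the same store, so the wide table takes more space and every change to a store has to be propagated to all of its detail rows.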

How to Speed up JOIN Operations Involving Huge Tables, like the Order Table and the Order_detail Table?

Below are the order table, whose primary key is id, and the order_detail table, which has a composite primary key consisting of id and productid. We often need to join the two tables. Suppose we want to group the data by customer and order date and sum the order amounts in each group. The grouping fields are the order table’s customerid and orderdate, and the order amount is the result of multiplying price by quantity in the order_detail table.
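The computation described above can be sketched in plain Python as follows. The sample rows are hypothetical, and the approach shown (an in-memory lookup on the order id followed by grouped summation) is only one simple way to express the join, not the optimization the article goes on to discuss.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows; the order table's primary key is id.
orders = [
    {"id": 1, "customerid": "C1", "orderdate": date(2024, 1, 5)},
    {"id": 2, "customerid": "C1", "orderdate": date(2024, 1, 5)},
    {"id": 3, "customerid": "C2", "orderdate": date(2024, 1, 6)},
]

# The order_detail table's composite primary key is (id, productid).
order_details = [
    {"id": 1, "productid": 10, "price": 2.0, "quantity": 3},
    {"id": 1, "productid": 11, "price": 5.0, "quantity": 1},
    {"id": 2, "productid": 10, "price": 2.0, "quantity": 2},
    {"id": 3, "productid": 12, "price": 4.0, "quantity": 5},
]


def amounts_by_customer_and_date(order_rows, detail_rows):
    # Index the primary table by its primary key for the join.
    orders_by_id = {o["id"]: o for o in order_rows}
    totals = defaultdict(float)
    for d in detail_rows:
        o = orders_by_id[d["id"]]  # one-to-many join on id
        key = (o["customerid"], o["orderdate"])
        totals[key] += d["price"] * d["quantity"]
    return dict(totals)


totals = amounts_by_customer_and_date(orders, order_details)
```

With this sample data, both of customer C1's orders on the same date fall into one group, while customer C2 forms a second group.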

Such a join is rather common. Its characteristics are as follows: the join field is the primary key, or one or more fields of a composite primary key, and the relationship between the tables is one-to-one or one-to-many. Two tables in a one-to-many relationship are called the primary table and the sub table; a primary table can have more than one sub table. continue reading →
