Category: SPL understanding
A programming language coding in a grid
What? A programming language coding in a grid?
Yes, you read that right. SPL (Structured Process Language) is a programming language that is coded in a grid and is designed specifically for processing structured data.
Comparison between SPL and Python in processing structured data
SPL is designed primarily to address the difficulties of SQL (hard to code and slow to run for complex tasks, difficult to perform cross-source calculations, dependent on stored procedures), and its application scenarios are similar to those of SQL.
The significance of ordered storage for high performance
Ordered storage means storing data after sorting it by certain fields. This storage method makes it possible to implement high-performance algorithms that exploit the order of the data to reduce computing complexity, thus greatly improving computing performance.
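As a rough illustration of why ordering helps, the sketch below uses Python rather than SPL and assumes the key column is already sorted and held in memory. A range filter can then be answered with two binary searches instead of a full scan. The function name `range_lookup` and the sample `order_dates` values are hypothetical and chosen only for this example.

```python
from bisect import bisect_left, bisect_right

def range_lookup(sorted_keys, low, high):
    """Return (start, end) positions of keys in [low, high].

    Assumes sorted_keys is in ascending order, so each boundary is
    found by binary search in O(log n) rather than an O(n) scan.
    """
    start = bisect_left(sorted_keys, low)    # first position with key >= low
    end = bisect_right(sorted_keys, high)    # first position with key > high
    return start, end

# Hypothetical sample data: order dates stored in ascending order.
order_dates = [20230101, 20230103, 20230103, 20230107, 20230110, 20230115]

start, end = range_lookup(order_dates, 20230103, 20230110)
matching = order_dates[start:end]
print(matching)
```

The same idea extends to data on disk: if records are stored sorted by the filter field, only the block range between the two boundaries needs to be read.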
SPL Cloud Data Warehouse
The overwhelming majority of cloud data warehouse services on the market (arguably all of them) are based on SQL. After all, a data warehouse’s primary responsibility is analytical computing. NoSQL databases have technological advantages in handling TP (transaction processing), but they are not nearly as good as SQL databases in dealing with AP (analytical processing).
The performance problems of data warehouse and solutions
As the volume of data continues to grow and the complexity of business gradually rises, data processing efficiency is becoming a big challenge. The most typical manifestation is that the performance problems of the data warehouse become more and more prominent when handling analytical tasks. Problems such as high computing pressure, low performance, long query times, queries that never return a result, and production accidents caused by batch jobs that fail to finish on time occur from time to time. When a data warehouse has performance problems, it cannot serve the business well.
HTAP database cannot handle HTAP requirements
HTAP (Hybrid Transactional/Analytical Processing) has been a goal of many database vendors since the term was explicitly proposed in 2014. In fact, HTAP is not new: when relational databases (RDBs) first emerged, the intent was exactly to use one database for both transactions and analysis.
Routable computing engine implements front-end database
Many large organizations have their own central data warehouse to provide data services to applications. As business grows, the load on the data warehouse continues to increase. The increased load comes from two sources. First, as the data backend of front-end applications, the data warehouse faces a growing number of front-end applications and concurrent queries. Second, since it also undertakes the off-line batch jobs on raw data, the data volume and computing load increase as the number of batch jobs increases. As a result, the data warehouse is often overloaded, which causes problems such as batch jobs that run too long (far exceeding the time limit the business can tolerate) and slow responses to on-line queries (users have to wait a long time, and their satisfaction keeps falling). These problems get worse at the end of a month or year, when the computing load is at its peak.
The significance of open computing ability from the perspective of SPL
A relational database provides SQL, so it has strong computing ability. Unfortunately, this ability is closed: the data to be calculated and processed by the database must be loaded into it in advance, and there is a sharp boundary between data inside the database and data outside it. By contrast, open computing ability means that data from multiple sources can be processed directly without first being loaded into a database.
How to speed up funnel analysis of e-commerce system
In an e-commerce system, conversion funnel analysis is a very important data analysis calculation. Users of an e-commerce platform perform multiple operations (events), including viewing pages, searching, adding to cart, placing an order, and paying. These events occur in a certain order, and the later an event occurs, the fewer users are involved in it, just like a funnel. Usually, conversion funnel analysis first counts the number of users for each event and then does further calculations based on the counts, such as calculating the conversion rate. Since such analysis involves huge amounts of data and the calculation is very complex, it often leads to performance problems.
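The counting step described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the SPL approach the article goes on to discuss. The step labels in `FUNNEL_STEPS`, the `(user_id, timestamp, event_name)` record layout, and the sample events are all assumptions made for this example, and the time-window limit that real funnel analyses usually impose is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical labels for the events named in the text, in funnel order.
FUNNEL_STEPS = ["view", "search", "add_to_cart", "order", "pay"]

def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count users who reached each funnel step in the required order.

    events: iterable of (user_id, timestamp, event_name) tuples.
    Returns a list with one user count per step.
    """
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()                 # process each user's events by time
        reached = 0                        # number of steps completed so far
        for _, name in user_events:
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts

# Hypothetical sample data for three users.
sample_events = [
    ("u1", 1, "view"), ("u1", 2, "search"), ("u1", 3, "add_to_cart"),
    ("u1", 4, "order"), ("u1", 5, "pay"),
    ("u2", 1, "view"), ("u2", 2, "search"), ("u2", 3, "add_to_cart"),
    ("u3", 1, "search"), ("u3", 2, "view"),
]

counts = funnel_counts(sample_events)
# Conversion rate of each step relative to the previous one.
rates = [counts[i] / counts[i - 1] for i in range(1, len(counts)) if counts[i - 1]]
print(counts, rates)
```

Grouping by user and sorting each user's events is where most of the cost lies on large data sets, which is one reason this kind of analysis tends to run into performance problems.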
How to implement fast multi-index calculation
In statistical analysis applications, the various indexes calculated from detailed data are important for supporting the business. However, implementing fast and flexible multi-index calculation poses several challenges for the backend data source.