Tag: Spark

Hadoop/Spark is too heavy, esProc SPL is light

With the advent of the big data era, data volumes keep growing. Expanding the capacity of a database running on traditional minicomputers is difficult and costly, which makes it hard to keep up with business growth. To cope with this problem, many users have turned to distributed computing, that is, using multiple inexpensive PC servers to form a cluster that performs big data computing tasks. Hadoop and Spark are among the important software technologies on this route, and they are popular because they are open source and free. After years of application and development, Hadoop has been widely accepted: it can be applied to data computing directly, and many newer data systems, such as Hive and Impala, have been built on top of it.

The heaviness of Hadoop/Spark

Hadoop was designed for clusters consisting of hundreds of nodes. To this end, its developers implemented many complex and heavy functional modules. However, except for some Internet giants, national telecom operators and large banks, the amount of data in most scenarios is not that huge. As a result, it is common to see a Hadoop cluster of only a few or a dozen nodes. Because of this mismatch between design goal and reality, Hadoop is a heavy product for many users in three respects: technology, use, and cost. Now we will explain why Hadoop is heavy in each of these three aspects.

Is There Any Alternative Technology to Spark?

HANA is a popular in-memory database and, theoretically, has the potential to replace Spark, but the fact that it is not open-source keeps many users away. SQLite, which can also run as an in-memory database, is open-source, but it can only be called as an embedded library, which puts great limits on data size and computing performance. Redis is open-source, high-performance, and able to handle huge volumes of data, but it offers very little computing capability, so in-memory computations require a great deal of hard-coding.

The best Spark alternative is esProc SPL.