How to Call an SPL Script in Python

How to Call an SPL Script in Python

esProc provides JDBC interface, through which Python can call an SPL script. Below shows the Python invocation structure:

Python accesses esProcJDBC Server through py4j interface that uses socket, and invokes a dfx script to retrieve data from a database or a data source. esProcJDBC Server encapsulates JDBC to act as the server side. Before running Python, you need to first start JVM at the server side. Simply put, you place the jars and the configuration file for loading esProc at the launch of JAVA into the target project. Make the note that esProc JDBC requires JDK 1.8 or a higher version.

Suppose we are using Windows 10 and esProc is installed in the directory D:\Program Files\raqsoft (simply called [esProc directory] in sections below), the following illustrates how the invocation is done using esProc’s built-in JAVA.  

1. Loading jar drivers

esProc JDBC is a fully embedded computing engine. It depends on related driver jars and the configuration file raqsoftConfig.xml for use. Below are the jars it uses:

esproc-bin-20210811.jar                  // For starting esProc computing engine and JDBC
commons-math3-3.6.1.jar                  // For data computing and analysis
py4j-                        // For communication with Java
esproc-py4j-server.2.10.jar              //For accessing esproc jdbc server-side interface (this jar can be found in esProc external library)

In this illustration, we put the above jars under d:/app/lib. You can also place them in another directory or modify the script for loading these jars.

Run the script (start_jdbc_server.bat) for starting esProcJDBC Server:

@echo off
rem START_HOME is esProc installation directory, which is the esProc environment variable Server depends for running
set START_HOME="D:\Program Files\raqsoft"
set JAVA_HOME="D:\Program Files\raqsoft\common"
set EXECJAVA="D:\Program Files\raqsoft\common\jre\bin\java"
cd d:\app
rem parameter ip port
start "dm" %EXECJAVA% -Xms128m -Xmx8520m -cp .; -Djava.ext.dirs=./lib; %START_HOME%\common\jdbc; -Dstart.home=%START_HOME%\esProc com.esproc.jdbc.py4jServer %1 %2

The above script configures Java runtime environment variables, among which START_HOME is esProc installation directory. Besides loading the necessary jars from esProc installation directory, the script enables the environment configurations, such as external library invocation and searching for dfx script file. Default values for parameters ip and port are respectively and 25333. The values can be modified.

Read How to Call a Remote SPL Script in Java to learn details about deploying and starting esProc server.

2. Deploying raqsoftConfig.xml

raqsoftConfig.xml is esProc runtime environment configuration file located in [esProc directory]\esProc\config. It contains a set of information, including esProc main path, dfx file search path (the demo directory in this essay), JDBC data source and etc. In the script start_jdbc_server.bat, value of -Dstart.home is set as %START_HOME%\esProc and the configuration file will be automatically loaded for searching demo directory for the target dfx file or data file at the execution of the SPL script.

Lean more about the configuration file in How to Call a Remote SPL Script in Java.

3Calling the SPL script in Python

Once esProcJDBC Server is started, Python is able to invoke the SPL script. Same as calling the stored procedure, the SPL query result will be returned to the Python program in the form of a table (which is divided into two parts, column headers and detailed data.

A. Execute SPL statement

Suppose we have a data table made up of fields AA, BB and CC. We are trying to insert three records and return the result set using the following SPL script:

Below is the script for calling the SPL script in Python:
import pandas as pd
from py4j.java_gateway import JavaGateway, GatewayClient

# Connect to server using the default way, and convert the Python set into a Java set for parameter pass-in
gateway = JavaGateway(auto_convert=True)

# You are allowed to specify ip and port for the connection
gateway = JavaGateway(GatewayClient(address="",port=25333), auto_convert=True)

# Get the connection object
conn = gateway.entry_point.getApp()

# The core data processing code

# Execute script and return result set
result = conn.query(dfx)

# table header
cols = list(conn.getColumns()) 

# table data
rows = []
for lines in list(result):
# Store data in dataframe
df = pd.DataFrame(data=rows, columns=cols)

Output result set as follows:

    AA  BB  CC
0   1   2   3
1   4   5   6
2   7   8   9

The script uses py4j as communication module, so users do not need to specifically handle data import and export. conn.getColumns()returns a JavaArray type result, which needs to be converted into a list (or tuple) using list()(or tuple()). You can refer to related py4j documentation to learn more about data type conversion during communication between Python and Java. The esProc JDBC returns Table type data, which can be converted through list()or tuple().

B.Access a local file in SPL

There is text file aaa.txt in directory demo, as shown below:

pid age  name     work
101 11   Tom      techer
102 12   Jack     manager
103 12   Joan     driver
104 13   Billy    doctor
105 15   Carl     driver

Below is part of the Python code:

# The core data processing code
dfx= "select * from aaa.txt"
result = conn.query(dfx)

Output result set as follows:

   pid  age  name   work
0  101   11  Tom    techer
1  102   12  Jack    manager
2  103   12  Joan    driver
3  104   13  Billy      doctor
4  105   15  Carl      driver

Similar to a SQL table query, the returned result set is formatted and output using Pandas.

C. Execute a SPL statement with parameters

Parameters are an important part of the SQL statements, so are they in SPL statements. To get records where age is above 12 from the above text file, for instance, we have the following Python code:

Below is part of the Python code:

# The core data processing code
dfx=“select * from aaa.txt where age>?”
result = conn.query(dfx,[12])
cols = list(conn.getColumns())

Output result set as follows:

   pid  age   name    work
0  104   13   Billy  doctor
1  105   15   Carl   driver

Parameters are passed to the Java side in the form of an array. The length of the array should be inconsistent with the number of parameters the SPL statement requires. By the way, don’t forget to specify a True value for auto_convert in JavaGateway().

D. Execute a SPL statement containing a data source

Suppose we are trying to access employee data table in mysql database and return the table data as a result set. Remember to load mysql Jdbc driver at the start of JAVA.
Below is SPL script mytest.dfx:

1=connect("mysql")//Connect to mysql database
2=A1.query@x("select * from employee")//Query employee table
3return A2//Return the table data

Below shows how to configure the jdbc data source mysql in esProc IDE:

Below is part of the Python code:

# The core data processing code
dfx="call mytest"
result = conn.query(dfx)
cols = list(conn.getColumns())

Output result set as follows:

0    1  Rebecca  Moore     F  ...  1974-11-20  2005-03-11    R&D     7000
1    2  Ashley    Wilson     F  ...  1980-07-19  2008-03-16    Finance  11000
2    3  Rachel    Johnson    F  ...  1970-12-17  2010-12-01    Sales    9000
3    4  Emily     Smith      F  ...  1985-03-07  2006-08-15    HR      7000
4    5  Ashley    Smith      F  ...  1975-05-13  2004-07-30    R&D     16000

E. Return a nested result set

NoSQL databases are now commonly used. The result set of querying such a database often has nested structures. We will take MongoDB as an example to illustrate how to do the invocation in those cases. First select MongoCli on Select external library window (as shown below):


1=mongo_open("mongodb://")//Connect to mongodb database
2=mongo_shell(A1,"storage.find()").fetch()// Query storage table
3>mongo_close(A1)//Close database connection
4return A2//Return table data

Below is result returned by 2:

Below is part of the Python code:

# The core data processing code
dfx=“call MongoTest”
result = conn.query(dfx)
cols = list(conn.getColumns())

Output result set as follows:

   _id           name           items
0  1000  Storage Alpha           category  name
                                 food    apple
                                 food    banana
                                 tool     h…
1  1001   Storage Beta           category  name
                                 food    pear
                                 food    peach
                                 food    gra…

Python calls the SPL script to query a data table, returns it as a result set, store the result in Pandas, and format and output it. A nested structure can be contained in a table sequence or a sequence. For more complicated structures, you can define a user-defined interface at the JAVA side for data conversion.

F. Query cursors

Unlike query() computations, cursor queries involves two parts – execution of the SPL script and retrieving and processing data circularly batch by batch. Taking mytest.dfx that uses mysql as the data source, let’s look at how to query a cursor in SPL:

curs = gateway.entry_point.getApp()
dfx="call mytest"
# Execute the SPL script
cols = list(curs.getColumns())
# fetch data
rows = []
    result = list(curs.fetch(5))
    for line in result:
df = pd.DataFrame(data=rows, columns=cols)

The SPL statement curs.cursor(dfx) is first executed, and then data will be fetched circularly from curs.fetch(). When the parameter in the data fetching interface fetch(size) is absent, just use the default value 1000.

G. Return multiple result sets

Each of the result sets returned from esProc to Python is made up two parts – column headers and detailed data.


1=create(AO,BO,CO).record(to(6))//Create table A1
2=create(AS,BS).record(to(6,11))//Create table A2
3=create(AT,BT,CT).record(to(20,25))//Create table A3
4return A1,A2,A3//Return multiple tables at one time
dfx=“call multiTable”
# Execute the SPL script and return a list of result sets
results = conn.mquery(dfx)
for result in results:
    rows = []
    cols = tuple(result[0])            # header
    for lines in tuple(result[1]):    # data
    df = pd.DataFrame(data=rows, columns=cols)

When returning multiple result set, an SPL query traverses the list of result sets through mquery() interface, stores each one into Pandas, formats and outputs them.

Compared with the way JaydebeApi uses to call jdbc, Python client uses a different way to invoke esProcJdbc module. You do not need to start JVM each time when the Python application is launched. This increases efficiency. Moreover, esProc offers simple, clear, and easy to use development process, making easy integration of the SPL script into Python.
Those are the common ways of calling the SPL script in Python. Refer to the following essays to learn about the methods of calling an SPL script in the other applications:

Leave a Reply