SPL: Text Handling

SPL: Text Handling

Text handling is an integral and indispensable part of any programming language. SPL (Structured Process Language) provides abundant functions for handling text. There are four categories – character handling, simple string handling, sequence-related handling, and complex analysis. They cover extracting a substring from a string, type judgment and conversion, character set handling, data type parsing, search and replacement, format handling, string generation/splitting/combination, the use of regular expression matching, extracting words and numbers, SQL statement analysis, a HTML analysis, etc.

1.character handling

1.1 Type judgment

TaskCodeReturn value
Check if all characters are numbersisdigit(“12345”)true
Check if all characters are lettersisalpha(“abcde”)true
Check if all characters are lower-case lettersislower(“aBcd”)false
Check if all characters are upper-case lettersisupper(“ABCD”)true

1.2 Conversion

TaskCodeReturn value
Convert all characters to lower-caselower(“aB’cD’e”)ab’cd’e
Skip a string put in quotes when converting to lower-caselower@q(“aB’cD’e”)ab’cD’e
Convert all characters to upper-caseupper(“aB’cD’e”)AB’CD’E
Skip a string put in quotes when converting to upper-caseupper@q(“aB’cD’e”)AB’cD’E
Convert row number and column number to an Excel-style cell namecellname(8,32)AF8

1.3 Character set handling

TaskCodeReturn value
Get the Unicode value for the 2ndcharacterasc(“abc”,2)98
Get the Unicode value for the 1stcharacterasc(“中国”)20013
Get the corresponding character for the Unicode valuechar(98)B
Get the corresponding character for the Unicode valuechar(22269)
Encode a URL stringurlencode("a=b+c>0?1:-1","UTF-8")a%3Db%2Bc%3E0%3F1%3A-1
Decode an encoded URL stringurlencode@r(“a%3Db%2Bc”,”UTF-8”)a=b+c
Base64-encode a byte arraybase64(bytes)ixb7x1dxfbxefxff
Convert to an MD5-encrypted stringmd5(“abcde”)AB56B4D92B40713ACC5AF89985D4B786

1.4 Data type parsing

Parse a string value into one of the corresponding data type.

TaskCodeReturn value
Parse a numeric string into a numbernumber("1234.56")1234.56
Parse a currency string into a numbernumber("$10,234.55","$#,###.#")10234.55
Parse a time format string into a time objectparse("10:20:30")A time object
Parse a sequence type string into a sequenceparse("[1,2,3]")Sequence [1,2,3]
Remove quotes and escape characters at parsingparse@e("\"a\tb\"")a b
Only parse the numeric characters beginning from the left-most oneparse@n("123adsdx234")123
Parse until the second half the quotation marks appears when a given string begins with the quotation marksparse@q("\'sadsdxc\'+23")sadsdxc

1.5 Format handling

TaskCodeReturn value
Convert a numeric value to a corresponding format stringstring(12345.678,”#,##0.00”)12,345.68
Convert a date value to a corresponding format stringstring(date(“2020-03-15”),”MM/dd/yyyy”)03/15/2020
Convert a time value to a corresponding format stringstring(time(“16:18:54”),”h:mm a”)4:18 PM
Convert a datetime value to a corresponding format stringstring(now(),”MM/dd/yyyy HH:mm”)2020-05-28 14:06
Enclose a string with quotation marksstring@q(“ab cd”)“ab cd”
Escape a characterstring@e(“ab\”c d”)ab\"c\td
Escape a character, and convert characters in a large character set to Unicode valuesstring@u(“ab\”c 中国”)ab\"c\t\u4E2D\u56FD
Parse a format specifier substring of a string according to the second parameterformat("The price is $%.2f",8.5)The price is $8.50

2. Simple string handling

2.1 Getting a substring

Get a substring from a source string. Related functions are mid(), left() and right(). Below are some examples:

TaskCodeReturn value
Get a substring from the 3rd charactermid(”abcdef”,3)cdef
Get a substring of two characters from the 3rd charactermid(”abcdef”,3,2)cd
Get a substring of the leftmost three charactersleft(”abcdef”,3)abc
Get a substring by removing the last two charactersleft(”abcdef”,-2)abcd
Get a substring of the rightmost three charactersright(”abcdef”,3)def
Get a substring by removing the first two charactersright(”abcdef”,-2)cdef

substr(s1,s2)

The function searches source string s1 for the target substring s2, and returns the substring before or after s2 or null if s2 cannot be found.

TaskCodeReturn value
Get the substring after a given substringsubstr(“abCDcdef”,”cd”)ef
Get the substring before a given substringsubstr@l(“abCDcdef”,”cd”)abCD
Get the case-insensitive substring after a given substringsubstr@c(“abCDcdef”,”cd”)cdef
Get a substring with a string put in quotes skippedsubstr@q(“ab\”acd\”cdef”,”cd”)ef

2.2 Search and matching

pos(s1,s2{,begin})

The function finds the position of the target substring s2 in source string s1, and returns the original position of s2 in s1 or null if s2 cannot be found.

TaskCodeReturn value
The ordinary searchpos(”abcdef”,”aa”)null
The ordinary searchpos(”abcDedefgh”,”de”)6
Perform the search from the 5th characterpos(”abcdedefgh”,”de”,5)6
Perform the search backwardspos@z(”abcdedefgh”,”de”)6
Perform a case-insensitive searchpos@c(”abcDedefgh”,”de”)4
Search for the target string at the beginningpos@h(”abcdefgh”,”ab”)1
Search for the target string at the tail endpos@hz(”abcdefgh”,”fgh”)6
Perform the search with a string put in quotes skippedpos@q(”ab\”cde\”fcd”,”cd”)9
Check if the source string matches a given format stringlike("abc123", "abc*")true
Case-insensitive for format string matchinglike@c("abc123", "ABC*")true

2.3 Replacement

replace(s,a,b)

The function replaces substring a in source string s with string b and returns the new s.

TaskCodeReturn value
The ordinary replacementreplace("abc’ab’deA","a","ss")ssbc'ssb’deA
Replace the first-found target substring onlyreplace@1("abc’ab’deA","a","ss")ssbc'ab’deA
The case-insensitive replacementreplace@c("abc’ab’deA","a","ss")ssbc'ssb’dess
Perform the replacement with a string put in quotes skippedreplace@q("abc’ab’deA","a","ss")ssbc'ab’deA

2.4 Others

TaskCodeReturn value
Repeat a given string n times to generate a new stringfill(“abc”,4)abcabcabcabc
Delete white spaces at both ends of a given stringtrim("a bc")"a bc"
Delete white spaces on the left of a given stringtrim@l("a bc")"a bc"
Delete white spaces on the right of a given stringtrim@r("a bc")"a bc"
Delete unwanted white spaces from a given stringtrim@a("a bc")"a bc"
Patch a string before an existing one to extend the latter to the desired lengthpad("Soth","Miss",10)MissMiSoth
Patch a string after an existing one to extend the latter to the desired lengthpad@r("Soth","er",8)Sotherer
Generate a numeric string with the length of 1 randomlyrands(“0123456789”,11)44238061662

3. Sequence-related handling

3.1 Splitting

s.split(d)

The function splits string s into a sequence using delimiter d, or into a sequence of single characters when d is absent.

TaskCodeReturn value
Split a given string using the vertical line“aa|bb|cc”.split(“|”)[aa,bb,cc]
Split a given string using the comma"1,[a,b],(2,c),'5,6'".split@c()[“1”,”[a,b]”,”(2,c)”,’5,6’]
Split a given string into two parts by the first-found delimiter"1,[a,b],(2,c),'5,6'".split@c1()[“1”,”[a,b],(2,c),’5,6’”]
Do not handle parentheses/brackets and quotation marks handling at splitting"1,[a,b],(2,c),'5,6'".split@cb()[“1”,”[a”,”b]”,”(2”,”c)”,”’5”,”6’”]
Parse each member in the result sequence as an object"1,[a,b],(2,c),'5,6'".split@cp()[1,[a,b],”(2,c)”,”5,6”]
Delete white spaces at both ends of each member in the result sequence“192.168.0.3”.split@t(“.”)[“192”,”168”,”0”,”3”]
Treat parameter d as a regular expression, and split the string, for instance, by digits"a1b2c57d".split@r("\\d")[“a”,”b”,”c”,””,”d”]
Split a given string by line break, and then perform the ordinary or the option-enabled splitting on each line"s,a,y\ngood,morning".split@nc()[[“s”,”a”,”y”],[“good”,”morning”]]

3.2 Concatenation

A.concat(d)

The function concatenates members of sequence A into a string through delimiter d, during which the sub-sequences are handled in the same way, or joins them up directly when d is absent.

In the following example functions, A1 is [1, ["a","b"],[2,"c"]].

TaskCodeReturn value
Concatenate members without a delimiterA1.concat()1[ab][2c]
Concatenate members using the colonA1.concat(“:”)1:[a:b]:[2:c]
Concatenate members using the commaA1.concat@c()1,[a,b],[2,c]
Enclose each member with double quotation marks at concatenationA1.concat@q()1[“a””b”][2”c”]
Enclose each member with single quotation marks at concatenationA1.concat@i()1[‘a’’b’][2’c’]
With a two-level sequence, concatenate the outer level by line break and then perform concatenation on each sub-sequence[[1,2,3],["a","b"],[2,"c"]].concat@n(“-“)1-2-3
a-b
2-c
Concatenate parameters into a string and do not enclose each result members with quotesconcat(2,["a","b"],"cd")2abcd

4. Complex analysis

4.1 Regular expression matching

TaskCodeReturn value
Match a given string with the regular expression“4,23,a,test”.regex("(\\d),([0-9]*),([a-z]),([a-z]*)")[“4”,”23”,”a”,”test”]
Match a given string with the regular expression“4,23,a,test”.regex("(\\d),([a-z]),([0-9]*),([a-z]*)")null
Case-insensitive for the matching"a:Test:B".regex@c("([a-z]):([a-z])")[“a”,”T”,”t”,”B”]

Learn more about regular expressions.

4.2 Word and number extraction

In the following example functions, s is "hi-10 hello!2020-01-01A8,3.14".

TaskCodeReturn value
Extract English wordss.words()[”hi”,”hello”,”A”]
Extract digits stringss.words@d()[”10”,”2020”,”01”,”01”,”8”,”3”,”14”]
Extract English words and strings of digitss.words@a()[”hi”,”10”,”hello”,”2020”,”01”,”01”,”A”,”8”,”3”,”14”]
Extract all: words, strings of digits and other character stringss.words@w()[”hi”,”-”,”10”,””,”hello”,”!”,”2020”,”-”,”01”,”-”,”01”,”A”,”8”,”,”,”3”,”.”,”14”]
Extract all, where strings beginning with digits will be identified as numbers or datetimes as a wholes.words@wp()[”hi”,”-10”,” ”,”hello”,”!”,”2020-01-01”,”A”,”8”,”,”,”3.14”]
Treat continuous English letters and digits as a words.words@i()[”hi”,”hello”,”A8”]

4.3 SQL statement analysis

sql.sqlparse(part)

The function splits a SQL query according to the order of select clause, from clause (including the join), group by clause, having clause and order by clause, make these syntactic units parameters members, and returns a sequence of them. When parameter part is present, replace the corresponding syntactic parameter in the source SQL query with it, and return a new SQL.

The sql in the following example functions is as follows:

select emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
    from EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
    where emp.EID<100 and dep.DEPT='R&D'
    order by emp.SALARY,emp.EID

sql2:select * from dept

TaskCodeReturn value
Get all syntactic units of a given SQL querysql.sqlparse()Omitted; please refer to Function Reference
Get select clausesql.sqlparse@s()emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
Get from clausesql.sqlparse@f()EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
Get where clausesql.sqlparse@w()emp.EID<100 and dep.DEPT='R&D'
Get group by clausesql.sqlparse@g()null
Get having clausesql.sqlparse@h()null
Get order by clausesql.sqlparse@o()emp.SALARY,emp.EID
Get syntactic units of a given SQL query and split each unit into a sequence of smaller onessql.sqlparse@a()Omitted; please refer to Function Reference
Replace select clause with the parametersql2.sqlparse@s(“deptid”)select deptid from dept
Replace from clause with the parametersql2.sqlparse@f(“sales”)select * from sales
Convert a SQL query to the counterpart in a specified databasesql.sqltranslate(“MYSQL”)Omitted

4.4 Node string handling

Node strings are key/value strings. In node strings in SPL, a key and its value are connected by the equals sign. Different nodes are separated by the white space.

xs.property(n,v)

The function reads value of the node named n from node string xs and returns the result. When string v is present, make v n’s value; and delete n when v is absent.

In the following example functions, xs is a set of key=value pairs: color=red size=20 price=500.

TaskCodeReturn value
Get a two-column table sequence made up of all propertiesxs.property()Omitted
Get value of a given propertyxs.property(“size”)“20”
Get value of a given property and enclose itxs.property@q(“size”)“\”20\””
Get value of a given property and parse it into the corresponding data typexs.property@v(“size”)20
Modify value of a given propertyxs.property(“size”,”30”)color=red size=30 price=500
Delete a given propertyxs.property(“size”,null)color=red price=500
Get value of a given property form node strings separated by semicolon“color=red;size=30“.property@cv(“size”)30
Get value of a given property form node strings where the property name and value are comma-separated“color:red size:30“.property@j(“size”)“30”

4.5 HTML string handling

s.htmlparse(tag:i:j,…)

The function gets the jth text under the ith tag in HTML string s; or gets all text from s when all parameters are absent.

Below is the HTML string in the following example functions:

<div><div>aaa<div>bbb<span>ccc</span>ddd</div></div></div>
<table><tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>a</td><td>b</td><td>c</td></tr>
</table>
TaskCodeReturn value
Get the 3rd text under the 3rd div taghtml.htmlparse("div":2:2)ddd
Get the 1st text under the 2nd div tag and the 1st text under the 1st span taghtml.htmlparse("div":1:0,"span":0:0)[aaa,ccc]
Get the content of the 1sttable taghtml.htmlparse("table":0)[[1,2,3],[a,b,c]]

Learn more about text handling through string() functions.

Leave a Reply