User Behavior Analysis in Practice 12: Using Pseudo Tables

User Behavior Analysis

Target task

We have a user events table T. Below is its structure and part of its data:

TimeUserIDEventTypeOSBrowserProductIDf1f2f3f4f5
2022/6/1 10:201072755SearchAndroidIE100001truefalsefalsetruefalse
2022/6/1 12:121078030BrowseIOSSafari100002falsefalsetruetruetrue
2022/6/1 12:361005093SubmitAndroidChrome100003truetruetruefalsefalse
2022/6/1 13:211048655LoginWindowsChromefalsefalsetruetruetrue
2022/6/1 14:461037824LogoutAndroidEdgefalsefalsefalsetruetrue
2022/6/1 15:191049626AddtoCartWindowsEdge100004truetruefalsetruefalse
2022/6/1 16:001009296SubmitIOSFirefox100005falsetruefalsefalsetrue
2022/6/1 16:391070713BrowseIOSSogou100006truetruetruefalsefalse
2022/6/1 17:401090884SearchWindowsIE100007truefalsetruetruefalse

Fields in table T:

Field nameData typeDescription
TimeDatetimeTime stamp of an event, accurate to milliseconds
UserIDStringUser ID
EventTypeStringEvent type, whose value is Login, Browse, Search, AddtoCart, Submit or Logout
OSStringOperating system, whose value is Android, IOS, Windows or Unknown
BrowserStringBrowser, whose value is IE, Safari, Edge, Firefox, Chrome, Sogou or Unknown
ProductIDStringProduct ID, whose value is the ProductID field of dimension table Product
StringOther fields that have enumerated values
f1BooleanWhether it is an offsite event or not; value is true or false
f2BooleanWhether it is a usual device or not; value is true or false
f3BooleanWhether it is a usual browser or not; value is true or false
f4BooleanWhether it is a cell phone or not; value is true or false
f5BooleanWhether it is the first operation; value is true or false
BooleanOther fields that have Boolean values

In the previous article User Behavior Analysis in Practice 9: Enumerated Dimension and Tag Dimension, we convert the enumerated field and the tag field into corresponding ordinal numbers and bits respectively. Below is the structure and part of data of converted user events composite table T.ctx:

TimeUserIDEventTypeOSBrowserProductIDb1
2022/6/1 10:20107275531110000136864
2022/6/1 12:12107803022210000214336
2022/6/1 12:36100509351510000357344
2022/6/1 13:21104865513514336
2022/6/1 14:4610378246136144
2022/6/1 15:19104962643310000453248
2022/6/1 16:00100929652410000518432
2022/6/1 16:39107071322610000657344
2022/6/1 17:40109088433110000745056

Fields in converted table T:

Field nameData typeDescription
TimeDatetimeTime stamp of an event, accurate to milliseconds
UserIDStringUser ID
EventTypeIntegerEvent type, whose value is an ordinal number of the enumerated sequence
OSIntegerOperating system, whose value is an ordinal number of the enumerated sequence
BrowserIntegerBrowser, whose value is an ordinal number of the enumerated sequence
ProductIDStringProduct ID, whose value is the ProductID field of dimension table Product
b1IntegerInteger field that stores binary fields as bits; the first five bits correspond to whether it is offsite, whether it is the usual device, whether it is the usual browser, whether it is cell phone, and whether it is the first operation

Ordinal numbers in EventType field and their description:

  1. Login
  2. Browse
  3. Search
  4. AddtoCart
  5. Submit
  6. Logout

Ordinal numbers in OS field and their description:

  1. Android
  2. IOS
  3. Windows
  4. Unknown

Ordinal numbers in Browser field and their description:

  1. IE
  2. Safari
  3. Edge
  4. Firefox
  5. Chrome
  6. Sogou
  7. Unknown

Dimension table Product:

ProductIDProductNameUnitPriceProductType
100001ApplePound5.5Fruits
100002TissuePacks16Home&Personalcare
100003BeefPound35Meat
100004WineBottles120Beverage
100005PorkPound25Meat
100006BreadPacks10Bakery
100007JuiceBottles6Beverage

Fields in dimension table Product:

Field nameData typeDescription
ProductIDStringProduct ID
ProductNameStringProduct name
UnitStringSales unit
PriceNumericUnit price
ProductTypeIntegerProduct type

Relationship between T and Product:

undefined

Computing task:

Find the number of occurrences of each type of event performed by users who are not newcomers on a local Android or IOS system using Safari, Edge or Chrome under the product type Home & Personal care in each month within a specified time period, and count the distinct users under same conditions.

Techniques involved:

Learn more about SPL pseudo tables in SPL Pseudo Table Data Type Optimization.

We can use the pseudo table to predefine computed columns and foreign keys, and simplify the representations of enumerated dimensions and tag dimensions:

1. Define a computed column Month to calculate the month in the time when the current event occurs.

2. Define pseudo fields EventTypeName, OSName and BrowserName to store correspondence relationships between EventType, OS and Browser’s ordinal numbers and their names.

3. Define the foreign key association on ProductID field between it and the corresponding dimension table.

4. Define field name represented by each bit in b1.

After a pseudo table is defined, we can use it to achieve data dump and summarization. This can significantly reduce the amount of code for summarization.

Sample code

1. Define pseudo table

A
1=T("Product.btx").keys@i(ProductID)
2=file("T.ctx").create(#Time,UserID,EventType,OS,Browser,ProductID,……,b1,……)
3=A2.close()
4=[{file:"T.ctx",column:[{name:"Month",exp:"month@y(Time)"},{name:"EventType",pseudo:"EventTypeName",enum:["Login","Browse","Search","AddtoCart","Submit","Logout"]},{name:"OS",pseudo:"OSName",enum:["Android","IOS","Windows","Unknown"]},{name:"Browser",pseudo:"BrowserName",enum:["IE","Safari","Edge","Firefox","Chrome","Sogou","Unknown"]},{name:"b1",bits:["f1","f2","f3","f4","f5",…….]},{name:"ProductID",dim:A1}]}]
5=pseudo(A4)

A1 Import dimension table Product, and set primary key and index for it.

A2 Create composite table structure after enumerated fields are converted into ordinal numbers and binary fields are transformed to bits, and write it to the empty file T.ctx (database data will later be directly appended to the file through the pseudo table without writing the conversion code for ordinal numbers and bits).

A3 Close A2’s composite table.

A4 Define a pseudo table based on composite table T.ctx:

{name:"Month",exp:"month@y(Time)"}: The code defines a computed column named Month and evaluated through month@y(Time).

{name:"EventType", pseudo:"EventTypeName", enum:[ "Login", "Browse", "Search", "AddtoCart", "Submit", "Logout"]}: The code defines correspondence relationship between EventType field values and the enumerated sequence, and represents the matching name through pseudo field EventTypeName.

{name:"OS",pseudo:"OSName",enum:["Android","IOS","Windows","Unknown"]}: The code defines correspondence relationship between OS field values and the enumerated sequence, and represents the matching name through pseudo field OSName.

{name:"Browser",pseudo:"BrowserName",enum:["IE","Safari","Edge","Firefox","Chrome","Sogou","Unknown"]}: The code defines correspondence relationship between Browser field values and the enumerated sequence, and represents the matching name through pseudo field BrowserName.

{name:"b1",bits:["f1","f2","f3","f4","f5",…….]}: The code defines field name each bit in the bit-based dimension b1 represents.

{name:"ProductID",dim:A1}: The code defines association relationship on ProductID field between it and dimension table A1.

A5 Generate the pseudo table.

The pseudo table definition string can be saved for direct use in later computations. This can further reduce the amount of code.

2. Use pseudo table to dump data. SPL will automatically convert data into corresponding ordinal numbers and bits according to pseudo table definition, and store the converted data in table T.

A
/ The above code for defining pseudo table
6=connect("demo").cursor@x("select * from T order by Time")
7=A5.append@i(A6)

A6 Connect to the database, and import data in table T to generate a cursor.

A7 Retrieve and append data in the cursor to the pseudo table.

3. Perform aggregation using the pseudo table

A
/ The above code for defining pseudo table
6>start=date("2022-03-15","yyyy-MM-dd"),end=date("2022-06-16","yyyy-MM-dd")
7=A5.select(Time>=start && Time<=end && ProductID.ProductType=="Home&Personalcare"&& ["Safari","Edge","Chrome"].pos(BrowserName) && ["Android","IOS"].pos(OSName) && ! f1 && f4 && !f5).groups(Month,EventTypeName; count(1):Num, icount(UserID):iNum)

A7 Perform aggregation using the pseudo table. We can use pseudo fields defined for the pseudo table as ordinary fields and treat the pseudo table as an ordinary, simple table without taking care of storage and computing mechanisms.

Execution result:

MonthEventTypeNameNumiNum
202203AddtoCart30760329252
202203Browse59649258140
202203Login67216365569
202203Logout67216365569
202203Search49131742919
202203Submit14455213901
202204AddtoCart61522258484
202204Browse1192970116265
202204Login1344323131123
202204Logout1344323131123
202204Search98263785843
202204Submit28911227799
202205AddtoCart61521458484
202205Browse1192976116262
202205Login1344339131133
202205Logout1344339131133
202205Search98263385848
202205Submit28910827788
202206AddtoCart30763529256
202206Browse59646358124
202206Login67217565575
202206Logout67217565575
202206Search49134442929
202206Submit14457313887

Leave a Reply