Michael Riordan

Taking the Next Step with Aster's Path Analytics

Blog Post created by Michael Riordan Champion on Aug 11, 2017

Aster's nPath function is a powerful tool for uncovering patterns in sequential data.  There are many examples here on the Aster Community for writing those queries and visualizing the output.  But that's not all Aster has to offer for path analysis.  There are more steps that you can take to create even more value.  I'm going to show you a few easy queries that will take the output from nPath and start to build out a model that may be used to predict the next step on a path.

 

I'll use the demo data set that ships with Aster Express, bank_web_clicks, though really any output from your own nPath analysis will work just as well.  Here's a simple Tree diagram that I've created from this data, showing a handful of paths that customers have taken on this web site.

 

 

These Tree diagrams are incredible useful for visualizing paths and helping to uncover patterns for deeper analysis.  Additionally, having an underlying table with metrics around parent nodes, child nodes, counts and percentages could begin to form the foundation for a model to predict likely outcomes for other customers on similar journeys, allowing you to answer such questions as "what's the next likely step my customer will take".  In a few steps, I'll show how to build out a table with rows like this showing Parent paths and Next steps with calculated probability:

 

ParentNextProbability
ACCOUNT SUMMARY, BILL MANAGER FORM
BILL MANAGER ENROLLMENT
1
ACCOUNT SUMMARY, FUNDS TRANSFER, CUSTOMER SUPPORT
FAQ
.59
ACCOUNT SUMMARY, FUNDS TRANSFER, ONLINE STATEMENT ENROLLMENT
ACCOUNT HISTORY
.54
ACCOUNT SUMMARY, FAQ, CUSTOMER SUPPORT
ACCOUNT HISTORY
.42

 

Introducing the Aster path_analyzer function.  This takes the output from your nPath query, with 'group by' counts for each path, and then breaks out all the paths and counts from each parent node, working its way down the full depth of the tree.  Getting the "top paths and counts" from nPath is a pretty standard part of our normal nPath workflow and done easily with a simple query.  Let's build a new table, called 'top_bank_paths'.  Here I'll limit this to the top 1000 paths.  (Since these tables are small, I'm using 'replication' as my distribution).

 

create table top_bank_paths
   distribute by replication
as
select path,count(*) as cnt
from bank_paths
group by 1
order by 2 desc
limit 1000
;

 

With results:

 

pathcnt
[ACCOUNT SUMMARY, FAQ]2854
[ACCOUNT SUMMARY, VIEW DEPOSIT DETAILS]2830
[ACCOUNT SUMMARY, FUNDS TRANSFER]2746
[ACCOUNT SUMMARY, ACCOUNT SUMMARY]2715

 

We will now use these as input to path_analyzer:

 

SELECT * FROM Path_Analyzer
(
    ON (SELECT 1)
    PARTITION BY 1
    InputTable ('top_bank_paths')
    OutputTable ('bank_path_analyzer_output')
    --SeqColumn ('path')  Aster v7.0
    --CountColumn ('cnt') Aster v7.0
    Seq('path')  --Aster v6.20
    Cnt('cnt')   --Aster v6.20
);

 

The syntax did change slightly for the Seq and Cnt parameters with Aster Analytics 7.0, so I've included both above. Simply comment/uncomment based on your library version.  Also, if you want to rerun path_analyzer, you'll need to first drop your output table.  In this example, that would be:

     drop table bank_path_analyzer_output;

 

The output is a bit verbose, so here are the important fields:

 

parentchildrensubpath_cntdepth
^,[ACCOUNT SUMMARY  [(^,[ACCOUNT SUMMARY, BILL MANAGER, ACCOUNT HISTORY]),(^,[ACCOUNT SUMMARY, BILL MANAGER, ACCOUNT SUMMARY]),(^,[ACCOUNT SUMMARY, BILL MANAGER, FAQ]),(^,[ACCOUNT SUMMARY, BILL MANAGER, FUNDS TRANSFER]),(^,[ACCOUNT SUMMARY, BILL MANAGER, VIEW DEPOSIT DETAILS])]702

 

With this 'not too complex' SQL, we can now build a table for each parent node with corresponding 'next child node' and percentage calculation.  I'm doing a little string manipulation; skipping those first 4 characters in the parent format and likewise in the 'next_node' field, and trimming the trailing ']' character, which represents 'end of path'.  I'm also using a window function, row_number(),  to rank each child by percentage so that we can later filter by top 1 or 2 child nodes:

 

create table top_bank_next_path
    distribute by replication
as
select substring(N.parent from 4) as parent
     ,P.depth
     ,replace(substring(N.node from (3 + length(N.parent))),']','') as next_node
     ,P.subpath_cnt as parent_cnt
     ,N.subpath_cnt as next_cnt
     ,(1.0 * N.subpath_cnt / P.subpath_cnt)::numeric(3,2) as probability
     ,row_number() over (partition by N.parent order by N.subpath_cnt desc) as rank
from bank_path_analyzer_output N
    ,bank_path_analyzer_output P
where P.depth > 0
and   N.parent = P.node
;

 

 

parentdepthnext_nodeparent_cntnext_cntprobabilityrank
ACCOUNT SUMMARY, ACCOUNT SUMMARY, FAQ, ONLINE STATEMENT ENROLLMENT4VIEW DEPOSIT DETAILS111111
ACCOUNT SUMMARY, FUNDS TRANSFER, FUNDS TRANSFER, FUNDS TRANSFER4ACCOUNT HISTORY32180.561
ACCOUNT SUMMARY, FUNDS TRANSFER, FUNDS TRANSFER, FUNDS TRANSFER4ACCOUNT SUMMARY32140.442
ACCOUNT SUMMARY, VIEW DEPOSIT DETAILS, FUNDS TRANSFER, ACCOUNT HISTORY4FUNDS TRANSFER48190.41
ACCOUNT SUMMARY, FAQ, FAQ, FAQ4ACCOUNT SUMMARY58220.381

 

And finally, drum roll, here are some answers to the question "what is my customer likely to do next based on where they are in their journey".  In this next query, I'm using some parameters like 'probability greater than 30%', 'more than 20 similar paths have occurred' and perhaps most importantly, 'where rank equals 1', which means its the mostly likely next step for each parent path.

 

select parent,next_node
     ,parent_cnt,probability
from demo.top_bank_next_path
where rank = 1
and probability > .3
and parent_cnt > 20
order by probability desc
limit 10
;

 

parentnext_nodeparent_cntprobability
ACCOUNT SUMMARY, BILL MANAGER FORMBILL MANAGER ENROLLMENT331.0
ACCOUNT SUMMARY, FUNDS TRANSFER, CUSTOMER SUPPORTFAQ290.59
ACCOUNT SUMMARY, FUNDS TRANSFER, FUNDS TRANSFER, FUNDS TRANSFERACCOUNT HISTORY320.56
ACCOUNT SUMMARY, ACCOUNT SUMMARY, ONLINE STATEMENT ENROLLMENTFUNDS TRANSFER270.56
ACCOUNT SUMMARY, CUSTOMER SUPPORT, ONLINE STATEMENT ENROLLMENTACCOUNT HISTORY270.56
ACCOUNT SUMMARY, PROFILE UPDATE, CUSTOMER SUPPORTACCOUNT HISTORY270.56
ACCOUNT SUMMARY, FUNDS TRANSFER, ONLINE STATEMENT ENROLLMENTACCOUNT HISTORY240.54
ACCOUNT SUMMARY, FAQ, CUSTOMER SUPPORTACCOUNT HISTORY430.42
ACCOUNT SUMMARY, VIEW DEPOSIT DETAILS, FUNDS TRANSFER, ACCOUNT HISTORYFUNDS TRANSFER480.4
ACCOUNT SUMMARY, PROFILE UPDATE, FUNDS TRANSFERFUNDS TRANSFER640.39

 

There it is, a pretty straight forward, and I think a very powerful new step to add to all your Aster nPath workflows to create a very useful new table to answer many question from your pathing data.

Enjoy!

Outcomes