D Brezinsky

Aster Execution Engine (AX) - restoring non-persistent objects

Blog Post created by D Brezinsky Champion on Apr 26, 2017

This post outlines an approach to automatically recreate non-persistent objects in an Aster Execution Engine (AX) environment on startup.

 

Background

The Aster Execution Engine (also known as Aster-on-Hadoop) exists as a collection of YARN-managed compute services without any persistent user data. All user data is held in temporary or analytic tables only as long as the AX instance is running, and it persists only if it is stored back into the underlying Hadoop instance or onto some other system (e.g. with load_to_teradata()).

 

In addition to user data, there are a number of database objects that do NOT persist between restarts of an AX instance. Per the 7.00.00.01 User Guide:

Persistent Objects
• users
• roles
• databases
• schemas
• foreign server definitions
• packaged analytics models and functions
• granted privileges on persistable objects

Non-Persistent Objects
• tables
• views
• constraints
• indexes
• R scripts installed on the server side
• user-installed files and SQL/MR functions
• user scripts for vacuum or daily jobs

 

To make life easier for myself, my idea is to detect when the AX instance has been restarted and to recreate the objects that I want to be there every time. These might be views into my data on HDFS, custom functions, and anything else that I use regularly.

 

An Example Approach

To make this happen, you need two things:

  1. A way to detect that the AX instance has restarted
  2. A way to create the objects

 

I believe that "Indecision is the Basis of Flexibility", so the pieces I created are full of user-customizable components.

  • To detect that the instance has restarted, AX provides a token that changes every time the instance restarts. To read this token, use the query:
         select * from nc_system.nc_instance_token;
  • Keep the token in a file, and when it changes, you've got a restart; in which case, you simply run a bunch of stuff that you can mix-and-match as needed

 

A sample script that does this:

 

#! /bin/sh
UPSTARTDIR=/root/upstart                 # the directory of this script
TOKENFILE=${UPSTARTDIR}/.currtoken       # where I store a copy of the instance token
SCANINT=10                               # how often to scan for a change (in seconds)
UPSTARTPROCDIR=${UPSTARTDIR}/upstart.d   # a dir of scripts to run when the instance restarts

# 'source' is not available in every /bin/sh; '.' is the portable form
. /home/beehive/config/asterenv.sh

while true ; do
     OLDTOKEN="$(cat $TOKENFILE 2>/dev/null)"

     TOKEN="$(act -U beehive -w password -q -A -t -c "/* NOAMC */ select * from nc_system.nc_instance_token" 2>/dev/null)"

     # unable to get new token - aster instance is not available
     [ "${TOKEN}" ] || {
          echo "### UNABLE TO GET TOKEN FROM ASTER SERVICE INSTANCE. WAITING..."
          sleep ${SCANINT}
          continue
     }

     # tokens match - the instance has not restarted since the last check
     [ "${TOKEN}" != "${OLDTOKEN}" ] || {
          sleep ${SCANINT}
          continue
     }

     echo "=== TOKENS HAVE CHANGED. ASTER SERVICE INSTANCE HAS RESTARTED"

     echo ">>> running executables from ${UPSTARTPROCDIR}"
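     for F in $(ls ${UPSTARTPROCDIR} 2>/dev/null) ; do
          # test the file inside ${UPSTARTPROCDIR}, not relative to the current directory
          [ -x "${UPSTARTPROCDIR}/${F}" ] || {
               echo "--- ${F} is not executable. skipping..."
               continue
          }
          echo ">>> Running startup file ${F}"
          ${UPSTARTPROCDIR}/${F} 2>&1 | sed -e 's/^/ | /'
     done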

     echo "=== STORING CURRENT TOKEN TO ${TOKENFILE}"
     echo "${TOKEN}" > ${TOKENFILE}

     sleep ${SCANINT}
done

Notes:

  • The script could be run in the system's inittab, as a startup process, or in any way that seems fitting. It does NOT need to run as root, but things you want it to do may need certain privileges.
  • You may want to make sure that there is only one copy of this script running.
  • The phrase '/* NOAMC */' prevents this query from showing up in the Aster Management Console process history. Use this with discretion.
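One way to keep a second copy from starting is a lock directory, since mkdir either creates the directory or fails in a single step. The sketch below is only an illustration of that idea: the lock path and the function names acquire_lock and release_lock are my own assumptions, not part of AX or of the script above.

```sh
#!/bin/sh
# Hypothetical lock location; any path writable by the script's user would do.
LOCKDIR="${TMPDIR:-/tmp}/ax_upstart.lock"

# mkdir succeeds for exactly one caller, so it can serve as a simple lock
acquire_lock() {
     mkdir "${LOCKDIR}" 2>/dev/null
}

release_lock() {
     rmdir "${LOCKDIR}" 2>/dev/null
}

if acquire_lock ; then
     # remove the lock when this script exits, including on Ctrl-C or kill
     trap release_lock EXIT
     trap 'exit 1' INT TERM
     echo "=== lock acquired: ${LOCKDIR}"
     # ... the polling loop would go here ...
else
     echo "### another copy appears to be running. exiting..."
     exit 1
fi
```

If the machine goes down hard, the lock directory is left behind and has to be removed by hand before the script will start again.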

When the script detects that the AX instance has restarted, it will look in the directory ${UPSTARTPROCDIR} and simply run any file there that is executable, in 'ls' order. I made it modular, so I can mix-and-match components, and made it run only the executables so I can turn off pieces just by toggling the execute bit.
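To illustrate the execute-bit toggle, here is a small self-contained sketch that uses a throwaway directory in place of ${UPSTARTPROCDIR}. The file names and the list_enabled helper are made up for the demonstration.

```sh
#!/bin/sh
# Throwaway directory standing in for ${UPSTARTPROCDIR}; file names are hypothetical.
DEMODIR="$(mktemp -d)"
printf '#!/bin/sh\necho views\n' > "${DEMODIR}/10-create_views.sh"
printf '#!/bin/sh\necho functions\n' > "${DEMODIR}/20-install_functions.sh"
chmod +x "${DEMODIR}/10-create_views.sh" "${DEMODIR}/20-install_functions.sh"

# switch one component off without deleting it
chmod -x "${DEMODIR}/20-install_functions.sh"

# print the files that would run on the next restart, in 'ls' order
list_enabled() {
     for F in $(ls "$1" 2>/dev/null) ; do
          [ -x "$1/${F}" ] && echo "${F}"
     done
     return 0
}

ENABLED="$(list_enabled "${DEMODIR}")"
echo "${ENABLED}"

# clean up the demo directory
rm -rf "${DEMODIR}"
```

Here only 10-create_views.sh should be listed; running chmod +x on the second file would bring it back into the rotation.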

These scripts may do anything you like on restart, using act, ncli or even other things in the environment. For example, I might want to maintain a set of views into Hadoop data sets; given a SQL script that creates these views (create_sqlh_views.sql), a setup script 10-create_demo.sh might look like this:

#!/bin/sh
. /home/beehive/config/asterenv.sh
act -U db_superuser -w db_superuser -d playbook \
-f /asterfs/poc_playbook/poc_scripts_kb/sqlh/create_sqlh_views.sql

 

There are plenty of ways to achieve this result. I hope this one proves helpful.
