Advisory -- 2016.09u9-u11 vov_diagnostics_no_start

The version of the file
  $VOVDIR/scripts/vov_diagnostics_no_start

that is shipped with 2016.09u9 and later contains a stanza that sends a
query to vovserver for other jobs running on the same vovslave.
This requires a CPU-intensive scan of all the jobs in the system.
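
To check whether an installed copy of the script still runs this query, something like the following may help. It is a minimal sketch, not a supported tool: the function name has_job_query is hypothetical, and it assumes the stanza can be recognized by an uncommented vovselect call that selects "from jobs", as in the listing in this advisory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): succeeds (status 0) when the
# given script contains an active, uncommented 'vovselect ... from jobs' line.
# Assumption: the stanza is recognizable by that text, as in this advisory.
has_job_query () {
    # Drop comment lines first, then look for the query in what remains.
    grep -v '^[[:space:]]*#' "$1" | grep -q 'vovselect.*from jobs'
}
```

For example, has_job_query "$VOVDIR/scripts/vov_diagnostics_no_start" && echo "query stanza is active" reports whether the workaround is still needed on that installation.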

If vovserver is already under stress, this added load creates a positive-feedback
loop: the extra load can trigger more instances of vov_diagnostics_no_start,
each of which adds further load.

Runtime R&D is working on a patch.

As a workaround, either comment out the stanza that runs vovselect, or
duplicate the 'exit $nerr' line so that it appears before that stanza.
The listing below shows both changes:

#!/bin/sh
# Script to grab information for subslave startup issues
# For use with VOV_DEBUG_NO_START
# This expects to receive 2 parameters
#  1 jobid   2 subslave pid

thisprog=vov_diagnostics_no_start

usage () {
    cat <<End-Usage-Info
Usage: $thisprog nc-jobid subslave-pid

Gather info for troubleshooting NC subslave startup problems
Info is printed to the vovslave log

End-Usage-Info
    exit 2
}

# body of script omitted

exit $nerr

#if [ "x$VOV_SLAVE_NAME" != "x" ]; then
#    $ECHO "# jobs running on this slave: "
#    vovselect id,user,isinteractive,duration,prop.ALLPIDS from jobs where slavename==$VOV_SLAVE_NAME
#fi

# Skip this for now
# $ECHO "# environment of the affected job"
# nc info -e $jobid

exit $nerr
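
The comment-out workaround can also be applied with a small helper. This is a sketch under stated assumptions, not a vendor-supplied fix: the function name disable_job_query and the ".orig" backup suffix are hypothetical, and it assumes the stanza starts at column 0 with 'if [ "x$VOV_SLAVE_NAME"' and ends at the next line beginning with 'fi', as in the listing above. Review the result before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): comment out the stanza that
# queries vovserver for jobs on this slave.
# Assumptions: the stanza begins with 'if [ "x$VOV_SLAVE_NAME"' at column 0
# and ends at the next line that starts with 'fi'.
disable_job_query () {
    script=$1
    # Keep a one-time backup; never overwrite an existing one.
    [ -e "$script.orig" ] || cp -p "$script" "$script.orig" || return 1
    # Re-generate the script from the backup, prefixing stanza lines with '#'.
    awk '
        /^if \[ "x\$VOV_SLAVE_NAME"/ { in_stanza = 1 }
        in_stanza                    { print "#" $0 }
        !in_stanza                   { print }
        in_stanza && /^fi/           { in_stanza = 0 }
    ' "$script.orig" > "$script"
}
```

It would be called as disable_job_query "$VOVDIR/scripts/vov_diagnostics_no_start". Because the script is always rebuilt from the backup, running the helper twice gives the same result, and restoring the original is a matter of copying the ".orig" file back.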

 
