Version 1 (modified by martin, 16 years ago)

System Monitoring

Our system monitoring is driven by a configuration file for the monitor tool. The current draft of that configuration is below.

The initial configuration is based on just a few states:

START::

the standard initial state, used to perform priming reads of status and to set up operating defaults

UNKNOWN::

the standard state used when no other state fits with the current conditions

startingNetwork::

the ethernet interface does not yet have an IP address in the range we expect

running::

all is OK: our process pid file exists and contains a process ID, and the ethernet interface has an IP address in the range we expect

FXOhang::

we have detected a hung state in our FXO modules. In this state, we attempt to reset the devices.

dead::

used when we are in the UNKNOWN state for too long
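
To make the relationship between these states concrete, here is a minimal Python sketch of how state selection could work. It is not the monitor tool's implementation. The names `STATES` and `next_state` are hypothetical, the "first matching state wins" rule and the order in which states are checked are assumptions, and `timer` is assumed to count polls spent in the current state.

```python
import re
from typing import Callable, Dict, List, Tuple

Vars = Dict[str, str]
# A condition sees the collected variables, the current state name, and
# how long the monitor has been in that state (assumed unit: polls).
Condition = Callable[[Vars, str, int], bool]


def matches(pattern: str, name: str) -> Condition:
    """Condition that passes when variable `name` contains `pattern`."""
    return lambda v, cur, t: re.search(pattern, v.get(name, "")) is not None


def not_matches(pattern: str, name: str) -> Condition:
    """Condition that passes when variable `name` does not contain `pattern`."""
    return lambda v, cur, t: re.search(pattern, v.get(name, "")) is None


# Assumption: states are tried in this order and the first match wins.
STATES: List[Tuple[str, List[Condition]]] = [
    ("FXOhang", [matches(r"0xff", "fxostatus")]),
    ("startingNetwork", [not_matches(r"inet +192\.168", "enetStatus")]),
    ("running", [matches(r"inet +192\.168", "enetStatus"),
                 matches(r"[0-9]+", "myprocpid")]),
    ("dead", [lambda v, cur, t: cur == "UNKNOWN",
              lambda v, cur, t: t >= 4]),
]


def next_state(variables: Vars, current: str, timer: int) -> str:
    """Return the first state whose conditions all pass, else UNKNOWN."""
    for name, conditions in STATES:
        if all(cond(variables, current, timer) for cond in conditions):
            return name
    return "UNKNOWN"
```

For example, `next_state({"enetStatus": "inet 192.168.1.5", "myprocpid": ""}, "UNKNOWN", 4)` falls through every other state and selects `dead`, which mirrors the "UNKNOWN for too long" rule above.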

#   Private and confidential.
#
#   Copyright Jazmin Communications Pty Ltd, 2009
#   All rights reserved.
#
#   Not for external release.
#
# A monitor configuration for the ip04 system
#
# 

ENTER START {
	LOG "starting monitor"
	SET CYCLE = 2  # monitor aggressively while booting
	SET enetStatus = RUN "/sbin/ifconfig en1"
}

STATE startingNetwork {
	enetStatus NOT ~ /inet[ ]+192.168/
}

ENTER startingNetwork {
	SET CYCLE = 2  # monitor aggressively while waiting for the network to come back
}

POLL startingNetwork {
	SET enetStatus = RUN "/sbin/ifconfig en1"
}

# we define operational states based on various conditional tests
# if all conditions pass, the monitor enters the given state and runs 
# our 'ENTER' method.

# the following reads a .pid file and verifies that it contains a number. 
# note that we demonstrate the 'COLLECT' verb here. COLLECT can be used
# to collect data into a variable for later tests. This is largely for
# optimisation.
# 	The other way to specify the condition for this state is simply:
#
#   FILE 'myproc.pid' ~ /[0-9]+/
#
STATE running {
	enetStatus ~ /inet[ ]+192.168/;
	COLLECT myprocpid FROM FILE '"/tmp/myproc.pid"';
	myprocpid ~ /[0-9]+/ # note: current bug, the file name cannot contain a path
}

# if a pid file was found, we log the fact.  Every 'CYCLE' seconds, we will test that the
# system is still running.
ENTER running {
	LOG "running ok"
	SET CYCLE = 5  # less frequent monitoring while things are running nicely
}

POLL running {
	SET fxostatus = RUN "'/Users/martin/Desktop/current/Jazmin Communications/check_installed_FXO_status'"
	SET enetStatus = RUN "/sbin/ifconfig en1"
}

# if no states match, the monitor automatically enters the state 'UNKNOWN' 
# we can catch this by setting up an enter method:

ENTER UNKNOWN {
	LOG "unknown system state"
}

STATE FXOhang {
	fxostatus ~ /0xff/
}

ENTER FXOhang {
	LOG "detected hang in FXO module"
	SET fxostatus = RUN "'/Users/martin/Desktop/current/Jazmin Communications/reset_FXO'"
}

# if none of the states above match, our monitor will enter the UNKNOWN
# state. If we have stayed in the UNKNOWN state for too long
# (TIMER >= 4 below), we give up and decide the system is dead.

STATE dead {
	CURRENT ~ /UNKNOWN/;
	TIMER >= 4
}

# in this sample, if the system is dead, we simply log the fact and spawn a
# command that appends the current date to /tmp/dates.

ENTER dead {
	LOG "program is not running, restarting monitor";
	SPAWN "/bin/date >>/tmp/dates"
}
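
The comments in the configuration describe a cycle: an ENTER block runs once when a state is entered, and a POLL block runs on each cycle while the monitor stays in that state. The Python sketch below illustrates that cycle under stated assumptions. It is not the monitor tool's code. `run_monitor`, `demo` and the `ctx` dictionary are hypothetical names, and the TIMER behaviour (reset on a state change, incremented once per poll otherwise) is an assumption.

```python
from typing import Callable, Dict, List

Ctx = Dict[str, object]
Handler = Callable[[Ctx], None]


def run_monitor(select_state: Callable[[Ctx, str], str],
                enter: Dict[str, Handler],
                poll: Dict[str, Handler],
                max_polls: int,
                sleep: Callable[[float], None] = lambda seconds: None) -> List[str]:
    """Drive a tiny monitor loop and return the sequence of states entered.

    `sleep` defaults to a no-op so the sketch runs instantly; a real loop
    would pass time.sleep here.
    """
    ctx: Ctx = {"CYCLE": 2, "TIMER": 0}
    current = "START"
    history = [current]
    if current in enter:
        enter[current](ctx)           # ENTER START: priming reads
    for _ in range(max_polls):
        sleep(ctx["CYCLE"])           # wait CYCLE seconds between polls
        if current in poll:
            poll[current](ctx)        # POLL <state>: refresh variables
        chosen = select_state(ctx, current)
        if chosen != current:
            current = chosen
            ctx["TIMER"] = 0          # assumed: timer restarts on transition
            history.append(current)
            if current in enter:
                enter[current](ctx)   # ENTER <state> runs once per transition
        else:
            ctx["TIMER"] += 1
    return history


def demo() -> List[str]:
    """Simulate a network that comes up on the third reading."""
    readings = iter(["", "", "inet 192.168.0.7"])

    def refresh(ctx: Ctx) -> None:
        # Stand-in for: SET enetStatus = RUN "/sbin/ifconfig en1"
        ctx["enetStatus"] = next(readings, "inet 192.168.0.7")

    def select(ctx: Ctx, current: str) -> str:
        up = "inet 192.168" in str(ctx.get("enetStatus", ""))
        return "running" if up else "startingNetwork"

    enter = {
        "START": refresh,
        "startingNetwork": lambda ctx: ctx.update(CYCLE=2),
        "running": lambda ctx: ctx.update(CYCLE=5),
    }
    poll = {"startingNetwork": refresh, "running": refresh}
    return run_monitor(select, enter, poll, max_polls=4)
```

Calling `demo()` walks through START, startingNetwork and running in that order. Injecting `sleep` keeps the sketch testable without waiting for real cycles.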
