Alerts (Events vs. Alerts): An event is sent from a BioUptime agent (client) to the BioUptime server. Based on incoming events (or their absence), the server generates an alert. Events and alerts are both shown in the BioUptime UI; alerts are also sent via email by the BioUptime server. An alert and an event can have the same name and short description.

BioUptime is based on an advanced, rule-driven and configurable alerting system: state-based alerts. An alert is triggered when one event is followed by another (a pair of two consecutive events, i.e. a state change). A state-based alert indicates whether an Element/Unit is working or not.

Examples:
  • When a BioUptime agent starts, it should send a ServiceStarted event to notify the BioUptime server that it should expect to receive system events in a timely manner.
  • When a BioUptime agent shuts down, it should send a ServiceStopped event to notify the BioUptime server that it should not expect any more system events until a new ServiceStarted event is sent.
  • A BioUptime agent should regularly send a HeartbeatOK event to notify the BioUptime server that it is working as expected. If it is not working as expected, it should send a HeartbeatNOK event.
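Since alerts are driven by pairs of consecutive events, the server side needs to remember the last event seen per agent. The following is a minimal sketch of that idea, not BioUptime's actual implementation; the `Event` class, the `agent_id` field and the `TransitionTracker` name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Event:
    agent_id: str  # hypothetical identifier for the sending agent
    name: str      # e.g. "ServiceStarted", "HeartbeatOK", "HeartbeatNOK"


class TransitionTracker:
    """Remembers the last event per agent and reports (previous, current) pairs."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def observe(self, event: Event) -> Optional[Tuple[str, str]]:
        previous = self._last.get(event.agent_id)
        self._last[event.agent_id] = event.name
        if previous is None:
            # First event from this agent: there is no pair to evaluate yet.
            return None
        return (previous, event.name)


tracker = TransitionTracker()
tracker.observe(Event("agent-1", "ServiceStarted"))
pair = tracker.observe(Event("agent-1", "HeartbeatOK"))
print(pair)
```

Each returned pair corresponds to one of the state names used in the lists that follow, for example ServiceStarted-HeartbeatOK.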

List of alert states

Known bad states:

  • ServiceStarted-HeartbeatNOK: bad state entered. The bad state may already have existed and an alert may already have been sent, so check the alert table. If no alert is present, generate one and send the alert message
  • HeartbeatNOK-HeartbeatNOK: still in bad state. An alert should already have been sent; if not, send it
  • HeartbeatOK-HeartbeatNOK: bad state entered from a previously known good state. Generate an alert and send the message
  • HeartbeatNOK-ServiceStopped: still in bad state, shutting down

Known bad state resolved:

  • HeartbeatNOK-HeartbeatOK: good state entered, problem is resolved. Send a message to subscribers and tag the alert as resolved
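The "check the alert table, send only if no alert is present" rule and the resolve step can be sketched as an idempotent open/resolve operation. This is an illustration under assumptions: the in-memory `AlertTable`, its `sent` list standing in for email delivery, and the message format are all hypothetical. It also treats all three NOK-entering pairs the same way, which is a simplification.

```python
from typing import List, Set, Tuple

# Pairs that should result in an open alert (taken from the known bad states).
OPENING: Set[Tuple[str, str]] = {
    ("ServiceStarted", "HeartbeatNOK"),
    ("HeartbeatNOK", "HeartbeatNOK"),
    ("HeartbeatOK", "HeartbeatNOK"),
}
# Pair that resolves an open alert.
RESOLVING: Set[Tuple[str, str]] = {("HeartbeatNOK", "HeartbeatOK")}


class AlertTable:
    """Hypothetical in-memory stand-in for the alert table."""

    def __init__(self) -> None:
        self.open_alerts: Set[str] = set()  # agent ids with an unresolved alert
        self.sent: List[str] = []           # stand-in for messages to subscribers

    def handle(self, agent_id: str, previous: str, current: str) -> None:
        pair = (previous, current)
        if pair in OPENING:
            # Only generate and send an alert if none is open already.
            if agent_id not in self.open_alerts:
                self.open_alerts.add(agent_id)
                self.sent.append(f"ALERT {agent_id}: {previous}-{current}")
        elif pair in RESOLVING:
            # Tag the alert as resolved and notify subscribers.
            if agent_id in self.open_alerts:
                self.open_alerts.discard(agent_id)
                self.sent.append(f"RESOLVED {agent_id}: {previous}-{current}")
        # HeartbeatNOK-ServiceStopped is deliberately a no-op here: the unit
        # is still in a bad state, so any open alert stays open.


table = AlertTable()
table.handle("unit-7", "HeartbeatOK", "HeartbeatNOK")   # opens and sends
table.handle("unit-7", "HeartbeatNOK", "HeartbeatNOK")  # already open, no resend
table.handle("unit-7", "HeartbeatNOK", "HeartbeatOK")   # resolves and sends
```

Keeping the open step idempotent means a repeated HeartbeatNOK does not flood subscribers with duplicate messages.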

Unknown states*:

  • ServiceStopped-ServiceStarted: starting up, unknown error state; look at the alert table or at the event before the previous one
  • ServiceStarted-ServiceStopped: stopped right after starting, should not happen, unknown state

* The BioUptime Alerter cannot know what type of state the agent/unit is in

Heartbeat failure states:

  • ServiceStarted-Absence of event: likely an agent problem
  • HeartbeatOK-Absence of event: the unit/agent used to work but no longer does
  • HeartbeatNOK-Absence of event: an alert has most likely already been generated
  • ServiceStopped-Absence of event: some agents/units should never be turned off. Hence, if a ServiceStopped event is received and no further events arrive within 15 minutes, a PowerOff alert should be generated, notifying you that the unit (or at least its agent) has been turned off
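Detecting the absence of an event amounts to comparing the time of the last received event against a timeout. The sketch below assumes the 15-minute window mentioned for ServiceStopped applies to every heartbeat failure state, which the text does not state; the function names and the `always_on` flag are likewise hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: one timeout for all states; the text gives 15 minutes
# only for the ServiceStopped case.
TIMEOUT = timedelta(minutes=15)


def absence_state(last_event: str, last_seen: datetime, now: datetime,
                  timeout: timedelta = TIMEOUT) -> Optional[str]:
    """Return the heartbeat failure state name once the agent has been silent
    for at least `timeout`, otherwise None."""
    if now - last_seen < timeout:
        return None
    return f"{last_event}-Absence of event"


def needs_power_off_alert(last_event: str, last_seen: datetime, now: datetime,
                          always_on: bool,
                          timeout: timedelta = TIMEOUT) -> bool:
    """True if a unit that should never be turned off has sent ServiceStopped
    and then stayed silent for at least `timeout`."""
    silent = absence_state(last_event, last_seen, now, timeout) is not None
    return always_on and last_event == "ServiceStopped" and silent


stopped_at = datetime(2020, 1, 1, 12, 0)
check_time = stopped_at + timedelta(minutes=20)
print(needs_power_off_alert("ServiceStopped", stopped_at, check_time, always_on=True))
```

A check like this would have to run periodically on the server, since by definition no incoming event is there to trigger it.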

Other alert states:

  • ServiceNotResponding
  • WebsiteNotResponding
  • SensorNotResponding
  • SystemOff
  • AgentUnknown

Normal operation states:

  • ServiceStarted-HeartbeatOK: just started up
  • HeartbeatOK-HeartbeatOK: continuous operation
  • HeartbeatOK-ServiceStopped: shutting down
  • ServiceStopped-Absence of event: system is off and no events are received
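The event-to-event states listed above lend themselves to a simple lookup table keyed by the (previous, current) pair. The following is a sketch only: the `Category` enum and `classify` function are hypothetical names, absence-of-event states are left out because they need a timer rather than an incoming event, and treating every unlisted pair as unknown is an assumption.

```python
from enum import Enum
from typing import Dict, Tuple


class Category(Enum):
    KNOWN_BAD = "known bad state"
    RESOLVED = "known bad state resolved"
    UNKNOWN = "unknown state"
    NORMAL = "normal operation"


# One entry per event pair named in the lists above.
TRANSITIONS: Dict[Tuple[str, str], Category] = {
    ("ServiceStarted", "HeartbeatNOK"): Category.KNOWN_BAD,
    ("HeartbeatNOK", "HeartbeatNOK"): Category.KNOWN_BAD,
    ("HeartbeatOK", "HeartbeatNOK"): Category.KNOWN_BAD,
    ("HeartbeatNOK", "ServiceStopped"): Category.KNOWN_BAD,
    ("HeartbeatNOK", "HeartbeatOK"): Category.RESOLVED,
    ("ServiceStopped", "ServiceStarted"): Category.UNKNOWN,
    ("ServiceStarted", "ServiceStopped"): Category.UNKNOWN,
    ("ServiceStarted", "HeartbeatOK"): Category.NORMAL,
    ("HeartbeatOK", "HeartbeatOK"): Category.NORMAL,
    ("HeartbeatOK", "ServiceStopped"): Category.NORMAL,
}


def classify(previous: str, current: str) -> Category:
    # Assumption: pairs not covered by the lists are treated as unknown.
    return TRANSITIONS.get((previous, current), Category.UNKNOWN)


print(classify("HeartbeatOK", "HeartbeatNOK").value)
```

Keeping the rules in a table rather than in branching code fits the rule-driven, configurable design described at the start: adding or reclassifying a state is a data change.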
