Components monitoring

SentiOne Automate monitoring is based on checking the status of each system component. Each component is checked to verify that it is ready to provide its services. The check runs periodically at a specified time interval and tests whether the component responds correctly.
For these checks, basic Kubernetes functions are used:

  • readiness probes - If a readiness probe fails (or returns no result), Kubernetes treats the component as not ready and retries the test after the specified period. Only when the result is correct is the component considered ready to serve traffic.
  • liveness probes - periodic checks performed on a component to determine whether it is working properly. If the test fails, Kubernetes restarts the application to ensure continuity of operation of the entire IT system.

In the SentiOne Automate system, two types of tests are used to check components (in this case, pods):

  • HTTP request - a simple GET request is sent to a specific endpoint configured for the application. The response code is then interpreted: codes greater than or equal to 200 and less than 400 mean success, and any other code means that the component is not working properly.
  • TCP probe - a check that verifies whether the application has opened the specified TCP port. If the port is open, the check succeeds; otherwise the component is considered down.
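
The two check types above can be approximated outside Kubernetes with a few lines of bash. The sketch below is only an illustration: the function names http_status_ok and tcp_port_open are made up for this example, and the TCP check relies on bash's /dev/tcp redirection, which is assumed to be available in your shell.

```shell
# Illustrative helpers; the names are hypothetical and not part of SentiOne Automate.

# HTTP check: succeed when the status code is >= 200 and < 400.
http_status_ok() {
  local code="$1"
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# TCP check: succeed when a connection to host ($1) and port ($2) can be opened.
# Uses bash's /dev/tcp redirection, so it needs bash rather than plain sh.
# No timeout is applied here; a filtered port may block for a while.
tcp_port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Usage:
#   http_status_ok 302 && echo "success"
#   tcp_port_open gateway 5000 || echo "gateway is down"
```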

Pod monitoring

Each SentiOne Automate component exposes an endpoint to which an HTTP GET request should be sent; the response allows you to interpret the application state. The sections below list all components with a short description of the exposed port, responses, etc.
Popular monitoring systems (e.g. Nagios, Sensu, Prometheus) allow you to prepare custom scripts for monitoring. These can be very simple bash scripts that report component health through their exit code.

Standard exit codes have the following interpretation:

Exit code   Meaning
0           OK - Works fine
1           WARN - Warning status
2           CRIT - Critical status

The following list of components also includes the result interpretation that should be used when writing your own monitoring scripts.
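
As an illustration of such a script, the sketch below applies the rule used in the component sections of this page (HTTP 200 means OK, anything else means CRIT). The function names are hypothetical, the 5-second curl limit simply mirrors the default probe timeout, and the URL in the usage line is the admin sample from this page.

```shell
# Hypothetical helper names; adjust to your monitoring system's conventions.

# Map an HTTP status code to a monitoring exit code:
# 200 -> 0 (OK), anything else -> 2 (CRIT).
status_to_exit_code() {
  if [ "$1" = "200" ]; then
    echo 0
  else
    echo 2
  fi
}

# Request a health check URL and print the resulting exit code.
# curl reports 000 as the status when no HTTP response was received.
check_health() {
  local status
  status="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")" || status=000
  status_to_exit_code "$status"
}

# Usage (as the last line of a monitoring script):
#   exit "$(check_health http://admin:5750/healthCheck)"
```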

Readiness / Liveness configuration

All settings except the initial delay should be the same for all Automate applications (excluding the NLU part). Here are the default parameters:

Probe       Initial delay   Period   Timeout   Failure threshold
readiness   5s              10s      5s        4
liveness    60s             20s      5s        4

Some applications take longer to start, so for these applications both initial delays need to be increased by the application's starting time. See the table below:

Application   Starting time
new-web       30s
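
To make the adjustment concrete, the sketch below adds an application's starting time to the default initial delays from the table of default parameters. The variable and function names are assumptions made for this example only.

```shell
# Default initial delays in seconds, taken from the default parameters table.
READINESS_INITIAL_DELAY=5
LIVENESS_INITIAL_DELAY=60

# Print the adjusted "readiness liveness" initial delays (in seconds)
# for an application with the given starting time in seconds.
initial_delays() {
  local start_time="$1"
  echo "$((READINESS_INITIAL_DELAY + start_time)) $((LIVENESS_INITIAL_DELAY + start_time))"
}

initial_delays 30   # new-web: prints "35 90"
```

For new-web this gives an initial delay of 35s for the readiness probe and 90s for the liveness probe.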

Note:
Some applications (e.g. analyser) start loading models only after receiving the first HTTP request. These applications may therefore produce a few readiness warnings. This is nothing to worry about, provided they become ready after 4-6 such warnings.
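
A custom monitoring script may want to tolerate these first failures in the same way. The sketch below retries a check a limited number of times before giving up; wait_until_ready is a hypothetical helper, and the values in the usage line (6 attempts, 10s apart) are only an assumption derived from the note above and the default readiness period.

```shell
# Run a check command until it succeeds, at most max_attempts times,
# pausing interval seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until_ready() {
  local max_attempts="$1" interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage:
#   wait_until_ready 6 10 curl -sf -o /dev/null -m 5 http://analyser:7080/healthCheck
```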

admin

Endpoint       Default TCP port   Result meaning
/healthCheck   5750/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://admin:5750/healthCheck

dialogs

Endpoint       Default TCP port   Result meaning
/healthCheck   5748/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://dialogs:5748/healthCheck

gateway

Endpoint       Default TCP port   Result meaning
/healthCheck   5000/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://gateway:5000/healthCheck

nlu-facade

Endpoint       Default TCP port   Result meaning
/healthCheck   5750/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://nlu-facade:5750/healthCheck

nlu-pipeline

Endpoint       Default TCP port   Result meaning
/healthCheck   8080/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://nlu-pipeline:8080/healthCheck

web-chat

Endpoint       Default TCP port   Result meaning
/healthCheck   5760/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://web-chat:5760/healthCheck

cron-orchestrator

Endpoint       Default TCP port   Result meaning
/healthCheck   5758/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://cron-orchestrator:5758/healthCheck

twitter-bot

Endpoint       Default TCP port   Result meaning
/healthCheck   5756/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://twitter-bot:5756/healthCheck

facebook-bot

Endpoint       Default TCP port   Result meaning
/healthCheck   5760/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://facebook-bot:5760/healthCheck

whatsapp-bot

Endpoint       Default TCP port   Result meaning
/healthCheck   5752/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://whatsapp-bot:5752/healthCheck

skype-bot

Note:
The TCP port of the healthCheck endpoint can be changed with the following configuration keys:

chatbots.skype-bot.http-app-status.host
chatbots.skype-bot.http-app-status.port

Endpoint       Default TCP port   Result meaning
/healthCheck   8392/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://skype-bot:8392/healthCheck

ms-teams-bot

Endpoint       Default TCP port   Result meaning
/healthCheck   5770/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://ms-teams-bot:5770/healthCheck

sso

Endpoint       Default TCP port   Result meaning
/healthCheck   9000/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://sso:9000/healthCheck

thread-coordinator

Endpoint       Default TCP port   Result meaning
/healthCheck   5762/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://thread-coordinator:5762/healthCheck

sentiduck

Endpoint       Default TCP port   Result meaning
/healthCheck   2012/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://sentiduck:2012/healthCheck

duckling

Note:
The duckling service is tied to the sentiduck service. To monitor its health, use the healthCheck endpoint of sentiduck.

Sample curl

curl -XGET http://sentiduck:2012/healthCheck

Sample error response

{
   "status": "ERROR",
   (...)
   "dependency_status": {
     "status": "ERROR",
     "msg": "(...)"
   }
}
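
When sentiduck reports a problem with its dependency, the details appear under dependency_status in the response. A monitoring script could pull that value out roughly as follows. This is only a sketch: it assumes the response has the shape shown in the sample, with status as the first key inside dependency_status, and the function name is hypothetical. A real JSON parser would be more robust than sed.

```shell
# Read a healthCheck response on stdin and print the status found inside
# dependency_status (prints nothing when the field is absent).
# Assumption: "status" is the first key of dependency_status, as in the sample.
# Whitespace is stripped first so that multi-line responses match as well.
dependency_status() {
  tr -d ' \t\r\n' | sed -n 's/.*"dependency_status":{"status":"\([A-Za-z_]*\)".*/\1/p'
}

# Usage:
#   if [ "$(curl -s -m 5 http://sentiduck:2012/healthCheck | dependency_status)" = "ERROR" ]; then
#     exit 2   # CRIT
#   fi
```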

greetings-detector

Endpoint       Default TCP port   Result meaning
/healthCheck   2012/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://greetings-detector:2012/healthCheck

inferrer

Endpoint       Default TCP port   Result meaning
/healthCheck   12416/TCP          0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://inferrer:12416/healthCheck

intentizer-multi

Endpoint       Default TCP port   Result meaning
/healthCheck   6543/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://intentizer-multi:6543/healthCheck

keywords

Endpoint       Default TCP port   Result meaning
/healthCheck   11234/TCP          0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://keywords:11234/healthCheck

name-service

Endpoint       Default TCP port   Result meaning
/healthCheck   3456/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://name-service:3456/healthCheck

ner-pl

Endpoint       Default TCP port   Result meaning
/healthCheck   5000/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://ner-pl:5000/healthCheck

tf-serving

Default TCP port: 8500

Note:
The tf-serving service is tied to the ner-pl service. To monitor its health, use the healthCheck endpoint of the ner-pl component.

Sample curl

curl -XGET http://ner-pl:5000/healthCheck

Sample error response

{
   "status": "ERROR",
   (...)
   "dependency_status": {
     "status": "ERROR",
     "msg": (...)
   }
}

pcre

Endpoint       Default TCP port   Result meaning
/healthCheck   5000/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://pcre:5000/healthCheck

tagger-pl

Endpoint       Default TCP port   Result meaning
/healthCheck   9003/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://tagger-pl:9003/healthCheck

pattern

Endpoint       Default TCP port   Result meaning
/healthCheck   5000/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://pattern:5000/healthCheck

new-web

Endpoint       Default TCP port   Result meaning
/healthCheck   9000/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://new-web:9000/healthCheck

analyser

Endpoint       Default TCP port   Result meaning
/healthCheck   7080/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://analyser:7080/healthCheck

bot-integration

Endpoint       Default TCP port   Result meaning
/healthCheck   9010/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://bot-integration:9010/healthCheck

slim-uploader

Endpoint       Default TCP port   Result meaning
/healthCheck   8765/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://slim-uploader:8765/healthCheck

refinery

Endpoint       Default TCP port   Result meaning
/healthCheck   8765/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://refinery:8765/healthCheck

hooks-server

Endpoint       Default TCP port   Result meaning
/healthCheck   8069/TCP           0 (OK) - if the HTTP status code is equal to 200
                                  2 (CRIT) - if the HTTP status code is not equal to 200

Sample curl

curl -XGET http://hooks-server:8069/healthCheck