Components monitoring
SentiOne Automate monitoring is based on checking the status of each system component. Each component is checked to verify that it is ready to provide its services. The check runs periodically at a specified time interval and tests whether the component responds correctly.
These checks rely on basic Kubernetes functions:
- readiness probes - if a readiness probe fails (for example, no result is returned), Kubernetes treats the component as not ready and retries the test after the specified time. Only when the result is correct is the component considered ready to serve traffic.
- liveness probes - periodic operations performed on a component that determine whether it is working properly. If the test fails, Kubernetes is informed that the application needs to be restarted, which ensures continuity of operation of the entire IT system.
In the SentiOne Automate system, two types of tests are run against the components (in this case: pods):
- HTTP request - a simple GET request is sent to a specific endpoint configured for each application. The response is interpreted by its status code: codes greater than or equal to 200 and less than 400 mean success, and any other code means that the component is not working properly.
- TCP probe - a check that verifies whether the application has opened the specified TCP port (if the port is open, the check succeeds; otherwise the component is considered down).
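The two test types can be approximated outside Kubernetes with a short shell sketch. This is only an illustration: the function names are hypothetical, the 5-second limits are an assumption, and the TCP check assumes an `nc` build that supports `-z` and `-w`.

```shell
#!/bin/sh
# Print "success" when an HTTP status code falls in the 200-399 range
# that the HTTP probe treats as healthy, "failure" otherwise.
classify_http_status() {
    code="$1"
    case "$code" in
        ''|*[!0-9]*) echo "failure"; return 0 ;;   # empty or non-numeric input
    esac
    if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
        echo "success"
    else
        echo "failure"
    fi
}

# Print only the status code of a GET request to the given URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeed when the given host accepts connections on the given TCP port.
tcp_probe() {
    nc -z -w 5 "$1" "$2"
}
```

For example, `classify_http_status "$(http_status http://admin:5750/healthCheck)"` would print `success` or `failure` for the admin component, and `tcp_probe admin 5750` would report through its exit status whether the port is open.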
Pod monitoring
Each SentiOne Automate system component exposes an endpoint to which an HTTP GET request should be sent; the response allows the application state to be interpreted. The sections below list all components with a short description of the exposed port, responses, etc.
Popular monitoring systems (e.g. Nagios, Sensu, Prometheus) allow custom monitoring scripts. These can be very simple bash scripts that report component health through their exit code.
Standard exit codes have the following interpretation:
Exit code | Meaning |
---|---|
0 | OK - Works fine |
1 | WARN - Warning status |
2 | CRIT - Critical status |
The following list of components also includes the result interpretation that should be used when writing your own monitoring scripts.
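A custom monitoring script along these lines might look like the sketch below. It follows the per-component rule (200 means OK, anything else means CRIT); the function names are hypothetical and the 5-second timeout is an assumption borrowed from the default probe timeout.

```shell
#!/bin/sh
# Map an HTTP status code to a monitoring exit code:
# 200 -> 0 (OK), anything else -> 2 (CRIT).
exit_code_for_status() {
    if [ "$1" = "200" ]; then
        echo 0
    else
        echo 2
    fi
}

# Query a healthCheck URL and print the resulting exit code.
# A failed request (no response, timeout) is treated as status 000.
check_component() {
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || status=000
    exit_code_for_status "$status"
}
```

A plugin for a single component could then end with `exit "$(check_component http://admin:5750/healthCheck)"`, so that the monitoring system reads the result from the script's exit code.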
Readiness / Liveness configuration
All settings except the initial delay should be common to all Automate applications (excluding the NLU part). Here are the default parameters:
Probe | Initial delay | Period | Timeout | Failure threshold |
---|---|---|---|---|
readiness | 5s | 10s | 5s | 4 |
liveness | 60s | 20s | 5s | 4 |
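As an illustration, the defaults above can be turned into Kubernetes probe definitions with a small shell helper. This is a sketch only: `render_probe` is a hypothetical name, and it assumes an HTTP probe on `/healthCheck` for the admin component (port 5750); the YAML field names are the standard Kubernetes probe fields.

```shell
#!/bin/sh
# Print a Kubernetes HTTP probe definition as YAML.
# Arguments: probe name, path, port, initial delay, period, timeout, failure threshold.
render_probe() {
    cat <<EOF
$1:
  httpGet:
    path: $2
    port: $3
  initialDelaySeconds: $4
  periodSeconds: $5
  timeoutSeconds: $6
  failureThreshold: $7
EOF
}

# Default values from the table, applied to the admin component.
render_probe readinessProbe /healthCheck 5750 5 10 5 4
render_probe livenessProbe /healthCheck 5750 60 20 5 4
```

The printed fragments would sit under a container entry in a pod specification.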
Some applications take longer to start, so for these applications both initial delays need to be increased by the application's starting time. See the table below.
Application | Starting time |
---|---|
new-web | 30s |
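The adjustment is plain addition, shown here for new-web as a minimal sketch (`effective_delay` is a hypothetical helper name):

```shell
#!/bin/sh
# Add an application's starting time to a default initial delay (both in seconds).
effective_delay() {
    echo $(( $1 + $2 ))
}

# new-web starts in about 30s, so both default delays grow by 30.
echo "readiness initial delay: $(effective_delay 5 30)s"   # 35s
echo "liveness initial delay:  $(effective_delay 60 30)s"  # 90s
```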
Note:
Some applications (e.g. analyser) start loading models only after receiving the first HTTP request. Therefore, these applications may report a few readiness warnings. This is nothing to worry about, provided they become ready after 4-6 of those warnings.
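A custom script can tolerate this warm-up phase by retrying a few times before raising an alarm. The sketch below is one possible approach, not part of the product; `wait_until_ready` is a hypothetical helper, and the attempt limit and pause are parameters to be chosen by the operator.

```shell
#!/bin/sh
# Run a check command up to MAX_ATTEMPTS times, pausing between tries.
# Prints the number of attempts used on success, or "failed" when every
# attempt fails.
# Usage: wait_until_ready MAX_ATTEMPTS PAUSE_SECONDS COMMAND [ARGS...]
wait_until_ready() {
    max_attempts="$1"
    pause="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "$attempt"
            return 0
        fi
        # Pause only when another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$pause"
        fi
        attempt=$((attempt + 1))
    done
    echo "failed"
}
```

For instance, `wait_until_ready 6 10 curl -sf http://analyser:7080/healthCheck` would allow up to six attempts, ten seconds apart, before reporting `failed`.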
admin
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5750/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://admin:5750/healthCheck
dialogs
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5748/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://dialogs:5748/healthCheck
gateway
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5000/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://gateway:5000/healthCheck
nlu-facade
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5750/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://nlu-facade:5750/healthCheck
nlu-pipeline
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 8080/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://nlu-pipeline:8080/healthCheck
web-chat
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5760/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://web-chat:5760/healthCheck
cron-orchestrator
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5758/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://cron-orchestrator:5758/healthCheck
twitter-bot
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5756/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://twitter-bot:5756/healthCheck
facebook-bot
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5760/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://facebook-bot:5760/healthCheck
whatsapp-bot
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5752/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://whatsapp-bot:5752/healthCheck
skype-bot
TCP port can be changed
The TCP port of the healthCheck endpoint can be changed with the following configuration keys:
chatbots.skype-bot.http-app-status.host
chatbots.skype-bot.http-app-status.port
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 8392/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://skype-bot:8392/healthCheck
ms-teams-bot
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5770/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://ms-teams-bot:5770/healthCheck
sso
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 9000/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://sso:9000/healthCheck
thread-coordinator
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5762/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://thread-coordinator:5762/healthCheck
sentiduck
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 2012/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://sentiduck:2012/healthCheck
duckling
The duckling service depends on the sentiduck service. To monitor its health, use the healthCheck endpoint of sentiduck.
Sample curl
curl -XGET http://sentiduck:2012/healthCheck
Sample error response
{
"status":"ERROR",
(...)
"dependency_status":{
"status":"ERROR",
"msg":"(...)"
}
}
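A monitoring script can derive an exit code from the `dependency_status` field of such a response. The sketch below uses only `tr` and `sed`, so it relies on `"status"` being the first key inside `"dependency_status"`, as in the sample. The function names are hypothetical. Because the value reported for a healthy dependency is not shown here, any value other than `ERROR` is assumed to be healthy.

```shell
#!/bin/sh
# Read a healthCheck JSON body on stdin and print the value of
# dependency_status.status (prints nothing when the field is not found).
dependency_status() {
    { tr -d ' \t\r\n'; echo; } |
        sed -n 's/.*"dependency_status":{"status":"\([^"]*\)".*/\1/p'
}

# Translate the dependency status into a monitoring exit code.
dependency_exit_code() {
    case "$1" in
        ERROR) echo 2 ;;   # dependency reported as failing -> CRIT
        '')    echo 1 ;;   # field missing or unparseable -> WARN
        *)     echo 0 ;;   # any other value treated as healthy (assumption)
    esac
}
```

A check for duckling could then be written as `dependency_exit_code "$(curl -s http://sentiduck:2012/healthCheck | dependency_status)"`.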
greetings-detector
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 2012/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://greetings-detector:2012/healthCheck
inferrer
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 12416/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://inferrer:12416/healthCheck
intentizer-multi
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 6543/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://intentizer-multi:6543/healthCheck
keywords
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 11234/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://keywords:11234/healthCheck
name-service
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 3456/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://name-service:3456/healthCheck
ner-pl
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5000/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://ner-pl:5000/healthCheck
tf-serving
Default TCP port: 8500
The tf-serving service depends on the ner-pl service. To monitor its health, use the healthCheck endpoint of the ner-pl component.
Sample curl
curl -XGET http://ner-pl:5000/healthCheck
Sample error response
{
"status": "ERROR",
(...)
"dependency_status": {
"status": "ERROR",
"msg": (...)
}
}
pcre
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5000/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://pcre:5000/healthCheck
tagger-pl
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 9003/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://tagger-pl:9003/healthCheck
pattern
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 5000/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://pattern:5000/healthCheck
new-web
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 9000/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://new-web:9000/healthCheck
analyser
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 7080/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://analyser:7080/healthCheck
bot-integration
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 9010/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://bot-integration:9010/healthCheck
slim-uploader
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 8765/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://slim-uploader:8765/healthCheck
refinery
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 8765/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://refinery:8765/healthCheck
hooks-server
Endpoint | Default TCP port | Result meaning |
---|---|---|
/healthCheck | 8069/TCP | 0 (OK) - if the HTTP status code is equal to 200; 2 (CRIT) - if the HTTP status code is not equal to 200 |
Sample curl
curl -XGET http://hooks-server:8069/healthCheck