Monitoring and troubleshooting

Monitoring of SentiOne Automate is based on checking the state of every component in the system. We check whether each component is ready to serve traffic, and we periodically test whether it is still running correctly. For this we use two core Kubernetes mechanisms:

  • readiness probes
  • liveness probes

If a readiness probe does not return a successful result, Kubernetes marks the component as not ready to serve and routes no traffic to it. Kubernetes keeps running the probe until it returns a successful result; the component is then marked as ready and traffic is routed to it.

Liveness probes, by contrast, are periodic checks on an already running service. They help assess whether the component is still running correctly. If a liveness probe fails, Kubernetes restarts the component so that the system as a whole keeps serving without interruption.
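
The difference between the two probe types can be illustrated with a small model. This is a minimal sketch, not Kubernetes code: the `Component` class and both functions are hypothetical names invented for illustration. It assumes a restart is triggered only after several consecutive liveness failures (Kubernetes exposes this as `failureThreshold`, which defaults to 3); the values actually configured for SentiOne Automate are not stated here.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical model of one component; not a Kubernetes API object."""
    ready: bool = False          # whether traffic is routed to the component
    restarts: int = 0            # how many times it has been restarted
    liveness_failures: int = 0   # consecutive failed liveness probes


def apply_readiness(component: Component, probe_ok: bool) -> None:
    # A readiness result only decides whether traffic is routed; it never restarts.
    component.ready = probe_ok


def apply_liveness(component: Component, probe_ok: bool,
                   failure_threshold: int = 3) -> None:
    # failure_threshold is an assumed value mirroring the Kubernetes default.
    if probe_ok:
        component.liveness_failures = 0
        return
    component.liveness_failures += 1
    if component.liveness_failures >= failure_threshold:
        # Enough consecutive failures: restart the component.
        component.restarts += 1
        component.liveness_failures = 0
        # A restarted component must pass its readiness probe again.
        component.ready = False


component = Component()
apply_readiness(component, True)       # component starts receiving traffic
for result in (False, False, False):   # three consecutive liveness failures
    apply_liveness(component, result)
print(component)
```

In this model a failed readiness probe only takes the component out of rotation, while repeated liveness failures trigger a restart, after which readiness has to be established again.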

In SentiOne Automate we use two types of probe tests for pods:

  • HTTP request - we send a GET request to a predefined endpoint and interpret the result by its HTTP status code (codes greater than or equal to 200 and less than 400 are considered successful; any other code is interpreted as a failure)
  • TCP probe - we check whether the application listens on a predefined TCP port (if the port is open, it is a success; otherwise we consider it a failure)
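
When troubleshooting, it can help to reproduce both checks by hand. The following is a minimal sketch of the two rules described above, written with the Python standard library; it is not how Kubernetes implements its probes. The function names, the default timeout, and the URL, host and port in the usage note are assumptions made for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def http_status_is_success(status: int) -> bool:
    # Rule from the text: 200 <= code < 400 is a success, anything else a failure.
    return 200 <= status < 400


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Send a GET request and judge the result by its HTTP status code."""
    try:
        # Note: urlopen follows redirects, so the final response is evaluated.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return http_status_is_success(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; apply the same rule.
        return http_status_is_success(error.code)
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response: treat as failure.
        return False


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Succeed if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `http_probe("http://localhost:8080/health")` and `tcp_probe("localhost", 8080)` would check a component forwarded to a local port; the endpoint path and port number here are placeholders, not the values used by SentiOne Automate.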