RabbitMQ monitoring

Description

A queue system used for asynchronous communication

Critical service

YES

Monitoring

RabbitMQ provides command-line (CLI) tools for monitoring. To check whether the RabbitMQ process is running correctly, run the following command on the server that hosts it:

rabbitmq-diagnostics ping

The result is reported through the exit code of the command above. You can read its value from the shell variable $? with the following command:

echo $?

The exit codes have the following meaning:

Exit code ($?)    Meaning
0                 OK - Works fine
2                 CRIT - There are some issues with RabbitMQ
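
For use in a monitoring script, the exit codes listed above can be turned into status labels. The following is a minimal sketch: the function names `rabbitmq_status_label` and `check_rabbitmq_ping` are hypothetical, and reporting any exit code other than 0 or 2 as UNKNOWN is an assumption of this example rather than documented RabbitMQ behavior.

```shell
# Map an exit code from `rabbitmq-diagnostics ping` to a status label.
# The 0/2 mapping follows the table above; labelling every other
# value as UNKNOWN is an assumption of this sketch.
rabbitmq_status_label() {
  case "$1" in
    0) echo "OK" ;;
    2) echo "CRIT" ;;
    *) echo "UNKNOWN ($1)" ;;
  esac
}

# Run the ping, print the label, and return the ping's exit code unchanged
# so that a calling monitoring system can still act on it.
check_rabbitmq_ping() {
  rabbitmq-diagnostics ping >/dev/null 2>&1
  code=$?
  rabbitmq_status_label "$code"
  return "$code"
}
```

Calling `check_rabbitmq_ping` on the RabbitMQ server prints a single label and leaves the original exit code available in $?.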

RabbitMQ itself listens on the following ports:

Port         Purpose
5672/TCP     Data exchange
15672/TCP    WWW Administration panel

Both of these ports can be used for checking if the service is still running.
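
A port check of this kind could look like the sketch below. It assumes that bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command are available on the machine running the check; the function names `port_open` and `check_rabbitmq_ports` and the 3-second timeout are illustrative choices, not part of RabbitMQ.

```shell
# Return 0 if a TCP connection to host ($1) and port ($2) succeeds
# within 3 seconds. Relies on bash's /dev/tcp pseudo-device and the
# coreutils `timeout` command (both assumed to be installed).
port_open() {
  timeout 3 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$1" "$2" 2>/dev/null
}

# Print one status line per RabbitMQ port for the given host ($1).
check_rabbitmq_ports() {
  for port in 5672 15672; do
    if port_open "$1" "$port"; then
      echo "$port/TCP open"
    else
      echo "$port/TCP closed"
    fi
  done
}
```

For example, `check_rabbitmq_ports my-host` (with `my-host` as a placeholder host name) prints one line for each of the two ports. An open port only shows that something is listening, so this complements the `rabbitmq-diagnostics ping` check rather than replacing it.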

How to ensure high availability

Run at least 3 instances of the application and join them into a cluster in accordance with the installation instructions.

Effects of failure of all instances

Asynchronous communication stops between the following components:

  • slim uploader
  • refinery
  • bot-integration
  • admin
  • nlu-facade
  • nlu-pipeline
  • analytics
  • gateway

Effects of a single instance failure

None. The cluster automatically selects another instance as the master, and the state of the queues is preserved.

Disaster Recovery

No additional administrative actions are required after a node has connected to the cluster.

If the RabbitMQ data becomes damaged, it is necessary to reset the entire cluster, which unfortunately involves the loss of the data in the queues. Such a reset should be performed when errors of the following type appear in the logs of RabbitMQ itself:

[error] <0.830.0> Error on AMQP connection <0.830.0> (127.0.0.1:54910 -> 127.0.0.1:5672, vhost: 'none', user: 'guest', state: opening), channel 0:
{handshake_error, opening, {amqp_error, internal_error, "access to vhost '/' refused for user 'some_configured_user': vhost '/' is down", 'connection.open'}}

and the command rabbitmqctl restart_vhost fails with an error like this:

Failed to start vhost '/' on node 'rabbit@my-host'
Reason: {:shutdown, {:failed_to_start_child, :rabbit_vhost_process, {:error, {{{:function_clause, [{:rabbit_queue_index…

Then, execute the following commands to reset RabbitMQ data and configuration:

$ rabbitmqctl stop_app
$ rabbitmqctl reset
# Restart the RabbitMQ service; depending on the operating system, use the "service" or "systemctl" command
$ service rabbitmq-server restart
$ rabbitmqctl start_app
$ rabbitmqctl restart_vhost
$ rabbitmqctl add_user my_user password
$ rabbitmqctl set_permissions -p / my_user ".*" ".*" ".*"

The user name and password can be found in the config.yaml file (for example, that of the analyzer application) or in the secrets, in the rabbit section. The above steps are analogous to the RabbitMQ configuration performed during installation, so it is worth getting acquainted with the installation manual before starting work.
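
Because the reset discards queue data, it can be worth confirming the "vhost ... is down" symptom in the logs before running it. The sketch below searches a log file for that message; the function name `vhost_down_in_log` is hypothetical, and the log file path has to be supplied by the caller because it depends on the installation.

```shell
# Return 0 if the given log file ($1) contains a "vhost '...' is down"
# error such as the one quoted in the Disaster Recovery section.
# The pattern accepts any vhost name between the single quotes.
vhost_down_in_log() {
  grep -q "vhost '[^']*' is down" "$1"
}
```

A call such as `vhost_down_in_log /path/to/rabbitmq.log && echo "vhost is down"` (with the path as a placeholder) reports the symptom without changing anything on the node.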

