Lost AMI connection caused failover



Customer Inquiry
Participant

April 5, 2018 at 1:17 pm

Post count: 210

#6528

This morning I found that my cluster had failed over to the backup data center during the night. Around that time, the HAAst log shows an error saying the remote peer lost its connection to the AMI. However, I can confirm that Asterisk was still running.

I sent you my Asterisk logs and HAAst logs by email. Can you confirm what caused the failover?

Telium Support Group
Participant

April 5, 2018 at 1:22 pm

Post count: 270

#6759

Based on the Asterisk full log we received, it appears that your Asterisk process was hung for almost 30 seconds. The evidence: you have a number of plug-ins/dialplan add-ons that write log messages at least once per second, yet at 2:38am all messages stopped for almost 30 seconds. Something was blocking IO/CPU to the Asterisk process.
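If you want to spot this kind of silence yourself, a rough sketch like the one below can help. It is only an illustration, not part of HAAst or Asterisk: the function name `find_log_gaps` is made up, and it assumes every log line starts with a timestamp such as `[2018-04-05 02:38:01]` (that is, `dateformat=%F %T` in logger.conf). If your log uses a different date format, adjust the pattern.

```python
import re
from datetime import datetime

# Assumption: each entry begins with "[YYYY-MM-DD HH:MM:SS]".
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def find_log_gaps(lines, min_gap_seconds=5):
    """Return (before, after, gap) tuples for silences of at least min_gap_seconds."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            # Continuation lines, or lines in a different date format, are skipped.
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None:
            gap = current - previous
            if gap.total_seconds() >= min_gap_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps
```

Pass it an open file handle for the full log (or any list of lines) and print the tuples it returns. On a system that normally logs at least once per second, any gap of several seconds deserves a closer look.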

Five seconds after the Asterisk process hung, HAAst deemed the peer to be non-responsive and initiated a failover. (This is correct behavior on the part of HAAst: something was going wrong on your PBX.)

You need to track down the root cause of Asterisk hanging for almost 30 seconds. Look for badly written backup scripts, IO- or CPU-intensive jobs scheduled for that time, etc. Grep through all of your system logs around that time for clues as to what else was happening on your system.
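As a starting point for that search, here is a minimal sketch that pulls out the lines of a system log that fall close to the incident. It is an illustration only: the function name `lines_near` is hypothetical, and it assumes traditional syslog lines that begin with a timestamp like `Apr  5 02:38:12` with English month names and no year, so the year is taken from the incident time you supply.

```python
from datetime import datetime, timedelta


def lines_near(lines, incident, window_seconds=60):
    """Return log lines stamped within window_seconds of the incident time."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        try:
            # Assumption: the first 15 characters are the syslog timestamp.
            # Syslog omits the year, so borrow it from the incident time.
            stamp = datetime.strptime(
                f"{incident.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a line in the assumed format
        if abs(stamp - incident) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

Run it over each system log in turn (cron, messages, and so on) with the time of the hang as `incident`. Anything that started shortly before Asterisk went silent is a candidate for the root cause.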

(Hint: Looking at your Asterisk log file, you appear to have added a new plug-in in the last 2 days.)