Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
    Posts
  • Customer Inquiry
    Participant
    Post count: 203

    This morning I found that my cluster had failed over to the backup data center during the night. Around that time, the HAAst log shows an error saying the remote peer lost its connection to the AMI. However, I can confirm that Asterisk was still running.

    I sent you my Asterisk logs and HAAst logs by email. Can you confirm what caused the failover?

    Telium Support Group
    Participant
    Post count: 265

    Based on the Asterisk full log we received, it appears that your Asterisk process hung for almost 30 seconds. The evidence: you have a number of plug-ins/dialplan add-ons that trigger log messages at least once per second, yet at 2:38am all messages stopped for almost 30 seconds. Something was blocking IO/CPU for the Asterisk process.
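For anyone who wants to check their own log for this kind of silence, here is a minimal sketch that scans a log for gaps between consecutive timestamps. It assumes each line starts with a timestamp like `[2024-01-01 02:38:01]` (that is, `dateformat=%F %T` in logger.conf); if your log uses a different format, the pattern and the `strptime` format need adjusting. The function name `find_gaps` and the 5-second threshold are illustrative choices, not anything HAAst or Asterisk provides.

```python
import re
from datetime import datetime

# Assumption: lines begin with "[YYYY-MM-DD HH:MM:SS]"; change this if your
# logger.conf uses a different dateformat.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def find_gaps(lines, min_gap_seconds=5):
    """Return (previous_time, next_time, gap_seconds) for every silence
    between consecutive timestamped lines longer than min_gap_seconds."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip continuation lines and lines without a timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta > min_gap_seconds:
                gaps.append((previous, current, delta))
        previous = current
    return gaps


def report(path, min_gap_seconds=5):
    """Print every gap found in the log file at the given path."""
    with open(path, errors="replace") as handle:
        for start, end, seconds in find_gaps(handle, min_gap_seconds):
            print(f"{start} -> {end}: no log output for {seconds:.0f}s")
```

Calling `report()` with the path to your full log prints one line per gap. On a system that normally logs at least once per second, any gap of several seconds is worth a closer look.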

    Five seconds after the Asterisk process hung, HAAst deemed the peer non-responsive and initiated a failover. (This is correct behavior on the part of HAAst: something was going wrong on your PBX.)

    You need to track down the root cause of Asterisk hanging for almost 30 seconds. Look for badly written backup scripts, IO- or CPU-intensive jobs scheduled for that time, etc. Grep through all of your system logs around that time for clues as to what else was happening on your system.
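As a rough equivalent of that grep search, the sketch below walks a log directory and collects every line containing one of a few time markers. It is a plain substring match, so a marker such as `02:38:` also matches the same minute on other days; adding the date to the marker narrows it down. The function names and the choice of skipping compressed files are illustrative assumptions.

```python
from pathlib import Path

# Assumption: compressed rotations are skipped; decompress them first if the
# incident is old enough to have been rotated.
COMPRESSED_SUFFIXES = {".gz", ".xz", ".bz2"}


def lines_near(lines, markers):
    """Yield each line (without its trailing newline) that contains any marker."""
    for line in lines:
        if any(marker in line for marker in markers):
            yield line.rstrip("\n")


def scan_logs(log_dir, markers):
    """Return {file path: [matching lines]} for readable files under log_dir."""
    results = {}
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file() or path.suffix in COMPRESSED_SUFFIXES:
            continue
        try:
            with path.open(errors="replace") as handle:
                hits = list(lines_near(handle, markers))
        except OSError:
            continue  # unreadable file (for example, permission denied)
        if hits:
            results[str(path)] = hits
    return results
```

For example, `scan_logs("/var/log", ["02:37:", "02:38:"])` would gather everything logged in the two minutes around the hang, grouped by file, which makes cron jobs or backup scripts that started just before 2:38am easier to spot.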

    (Hint: Looking at your Asterisk log file, you appear to have added a new plug-in in the last 2 days.)
