Keeping calls up when cluster switches to backup node
We are designing a high availability solution for a customer that needs to keep calls up in the event of a cluster failover situation.
We have considered another product (which I believe you reference here: https://telium.io/topic/patented-call-survival-add-on/) and concluded that it is not suitable. We need proper detection of node degradation, route degradation, call bridge failure, etc., and their solution doesn’t do any of that (its health monitoring is trivial). We also need synchronization of databases, files, etc. between nodes, and again their solution is very limited (its synchronization is trivial, if you can even call it synchronization). So we want to work with HAAst and are trying to come up with a solution.
Our environment is straightforward:
- All calls originate (or terminate) at the trunk (SIP endpoint on the left)
- All phones terminate (or originate) calls (SIP endpoint on the right)
- The HAAst cluster sits in the middle. (both cluster nodes are located in the same data center)
SIP Endpoint < > Cluster < > SIP Endpoint
We read your post (https://telium.io/topic/call-continuity-survival-on-failover/) about involving the ITSP and we agree with the benefits of doing so. We’re just wondering if there is another way to keep calls up? (We don’t qualify for the OEM edition.) We also understand and agree with your recommendation not to introduce a single point of failure in front of the cluster, but let’s suppose we accept those risks. Is there a solution?
If you are willing to accept the risks of placing new single points of failure in your call path, and you are not using the OEM edition of HAAst (which includes call survival features), then yes, you still have options. The key to this solution is directmedia: RTP flows directly between the endpoints rather than through Asterisk, so the audio path does not depend on either cluster node. It’s also quite likely that your endpoints will expect the SIP signaling channel to remain responsive (or they may drop the call).
Establishing directmedia involves:
- Ensuring the media anchor points are accessible to one another without NAT.
- Ensuring Asterisk is configured to use re-invites/directmedia
- Ensuring your Asterisk dialplan does not force Asterisk to remain in the RTP stream
- Ensuring your endpoints do not require transcoding (performed by Asterisk)
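The four conditions above can be turned into a quick pre-flight check. The following is only a sketch: the Endpoint fields, the function name, and especially the set of Dial() option letters treated as media-anchoring are assumptions made for illustration, so verify them against the documentation for your Asterisk version.

```python
from dataclasses import dataclass

# Assumption: Dial() options that enable in-call DTMF features (transfer,
# recording, parking, hangup) require Asterisk to stay in the RTP stream.
# Check this list against the documentation for your Asterisk version.
MEDIA_ANCHORING_DIAL_OPTIONS = frozenset("tTkKwWxXhH")


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical summary of one side of the call (trunk or phone)."""
    name: str
    behind_nat: bool        # True if a NAT sits between it and the other side
    directmedia: bool       # True if directmedia/re-invites are enabled for it
    codecs: frozenset       # codec names the endpoint offers, e.g. {"ulaw"}


def directmedia_blockers(a, b, dial_option_letters=""):
    """Return a list of reasons why RTP would not flow directly from a to b.

    dial_option_letters must contain only the option letters passed to
    Dial(), without any parenthesised arguments.
    """
    blockers = []
    for ep in (a, b):
        if ep.behind_nat:
            blockers.append(f"{ep.name}: media anchor is behind NAT")
        if not ep.directmedia:
            blockers.append(f"{ep.name}: directmedia is not enabled")
    if not (a.codecs & b.codecs):
        blockers.append("no common codec, so Asterisk would have to transcode")
    anchoring = sorted(set(dial_option_letters) & MEDIA_ANCHORING_DIAL_OPTIONS)
    if anchoring:
        blockers.append(
            "Dial() options keep Asterisk in the RTP stream: " + "".join(anchoring)
        )
    return blockers


trunk = Endpoint("trunk", behind_nat=False, directmedia=True,
                 codecs=frozenset({"ulaw", "alaw"}))
phone = Endpoint("phone", behind_nat=False, directmedia=True,
                 codecs=frozenset({"ulaw", "g722"}))

for reason in directmedia_blockers(trunk, phone, dial_option_letters="tr"):
    print(reason)
```

An empty result only means none of the listed conditions is violated; it does not prove that a given call will actually be re-invited to a direct media path.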
Optional: ensuring the SIP endpoints continue to see active SIP connections involves:
- Placing a B2BUA (or gateway/proxy/SBC) between endpoints and the cluster – this device must place itself into the SIP stream and optionally allow NAT traversal
- Configuring the B2BUA to allow the interior leg of the SIP call to drop while keeping the outer leg of the SIP call active
- Configuring the B2BUA to use UDP for SIP (at least for the cluster-facing leg). This is not always required
For example (this shows two B2BUAs for clarity, but you can adjust to fit your need):
SIP Endpoint < > B2BUA < > Cluster < > B2BUA < > SIP Endpoint
There are open source B2BUA products which might be modifiable to do what you want (e.g. the Sippy project, available at https://github.com/sippy/b2bua). Keep in mind that you are creating a free version of the commercial solution we do not recommend. If this is a critical call center you may be better off developing a proper B2BUA from scratch to do what you want (including moving calls through the new active HAAst node, etc.), but that is a large undertaking.
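To make the required B2BUA behavior concrete, here is a minimal state model of one bridged call. It is purely illustrative: the class, method, and node names are hypothetical, it does not use the API of Sippy or any other B2BUA, and it handles no SIP messages at all. It only shows the rule that losing the cluster-facing (interior) leg must not tear down the endpoint-facing (outer) leg.

```python
from dataclasses import dataclass
from enum import Enum


class LegState(Enum):
    ACTIVE = "active"
    LOST = "lost"


@dataclass
class BridgedCall:
    """Hypothetical model of one call as seen by a B2BUA in front of the cluster."""
    call_id: str
    inner_target: str                     # cluster node the interior leg points at
    outer: LegState = LegState.ACTIVE     # leg toward the phone or trunk
    inner: LegState = LegState.ACTIVE     # leg toward the cluster

    def on_inner_leg_failure(self):
        # A conventional B2BUA would end both legs here. For call survival
        # only the interior leg is marked lost; the outer leg is untouched.
        self.inner = LegState.LOST

    def reattach_inner(self, new_node):
        # Optional step: point the interior leg at the newly active node.
        self.inner_target = new_node
        self.inner = LegState.ACTIVE

    @property
    def endpoint_sees_call_up(self):
        return self.outer is LegState.ACTIVE


call = BridgedCall(call_id="example-1", inner_target="node-a")
call.on_inner_leg_failure()               # cluster fails over
print(call.endpoint_sees_call_up)         # the endpoint still sees an active call
call.reattach_inner("node-b")             # signaling resumes via the new node
print(call.inner_target, call.inner.value)
```

A real implementation would also have to answer or absorb in-dialog requests from the endpoint while the interior leg is down, which is where most of the effort lies.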
HAAst OEM edition creates a call anchor on the PBX, so that even if Asterisk fails the calls don’t drop. HAAst will move the calls to the other node in an orderly fashion (move by IP or SIP redirect), or HAAst will grab the calls by force should the entire PBX server fail.
Is it possible to implement the above without the B2BUA on either side? If there’s no SIP traffic won’t the RTP channel stay up?
This may be possible but it really depends on:
- If the endpoints use TCP for the SIP connection then they may detect the cluster failover (as the TCP connection closes, possibly with a FIN)
- If the endpoints generate SIP traffic (e.g. re-registering) then the lack of a response, or an out-of-sequence response, may cause the endpoint to terminate RTP connections
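The TCP point can be demonstrated with plain sockets on the loopback interface. This is a generic transport-level sketch, not SIP: the function names are made up for illustration, and real SIP stacks add their own keepalives and timers on top of this. It shows that an idle TCP client is told when its peer closes, while an idle UDP client that sends nothing learns nothing.

```python
import socket


def tcp_peer_close_is_visible():
    """Return True if an idle TCP client notices that its peer closed."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname(), timeout=2.0)
    conn, _ = server.accept()
    conn.close()                          # orderly close, sends a FIN
    server.close()
    try:
        # recv() returning an empty bytes object means the peer closed.
        return client.recv(1024) == b""
    finally:
        client.close()


def udp_peer_close_is_visible():
    """Return True if an idle UDP client notices that its peer went away."""
    peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    peer.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.connect(peer.getsockname())    # only sets the default destination
    client.settimeout(0.2)
    peer.close()                          # nothing is sent to the client
    try:
        client.recv(1024)
        return True
    except socket.timeout:
        # No data and no error: the client cannot tell the peer is gone.
        return False
    finally:
        client.close()


print("TCP close visible:", tcp_peer_close_is_visible())
print("UDP close visible:", udp_peer_close_is_visible())
```

Note that a node that fails abruptly may never send a FIN, in which case a TCP endpoint typically finds out only when it next sends something or when a keepalive expires.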
Telium does not provide assistance for creating this type of configuration – but we know clients have made this work. Telium only endorses the method used by HAAst OEM edition, as this is the only proper means for call continuity.