Forum Replies Created

Viewing 6 posts - 1 through 6 (of 6 total)
  • Customer Inquiry
    Participant
    Post count: 201

    It’s important to understand that RFC 2782, which specifies the SRV records used in NAPTR/SRV lookups, is rather vague about how those records are to be used (see https://datatracker.ietf.org/doc/html/rfc2782). The algorithm which outlines how the UA should perform the lookup (see page 7) does not specify when the priority list is created, nor when the UA should restart at the top of the priority list. The result of this ambiguity is that UA clients behave very differently when the availability of a SIP server changes. To make matters worse, each different implementation may in fact be fully compliant with the RFC (and manufacturers will all claim their way is the right way).
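To make the ambiguity concrete, here is a minimal sketch of the ordering step described in the RFC 2782 usage rules: records are grouped by ascending priority value, and within a priority they are picked by weighted random selection. The `SrvRecord` type and the function name are illustrative assumptions, not part of any phone's firmware. Note what the sketch does not answer, because the RFC does not either: how long the resulting list is kept, and when a client should go back to the start of it.

```python
import random
from collections import namedtuple

# Hypothetical record type for illustration; fields mirror the SRV RDATA.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv_records(records, rng=random):
    """Return records in the order a client would try them (RFC 2782 usage rules)."""
    ordered = []
    # Lower priority values are tried first.
    for priority in sorted({r.priority for r in records}):
        # Weight-0 records are placed first within the group, as the RFC suggests.
        group = sorted(
            (r for r in records if r.priority == priority),
            key=lambda r: r.weight != 0,
        )
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


records = [
    SrvRecord(20, 0, 5060, "backup.example.com"),
    SrvRecord(10, 60, 5060, "a.example.com"),
    SrvRecord(10, 40, 5060, "b.example.com"),
]
for rec in order_srv_records(records):
    print(rec.priority, rec.target)
```

The two priority-10 hosts may come out in either order from run to run, but the priority-20 host is always last.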

    From a practical standpoint, some UA clients do an excellent job participating in the failover of the active SIP node, while others do not. For example, SNOM phones (as of firmware released in 2013) select the SRV record to use in priority order, but starting at the priority currently in use. The result is that SNOM phones are immediately responsive, and in the event of a failover they quickly detect the active SIP stack, after which calls/registrations are once again immediately responsive. SNOM is fully compliant with RFC 2782.

    In comparison, Panasonic phones will iterate through the list of SRV records in order of priority upon every registration/call. Every time they do so they start at the lowest priority value and move upwards. So unless the first SRV record is currently in use, it can take 30-45 seconds per SRV record to find the active SIP stack, for every call/registration. Companies using Panasonic phones discover that after a failover it takes anywhere from 30 seconds to 4 minutes per call for the phone to connect to the SIP server. As noted above, Panasonic is fully compliant with RFC 2782.

    As manufacturers get more real-world HA experience they tend to use SRV records differently. For example, manufacturers like Cisco changed their SRV lookup behavior to be more like SNOM. As of firmware 8.5(3), Cisco 7941/61, 7942/62, 7945/65, and 7906 phones no longer attempt to reach the SIP hosts in priority order starting at the lowest on every call/registration. Instead, they check SRV records starting with the last priority used. For this reason selected Cisco and SNOM phones work very well in failover scenarios.
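The difference between the two behaviors described above can be sketched in a few lines. This is a simplified model, not any manufacturer's actual code: `attempts_restart_at_top` walks the list from the most preferred host on every call, while `StickyClient` starts at the host that answered last. The names, and the wrap-around back to the top of the list when later hosts fail, are assumptions made for illustration.

```python
def attempts_restart_at_top(targets, is_up):
    """Walk the list from the top on every call; return (hosts tried, host reached)."""
    for tried, target in enumerate(targets, start=1):
        if is_up(target):
            return tried, target
    return len(targets), None


class StickyClient:
    """Remember the last host that answered and start there on the next call."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.start = 0  # index of the host to try first

    def connect(self, is_up):
        count = len(self.targets)
        for offset in range(count):
            idx = (self.start + offset) % count  # wrap-around is an assumption
            if is_up(self.targets[idx]):
                self.start = idx
                return offset + 1, self.targets[idx]
        return count, None


# Hypothetical host names in priority order; only the third node is active.
targets = ["sip1", "sip2", "sip3"]
only_sip3_up = lambda host: host == "sip3"

print(attempts_restart_at_top(targets, only_sip3_up))  # (3, 'sip3') on every call

client = StickyClient(targets)
print(client.connect(only_sip3_up))  # (3, 'sip3') on the first call after failover
print(client.connect(only_sip3_up))  # (1, 'sip3') on every later call
```

With the 30-45 second timeout per unreachable SRV record mentioned above, the first model pays for two failed attempts on every call, while the second pays only once.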

    If you are designing a high availability telephony environment you must consider the behavior of the UA clients. If you control the make/model/firmware/configuration of each UA client, then you may be able to use SRV records to reach the active SIP node; NAPTR/SRV records are the preferred method of locating the active SIP node. However, if you can’t control the UA clients in use, or the UA client chosen has a poor implementation of NAPTR/SRV records, then you should switch to allowing HAAst to control the location of the active node. This can be done through HAAst event handlers updating DNS records, changing routes, modifying firewall configurations, etc.
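As one illustration of the DNS option, the sketch below builds an `nsupdate` command script that repoints a single A record at whichever node has just become active. It only produces the text. How an event handler is registered with HAAst and how it learns the new active address are not shown here, and the function name, server name, record name, and addresses are all hypothetical.

```python
def build_nsupdate_script(dns_server, record_name, active_ip, ttl=30):
    """Return nsupdate commands that replace the A record with the active node's IP."""
    lines = [
        f"server {dns_server}",
        f"update delete {record_name} A",  # drop the old address(es)
        f"update add {record_name} {ttl} A {active_ip}",  # short TTL so clients re-resolve soon
        "send",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values; 192.0.2.0/24 is reserved for documentation.
print(build_nsupdate_script("ns1.example.com", "sip.example.com.", "192.0.2.20"))
```

The resulting text can be piped to `nsupdate` (typically with a TSIG key via `-k`) on a host that is allowed to make dynamic updates. A short TTL matters here, since phones that cache the old address will not see the change until it expires.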

    Telium has implemented a wide variety of solutions across a large range of hardware/software platforms. We would be pleased to design a solution which fits your needs, and to implement it as well. Please note that we cannot rate, recommend, or discourage use of a particular phone / UA client for legal reasons. The comments above reflect our experience with particular phones at a point in time, and these comments do not constitute an endorsement of any particular product nor a criticism of any other product. Since each phone may be fully RFC 2782 compliant, we are not saying that any manufacturer’s implementation is wrong or poor; rather, we are saying that certain phones / UAs work better in certain scenarios.

    Customer Inquiry
    Participant
    Post count: 201
    in reply to: Constant failover #6857

    Your replacement systemd file solved my problem. I found discussions on the Digium forum about the parameters used in the Digium-provided service file causing slow Asterisk starts. I also found this link https://community.asterisk.org/t/solved-centos-7-compatible-init-d-or-systemd-script-for-asterisk-13/66359/2 which makes reference to the same.

    I noticed your recommended systemd service file also removes the restart parameter, which makes perfect sense since HAAst should be controlling starts/stops of Asterisk, not systemd.

    Customer Inquiry
    Participant
    Post count: 201

    Is it possible to implement the above without the B2BUA on either side? If there’s no SIP traffic won’t the RTP channel stay up?

    Customer Inquiry
    Participant
    Post count: 201

    That person remembers receiving the package but is not sure where they put it. I guess we misplaced it in the office. What happens now?

    Customer Inquiry
    Participant
    Post count: 201

    I changed the network card to a faster one. Why would HAAst care? Do you really need this activation stuff?

    Customer Inquiry
    Participant
    Post count: 201

    We had a similar situation where both peers were handling calls (when our two data centers lost contact). Upon cluster re-assembly HAAst chose to demote the peer with more active calls; isn’t that backwards?
