Forum Replies Created
-
AuthorPosts
-
If you are willing to accept the risks of placing new single points of failure in your call path, and you are not using the OEM edition of HAAst (which includes call survival features), then yes you still have options. The key to this solution is to ensure directmedia (RTP flowing directly between endpoints). It’s also quite likely that your endpoints will expect to see the SIP channel responsive as well (or they may drop the call).
Establishing directmedia involves:
- Ensuring the media anchor points are accessible to one another without NAT.
- Ensuring Asterisk is configured to use re-invites/directmedia
- Ensuring your Asterisk dialplan does not force Asterisk to remain in the RTP stream
- Ensuring your endpoints do not require transcoding (performed by Asterisk)
Optional: ensuring the SIP endpoints continue to see active SIP connections involves:
- Placing a B2BUA (or gateway/proxy/SBC) between endpoints and the cluster – this device must place itself into the SIP stream and optionally allow NAT traversal
- Configuring the B2BUA to allow the interior leg of the SIP call to drop, but keep the outer leg of the SIP call to remain active
- Configuring the B2BUA to use UDP for SIP (at least for cluster facing leg). This is not always required
For example (this shows two B2BUA’s for clarity, but you can adjust to fit your need):
There are open source B2BUA products which might be modifiable to do what you want (eg: the SIPpy project available at: https://github.com/sippy/b2bua). Keep in mind that you are creating a free version of the commercial solution we do not recommend. If this is a critical call center you may be better off developing a proper B2BUA from scratch to do what you want, including moving calls through the new active HAAst node, etc but that is a large undertaking.
HAAst OEM edition creates a call anchor on the PBX, so that even if Asterisk fails the calls don’t drop. HAAst will move the calls to the other node in an orderly fashion (move by IP or SIP redirect), or HAAst will grab the calls by force should the entire PBX server fail.
in reply to: Patented call survival add-on #6696
Many call continuity solutions don’t work in a real-life Asterisk HA context. Most HA solutions attempt to maintain in-progress calls by adding SIP devices in front of SIP devices. See this posting for a detailed discussion of how call survival can work with Asterisk/SIP: https://telium.io/topic/call-continuity-survival-on-failover/
There is one small company that claims to offer HA call survival (using their “patented” methodology). If you look closely at their patent (image reproduced below) you will see that all they have done is place a SIP proxy in front of each PBX, plus a “sip peer” in front of the SIP proxies:
So, will their ‘patented’ technology keep calls up in case of:
- Failure of SIP proxy? No
- Failure of “SIP peer”? No
- Asterisk failing to bridge calls? No
- Power failure? No
- Loss of internet service? No
- Failure of router/firewall? No
- etc…
Making things even worse, placing a new single point of failure (the proxy, or “sip peer”) in front of an existing single point of failure (the PBX) doesn’t create HA – it just creates two single points of failure! You now have to create an HA solution for your SIP Proxy as well. This type of “call survival” solution only works in the most simplistic scenario, like the Asterisk box powering off and everything else continuing unaffected. Real life scenarios like trunk failures, power failures at the data center, firewall failures, etc. make such a call survival solution worthless.
Some companies attempt to keep RTP streams without SIP continuity; however, there are many reasons that SIP dialogs may occur while a call is in progress and if they don’t handle SIP then the RTP may drop. Many engineers new to SIP/RTP make this mistake – there are lots of details to consider here.
HAAst OEM edition transitions calls between nodes without placing another device in front of either PBX. HAAst integrates deeply with Asterisk to ensure calls don’t drop, queues re-populate, call recording resumes, etc. The issue is not as simple as just keeping an RTP stream up. HAAst takes responsibility for transitioning the calls, rebuilding the Asterisk state, and transparently allowing the peers to remain in full (SIP and RTP) contact.
in reply to: License for preproduction / testing server #6694
Although it’s not listed on the website, we do offer a special Commercial Unlimited edition license to be used only for preproduction / testing environments. This license includes all functionality of the CU edition, but is limited to 5 simultaneous calls. This license is 1/2 the price of the regular CU edition, and the optional maintenance agreement (for updates) is also 1/2 price. Please contact support@telium.io with your current customer number to purchase this license. (This version is only available to customers with at least 1 CU license).
If your customer wants to run a high volume of calls over the testing system then they would have to buy a full CU license. We don’t distinguish between testing and production systems if both need full features and full call volume. (We also have to prevent license fraud – so a full license has to be full price).
in reply to: Upgrade to configuration generator (FreePBX) #6693
When you perform an upgrade/update to any module in FreePBX (even a minor one) there is the possibility that FreePBX will change the structure of the tables in MySQL. Since HAAst will (intentionally) not sync metadata (SQL structures), you must ensure that the peers do not attempt to synchronize data during such an upgrade/update.
The Maintenance and Operations Guide shows the complete upgrade procedure (see section 6). But if you are very experienced with Linux & FreePBX, you can follow this short-cut:
- Upgrade A
  - Unplug the network connection from A
  - Upgrade FreePBX on A
- Upgrade B
  - Unplug the network connection from B
  - Replug the network connection to A
  - Upgrade FreePBX on B
- Re-establish cluster
  - Replug the network connection to B
  - Wait for HAAst to re-establish the cluster automatically
  - Use the telnet/web interface to make the preferred peer active. (Or wait for automatic fallback during the maintenance window if enabled in the haast.conf file)
The key concept here is that a standby peer must NOT be able to see an active peer which is running a different version (or different modules installed/enabled) of the configuration generator.
Note that this applies only to FreePBX. Other configuration generators do a much better job managing settings and keeping settings-code aligned.
in reply to: HAAst upgrade procedure (major version upgrade) #6692
Since the Peerlink protocol version has changed, the peers will not be able to talk to each other (over Peerlink) until both peers are upgraded. So if both peers are online at the same time they will not be able to communicate – and both peers will try to take over as active. Consequently, your upgrade procedure must ensure both peers are NOT online at the same time.
Since the license version has changed, you will need to request new licenses from Telium. To avoid bringing down the entire cluster you should upgrade and re-license one peer at a time.
The overall steps to performing such an upgrade are:
- Upgrade A
  - Stop HAAst on peer A, wait for stop
  - Run the install_files/updatefiles.sh script from the newly downloaded package
  - Unplug the network connection from A
  - Restart HAAst on peer A
  - Request and apply new license to A if required, then restart A
- Switchover
  - Stop HAAst on peer B, wait for stop
  - Replug the network connection to A, ensure A takes over as active
- Upgrade B
  - Run the install_files/updatefiles.sh script from the newly downloaded package
  - Restart HAAst on peer B, ensure cluster forms
  - Request and apply new license to B if required, then restart B
- Fallback to preferred peer (optional)
  - Use the telnet/web interface to make the preferred peer active. (Or wait for automatic fallback during the maintenance window if enabled in the haast.conf file)
The answer depends on the location of your two PBX’s. If the two PBX’s are located on the same subnet, then
- Move IP: Use the VoIPNIC option of HAAst to move a single IP between peers. This will allow for rapid reconnection of downstream (user agents) and upstream (trunks)
If the two PBX’s are located on different subnets (from each other):
- SRV records: Assuming your user agents (phone sets) support SRV records (which most do), then you should create SRV records for your two PBX’s. Most user agents will perform a DNS lookup for SRV records to find available PBX’s, and try them in order of priority until they successfully register with a PBX. For example, if you have PBX’s located in data centers dc1 and dc2, create two DNS entries (in your internal DNS server) as follows:
type=srv
name=_sip._udp.mydomain.com
priority= 10
weight=0
port=5060
hostname=pbx1.local
and
type=srv
name=_sip._udp.mydomain.com
priority= 20
weight=0
port=5060
hostname=pbx2.local
- Route Change: Use the pre/post Asterisk start/stop event handlers of HAAst to update routes in your router(s). Set the updated routes to point to the new PBX address.
- DNS update: Use the pre/post Asterisk start/stop event handlers to update a public DNS entry. Be sure to set the TTL value low enough that phones will look up the new IP in a reasonable timeframe.
Note: Using SRV records or DNS entries makes it easy for users with softphones to move on and off LAN and resume a PBX connection without manual intervention.
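To make the "try them in order of priority" behaviour concrete, here is a minimal Python sketch of how a user agent might order the two SRV records from the example above. It is a simplified reading of RFC 2782 (lowest priority first, weighted random choice within a priority level), not the behaviour of any particular phone. The `SrvRecord` class and `order_srv_targets` function are hypothetical names introduced only for this illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    """One SRV answer (hypothetical helper type for this sketch)."""
    priority: int
    weight: int
    port: int
    target: str


def order_srv_targets(records, rng=random):
    """Return records in the order a client would try them.

    Lower priority values are tried first. Within one priority level,
    records are drawn at random in proportion to their weight. This is a
    simplification of the selection algorithm described in RFC 2782.
    """
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            # Treat weight 0 as a very small weight so such records can
            # still be picked when nothing else is left at this level.
            weights = [r.weight or 0.001 for r in group]
            pick = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# The two records from the example: pbx1 is preferred, pbx2 is the fallback.
records = [
    SrvRecord(priority=20, weight=0, port=5060, target="pbx2.local"),
    SrvRecord(priority=10, weight=0, port=5060, target="pbx1.local"),
]
attempt_order = [r.target for r in order_srv_targets(records)]
```

Because the two records have different priorities, a client following this ordering would only register with pbx2.local after registration with pbx1.local fails.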
The answer depends on the location of your two PBX’s. If the PBX’s are located in the same data center (i.e. using the same external IP address), then no change is necessary as they will connect to the same IP address. If you need to modify your firewall/router internally to direct traffic to the active peer then see the answer to the question on locating the PBX for internal phones. On the other hand, if the PBX’s are located in different data centers (i.e. accessible using different public IP addresses) then your options are:
- SRV records: Assuming your user agents (phone sets) support SRV records (which most do), then you should create SRV records for your two PBX’s. Most user agents will perform a DNS lookup for SRV records to find available PBX’s, and try them in order of priority until they successfully register with a PBX. For example, if you have PBX’s located in data centers dc1 and dc2, then create two DNS entries (in your public DNS server) as follows:
type=srv
name=_sip._udp.mydomain.com
priority= 10
weight=0
port=5060
hostname=dc1.mydomain.com
and
type=srv
name=_sip._udp.mydomain.com
priority= 20
weight=0
port=5060
hostname=dc2.mydomain.com
- DNS update: Use the pre/post Asterisk start/stop event handlers to update a public DNS entry. Be sure to set the TTL value low enough that phones will look up the new IP in a reasonable timeframe.
- MPLS: If you use MPLS then you can simply move the label (to move IP between routers of your two DC’s). We don’t provide any further detail on this option (i.e. if you don’t understand how to do this with MPLS, then there’s too much to explain in one post)
Note: Using SRV records or DNS entries makes it easy for users with softphones to move on and off LAN and resume a PBX connection without manual intervention.
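If your public DNS is managed through BIND-style zone files, the two SRV entries above could be written out as in the following sketch. The helper name `srv_zone_line` and the 300 second TTL are assumptions made for illustration only. Choose a TTL that suits your own failover time, and check the exact record syntax your DNS provider expects.

```python
def srv_zone_line(service, proto, domain, priority, weight, port, target, ttl=300):
    """Format one SRV record as a BIND-style zone file line.

    The default ttl of 300 seconds is an assumed value, not a recommendation
    from the original post.
    """
    owner = f"_{service}._{proto}.{domain}."
    # SRV RDATA order is: priority, weight, port, target (fully qualified).
    return f"{owner} {ttl} IN SRV {priority} {weight} {port} {target}."


lines = [
    srv_zone_line("sip", "udp", "mydomain.com", 10, 0, 5060, "dc1.mydomain.com"),
    srv_zone_line("sip", "udp", "mydomain.com", 20, 0, 5060, "dc2.mydomain.com"),
]
```

Printing `lines` yields one record per data center, with dc1 preferred because it has the lower priority value.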
in reply to: Asterisk 14 compatability #6689
Yes, HAAst was certified Asterisk 14 compatible in November 2016. Not a lot of companies are running Asterisk 14 yet (in production) as of Jan 2017, so you will be on the leading edge. But I assume you need some Asterisk 14 features.
You didn’t mention your customer name/number but based on the username (and 4500 phone sets) I think I found you in our CRM system. It looks like your maintenance agreement expired last year so you will need to contact admin@telium.io to upgrade your license (otherwise upgrading HAAst will cause it to run as the ‘free edition’).
in reply to: OK to use rsync or NFS share for cluster data #6688
You are welcome to use rsync, NFS, Samba, etc. in your cluster. However, we generally recommend keeping data on each peer and allowing HAAst to control all synchronization, and here’s why:
- HAAst only synchronizes data between peers if peers have passed a health check. That means if one node is failing and starts to accidentally corrupt data, it will not be copied to the other peer! Tools like rsync, NFS, DRBD, etc. will immediately share/mirror all data, including corrupt data.
- By allowing HAAst to control synchronization, the HAAst event handler system will allow you to customize inbound data following a synchronization (e.g. update trunk information, modify the dialplan, customize TFTP files for the local network, etc)
You should not place databases on shared or mirrored storage (NFS, Samba, DRBD, iSCSI), as corruption by one peer will destroy the database for the other peer! Even worse, a failure midway through a write will corrupt both peers! Note that HAAst performs SQL transactions (not block level access) for database synchronization, so even if a peer fails midway through a database write neither peer will be left with an invalid database state.
The one exception to this rule is if you need to archive a high volume of files, or very large files, that are written once and thereafter only read. A perfect example of this is call center call recordings or logs. A call center can easily generate gigabytes of recordings every minute, to be referenced in the future in case of dispute or for quality assurance. Since these are large files written once and then archived, they are the perfect example of data that should be written to a server share, common iSCSI device, etc. It would not make sense to generate the high network load and disk load required to continually create a second copy of this data.
in reply to: License violation but no calls in progress #6687
The fact that the license violation occurs close to the time of a log rotation is a red herring (no relationship).
SecAst does not track calls in progress; it asks Asterisk to report the number of calls in progress. You can perform the same query from the command line:
asterisk -vx 'core show calls'
So the question is why your Asterisk installation is reporting 8 calls in progress. This can be due to:
- Valid users making calls in or out
- Valid user starting the conference feature
- Incoming callers leaving a voicemail
- Automated calls
- Hackers calling in to probe your dialplan
- Asterisk incorrectly not releasing channels
- Dialplan errors
If the number of calls reports higher than you expect, you can delve deeper into the calls in progress using a command like:
asterisk -vx 'core show channels'
If you are using FreePBX then Sangoma recently started making automatic calls in the background to set ‘time condition’ variables. In essence FreePBX is making invisible calls, and Asterisk will report these as calls in progress; there is nothing we can do about it, but that won’t explain 8 calls in progress.
So…in a nutshell SecAst does not count calls – it gets that number from Asterisk. Something else is going on with your Asterisk setup. Repeat the first command above once every 30 seconds and watch if your ‘calls processed’ count is increasing even when users aren’t making calls. That should help you figure out why Asterisk is reporting a count you don’t expect!
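The "repeat every 30 seconds" check can be scripted. The following Python sketch assumes that `core show calls` prints lines of the form "N active calls" and "M calls processed" (the exact wording may differ between Asterisk versions), and that the script runs as a user allowed to attach to the Asterisk console. It uses the common `-rx` form of the CLI invocation. The function names are hypothetical.

```python
import re
import subprocess
import time

# Patterns for the two counters; the optional "of N max" part covers
# systems that have a maximum call limit configured (assumed format).
ACTIVE_RE = re.compile(r"(\d+)\s+(?:of\s+\d+\s+max\s+)?active calls?")
PROCESSED_RE = re.compile(r"(\d+)\s+calls? processed")


def parse_core_show_calls(output):
    """Extract (active, processed) call counts from 'core show calls' output."""
    active = ACTIVE_RE.search(output)
    processed = PROCESSED_RE.search(output)
    if not active or not processed:
        raise ValueError(f"unrecognised output: {output!r}")
    return int(active.group(1)), int(processed.group(1))


def watch_calls(samples=10, interval=30):
    """Query Asterisk repeatedly and print how the counters change."""
    previous = None
    for _ in range(samples):
        result = subprocess.run(
            ["asterisk", "-rx", "core show calls"],
            capture_output=True, text=True, check=True,
        )
        active, processed = parse_core_show_calls(result.stdout)
        delta = "" if previous is None else f" (+{processed - previous} since last sample)"
        print(f"active={active} processed={processed}{delta}")
        previous = processed
        time.sleep(interval)
```

Calling `watch_calls()` on the PBX during a quiet period shows whether the processed count keeps climbing when no users are making calls.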
And now the bad news…it sounds like you’re struggling with some basic Linux admin and Asterisk admin tasks. If this is a commercial installation I would recommend purchasing 2 hours of support so we can help you through setup. If this is a home installation you probably have a big learning curve ahead of you in terms of Ubuntu and Asterisk – I’m not sure if it’s worthwhile for you to continue but we can’t really offer free support for Asterisk (or Ubuntu). I’m not sure if you are using a configuration generator either (you don’t offer any details of your system), but if this is a commercial installation you may want to move up to a package like xCALLY which provides a very professional turnkey solution without many of the headaches involved with many smaller packages (you don’t need to know anything about Linux or Asterisk).
in reply to: Thank you! #6686
You’re very welcome. It was a great project, and I’m glad we were your partner for this important project.
in reply to: Which peers takes over when cluster reassembles #6685
When the peers are in a state of dual-active contention one of the peers is considered improperly active (invalid state). In other words, it should not be processing calls at all. Usually trunks (E1/T1/SIP/IAX/etc) are forced to one peer or the other which means one peer will not be getting (more) calls. If your configuration allows both peers to handle calls simultaneously then you are in the minority (this is not typical).
For this reason the number of active calls is not a criterion in determining which peer to demote.
in reply to: Which peers takes over when cluster reassembles #6683
When the cluster reassembles HAAst will discover 2 peers active (called ‘dual-active contention’). HAAst will then try to pick the peer with the lowest likelihood of long-term success (probability of staying active) and demote it. The determination of which peer is least likely to succeed considers:
- Which peer caused the previous failover
- How many failures has each peer had
- How long has each peer been running
- What was the last health score of each peer
- And more…
This works well when the peers are configured as equals (primary/primary), which implies that the cluster would be happy with either peer running. However, when the peers are not configured as equals (primary/backup), the determination of which peer should demote may result in the backup server remaining active. This describes the situation you encountered, and this is normal behavior (by design).
As of version 2.3.2.14 the administrator can override the demotion decision, ignoring the criteria listed above. If the ‘autodemote’ key is set to true in the [backuprole] stanza of either peer, then HAAst will always demote that peer.
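The following Python sketch illustrates the general shape of such a decision. It is purely hypothetical: the field names, the weights, and the scoring formula are invented for illustration and are not HAAst's actual algorithm. Only the listed criteria and the ‘autodemote’ override come from the description above.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """Snapshot of one peer (hypothetical fields for this sketch)."""
    name: str
    caused_last_failover: bool
    failure_count: int
    uptime_seconds: float
    last_health_score: float  # assumed scale: 0 (bad) to 100 (good)
    autodemote: bool = False  # mirrors the 'autodemote' key in [backuprole]


def survival_score(peer):
    """Higher means more likely to stay active. Weights are made up."""
    score = peer.last_health_score
    score -= 10 * peer.failure_count
    score -= 25 if peer.caused_last_failover else 0
    score += min(peer.uptime_seconds / 3600, 24)  # cap the uptime bonus at 24
    return score


def choose_peer_to_demote(a, b):
    """Pick the peer to demote during dual-active contention."""
    # The administrator override wins over all scoring criteria.
    # If both peers set it, this sketch simply returns the first one.
    for peer in (a, b):
        if peer.autodemote:
            return peer
    return min((a, b), key=survival_score)
```

In a primary/backup cluster, a primary that caused the last failover can score lower than a healthy backup, which is why the backup may remain active unless the override is set on it.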
in reply to: Peerlink fails for PBX’s in AWS #6682
The peerlink error means that the two peers are unable to talk to one another. This is most likely due to Security Group misconfiguration within AWS. As a simple test, try to telnet from one peer to the other peer on port 3002; for example:
telnet 10.1.2.3 3002
This command will likely fail/timeout, which confirms the Security Group misconfiguration. To resolve this, and assuming both peers are in the same Security Group, and iptables/firewalld is disabled, set that Security Group to allow “itself” traffic on all ports (in AWS). After doing so the peers should quickly find each other and the Peerlink indicators in the GUI will turn green.
If you decide you want the highest level of security possible, only enable destination port 3002 TCP access between the peers (plus ports needed for file/directory/database sync as optionally defined in your haast.conf). But if the peers are in the same Security Group you should be fine allowing all traffic.
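If telnet is not installed on the peers, the same reachability test can be done with a few lines of Python. This is a minimal sketch: the function name `peerlink_port_open` is hypothetical, 10.1.2.3 is the example address used above, and a successful TCP connect only shows that the port is reachable, not that Peerlink itself is healthy.

```python
import socket


def peerlink_port_open(host, port=3002, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `peerlink_port_open("10.1.2.3")` from one peer before and after the Security Group change should flip from False to True.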
No. Split brain usually refers to a mirrored file system (e.g.: DRBD) in which the two sides have gone out of sync. Proper recovery from split brain usually involves manually choosing which files to keep, one file at a time (or risk losing all data from one side if you blindly accept one side as correct). Since other products use block level mirroring, an interruption in the mirroring can leave files/databases in an inconsistent state and prevent Asterisk from starting or operating correctly.
HAAst on the other hand does not use a mirrored file system. In fact HAAst is the only HA system for Asterisk that does not use block level mirroring. HAAst synchronizes files/directories/databases/tables from Active to Standby only, and only when the peers are both confirmed healthy. Files use differential analysis and compression to send only changes, and databases use SQL level transactions to ensure databases are always in a consistent state.
With HAAst data is not sent if a node is detected to be in an unhealthy state, so potentially damaged files/databases are never sent to the other peer. Once a node recovers from a failure, the data from the healthy node will be sent to the recovered node to bring it back into sync. You will never have a split brain scenario with HAAst.
If that didn’t answer your question please provide more details.