[OpenSIPS-Users] Using Contact replication and HA

Bogdan-Andrei Iancu bogdan at opensips.org
Thu May 11 04:44:03 EDT 2017


John,

As I said, force_socket affects *only* contacts that have *no socket 
info* (see my previous post).
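
To make the selection order concrete, here is a minimal sketch in Python. It is not the nathelper source, only a model of the behaviour described in this thread: a socket stored with the contact always wins, force_socket is consulted only when the contact has none, and otherwise the first listener matching the destination's protocol and address family is used. The names `Listener` and `pick_ping_socket` and the example addresses are hypothetical, and whether force_socket is also checked against the destination's protocol is not covered here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Listener:
    """Hypothetical stand-in for one listening interface."""
    proto: str    # e.g. "udp"
    family: str   # "ipv4" or "ipv6"
    address: str  # "ip:port"


def pick_ping_socket(
    contact_socket: Optional[Listener],
    force_socket: Optional[Listener],
    listeners: Sequence[Listener],
    dest_proto: str,
    dest_family: str,
) -> Optional[Listener]:
    """Model of the ping socket choice described in the thread."""
    # 1) A socket stored with the contact is always used as-is,
    #    so force_socket has no effect on such contacts.
    if contact_socket is not None:
        return contact_socket
    # 2) force_socket applies only to contacts without socket info.
    if force_socket is not None:
        return force_socket
    # 3) Fallback: the first listener matching the destination's
    #    protocol and address family, in configuration order.
    for listener in listeners:
        if listener.proto == dest_proto and listener.family == dest_family:
            return listener
    return None


# Example addresses are made up (documentation ranges).
vip = Listener("udp", "ipv4", "192.0.2.100:5060")
loopback = Listener("udp", "ipv4", "127.0.0.1:5060")
static = Listener("udp", "ipv4", "192.0.2.11:5060")
```

With this model, a contact replicated together with the VIP socket keeps using that socket regardless of force_socket, which matches what John observed; only a contact saved with a NULL socket would fall through to force_socket or to the first matching listener.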

Regards,

Bogdan-Andrei Iancu
   OpenSIPS Founder and Developer
   http://www.opensips-solutions.com

OpenSIPS Summit May 2017 Amsterdam
   http://www.opensips.org/events/Summit-2017Amsterdam.html

On 05/10/2017 09:28 PM, John Quick wrote:
> Hi Bogdan,
>
> I tried the force_socket parameter in nathelper, but it did not work. The SIP Pings continue to come from the address stored in the socket field of the contact. Force_socket is ignored and has no impact at all.
>
> I also tried the parameter natping_socket, but this breaks OpenSIPS and prevents it starting. The log file shows this:
> 2017-05-10 18:05:46 ERROR:nathelper:init_raw_socket: cannot create raw socket
> 2017-05-10 18:05:46 ERROR:core:init_mod: failed to initialize module nathelper
> 2017-05-10 18:05:46 ERROR:core:main: error while initializing modules
>
> John Quick
> Smartvox Limited
>
>
> -----Original Message-----
> From: Bogdan-Andrei Iancu [mailto:bogdan at opensips.org]
> Sent: 10 May 2017 11:07
> To: john.quick at smartvox.co.uk
> Cc: users at lists.opensips.org
> Subject: Re: [OpenSIPS-Users] Using Contact replication and HA
>
> Hi John,
>
> What you did (with the "net.ipv4.ip_nonlocal_bind") is a good workaround for the problem.
>
> Also, I investigated the original issue and here it is:
>     1) the replicated contact (on the backup) is saved with a NULL socket, because the received socket is not valid there (no error is logged for this, only a debug message)
>     2) when nathelper pings the contact, the socket is NULL, so nathelper has to pick one itself; it simply uses the first listener matching the destination's protocol (UDP) and address family (IPv4)
>     3) it looks like this first UDP listener is not compatible with the destination (localhost or a private network??)
>
> Have you tried to use force_socket:
> http://www.opensips.org/html/docs/modules/2.3.x/nathelper.html#idp5512752
> (it takes effect only if the contact has no socket assigned).
>
> Regards,
>
> Bogdan-Andrei Iancu
>     OpenSIPS Founder and Developer
>     http://www.opensips-solutions.com
>
> On 05/09/2017 05:54 PM, John Quick wrote:
>> Hi Bogdan,
>>
>> I tried different scenarios and eventually ended up with the backup server having a listen statement for the VIP address.
>> Normally you cannot start OpenSIPS (or any other application) binding to an IP address that is not assigned on a local interface.
>> However, after adding the line "net.ipv4.ip_nonlocal_bind = 1" to /etc/sysctl.conf, I was able to start OpenSIPS with that listen statement in place.
>>
>> The backup server also listens on its own static IP using the proto_bin mechanism so it can receive and send replications while it is in "standby" mode.
>>
>> That is the dilemma:
>> Replicated Contacts can only be useful if the backup server is able to take over the same VIP that was used on the primary server.
>> If the backup server does not use the VIP when it takes over as "active", then the replicated socket information in the location table will be wrong.
>> If OpenSIPS only starts on the backup server *after* that server has acquired the VIP then it could not receive the replicated Contacts using proto_bin when it was in standby mode.
>>
>> John Quick
>> Smartvox Limited
>>
>>
>> -----Original Message-----
>> From: Bogdan-Andrei Iancu [mailto:bogdan at opensips.org]
>> Sent: 09 May 2017 14:45
>> To: john.quick at smartvox.co.uk; OpenSIPS users mailling list
>> <users at lists.opensips.org>
>> Subject: Re: [OpenSIPS-Users] Using Contact replication and HA
>>
>> Hi John,
>>
>> So, in your setup, on the backup server, OpenSIPS is not listening on the VIP address at all, right ?
>>
>> Best regards,
>>
>> Bogdan-Andrei Iancu
>>      OpenSIPS Founder and Developer
>>      http://www.opensips-solutions.com
>>
>> On 05/03/2017 04:46 PM, John Quick wrote:
>>> Hello,
>>>
>>> I am still working my way through some of the new features described
>>> at last year's Summit conference while you are all hopefully enjoying
>>> this year's Summit.
>>>
>>> I'm playing with the Clusterer module. It is a great idea but I am
>>> finding a few practical difficulties for contact replication in the USRLOC module.
>>>
>>> In my test rig, there are two almost identical OpenSIPS servers (A and B).
>>> Contact replication is enabled between the two servers and each
>>> server has its own local database.
>>>
>>> Linux HA - Corosync and Pacemaker - is used to control a Virtual IP
>>> (VIP) address resource. This allows UAs to register at the VIP
>>> address. HA decides which server has the virtual address at any given
>>> time, based on node availability. Currently, Server A is assigned the
>>> VIP and processes all UA registrations.
>>>
>>> Problem: The "socket" field in the location table contains the VIP
>>> address on both server A and B, but only Server A is bound to that
>>> address while both servers are up.
>>> Unless I completely disable NAT Pings in the nathelper module, Server
>>> B reports a lot of errors like this:
>>> 2017-05-03 14:15:51 CRITICAL:core:proto_udp_send: invalid sendtoparameters#012one possible reason is the server is bound to localhost and#012attempts to send to the net
>>> 2017-05-03 14:15:51 ERROR:nathelper:msg_send: send() for proto 1 failed
>>> 2017-05-03 14:15:51 ERROR:nathelper:nh_timer: sip msg_send failed!
>>>
>>> Worse, if I also enable the "remove_on_timeout_bflag" option on
>>> Server B, it removes the registration on *both* servers after a short
>>> delay even though the UA is still available!
>>>
>>> Initially, I encountered problems with the HA IP Resource (or VIP)
>>> with respect to OpenSIPS not starting on server B because it was
>>> trying to bind to an address that was not currently assigned to any
>>> local interface. While it is possible to group the IP resource with
>>> the OpenSIPS service resource to overcome this problem, that would
>>> completely break USRLOC contact replication because the OpenSIPS
>>> service on Server B would not be running as long as Server A is up. I
>>> had to resort to an option in sysctl.conf that allows processes to
>>> start even if they are trying to bind to a non-local address.
>>>
>>> This makes me wonder what is the purpose of Usrloc Contact
>>> replication? Is there some other scenario that could use it and not have these problems?
>>> I also wonder what difference does the db_mode setting in Usrloc make
>>> when using contact replication.
>>>     
>>> John Quick
>>> Smartvox Limited