[OpenSIPS-Users] Autoscaler in 3.2.x
Bogdan-Andrei Iancu
bogdan at opensips.org
Mon Oct 17 06:39:56 UTC 2022
Hi,
So even with auto-scaling disabled, you still get the TCP-related issues
after a while? Do you use TLS in async mode? If yes, try turning that
off.
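
A minimal sketch of what that could look like, assuming TLS is served by
the proto_tls module and that your version exposes its tls_async
parameter (worth double-checking against the proto_tls docs for your
release before relying on it):

```
# Assumption: proto_tls is already loaded and handles the TLS sockets.
# 0 is assumed to switch TLS connects/writes back to blocking (sync) mode.
modparam("proto_tls", "tls_async", 0)
```

A restart is the safe way to apply it, as module parameters are read at
startup.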
Regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
https://www.opensips.org/events/Summit-2022Athens/
On 10/12/22 1:36 AM, Yury Kirsanov wrote:
> Hi Bogdan,
> Yes, if I enable the autoscaler I immediately run into all sorts of
> issues with TCP. When it's off I only get this issue from time
> to time, and I have to restart OpenSIPS in that case even though it's
> still working: some of the processes lock up and consume 100% CPU,
> but overall the system continues to service requests.
>
> https://github.com/OpenSIPS/opensips/issues/2921
>
> Best regards,
> Yury.
>
> On Tue, Oct 11, 2022 at 10:59 PM Bogdan-Andrei Iancu
> <bogdan at opensips.org> wrote:
>
> Hi Yury,
>
> Is this still an issue ?
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> On 9/15/22 5:26 PM, Yury Kirsanov wrote:
>> Hi Bogdan,
>> Looks like I'm running into some issues with TCP and autoscaling
>> again. Now, after a good start, within about 5-10 minutes of
>> an OpenSIPS restart, and even with the rate limiter enabled in
>> iptables, I'm getting a lot of these errors:
>>
>> Sep 16 00:20:56 ERROR:core:send_fd: sendmsg would block on 683: Resource temporarily unavailable
>> Sep 16 00:20:56 ERROR:core:send2worker: send_fd failed
>> Sep 16 00:20:56 ERROR:core:handle_new_connect: no TCP workers available
>>
>> And the number of registered users starts to drop.
>>
>> I've tried to change my autoscaler profile to be a bit more
>> aggressive:
>>
>> auto_scaling_profile = PROFILE_TCP
>>     scale up to 32 on 20% for 4 cycles within 5
>>     scale down to 4 on 10% for 10 cycles
>>
>> But that didn't help. Current TCP settings:
>>
>> tcp_accept_aliases=0
>> tcp_keepalive=1
>> tcp_connect_timeout=1500
>> tcp_keepinterval = 10
>> tcp_keepidle = 10
>> tcp_max_msg_time = 10
>> tcp_workers = 4 use_auto_scaling_profile PROFILE_TCP
>> tcp_max_connections = 4096
>>
>> # Proto TCP
>> loadmodule "proto_tcp.so"
>> modparam("proto_tcp", "tcp_async", 1)
>> modparam("proto_tcp", "tcp_send_timeout", 1000)
>> modparam("proto_tcp", "tcp_async_local_connect_timeout", 500)
>> modparam("proto_tcp", "tcp_async_local_write_timeout", 500)
>> modparam("proto_tcp", "tcp_max_msg_chunks", 16)
>> modparam("proto_tcp", "tcp_parallel_handling", 1)
>>
>> I'm also setting the TCP persistent flag before mid_registrar_save
>> (not sure which one to use, setflag or setbflag, so I'm doing both):
>>
>> modparam("mid_registrar", "tcp_persistent_flag", "TCP_PERSIST_REGISTRATIONS")
>>
>> if (is_method("REGISTER"))
>>     if ($socket_in(proto) != "udp")
>>     {
>>         setflag("TCP_PERSIST_REGISTRATIONS");
>>         setbflag("TCP_PERSIST_REGISTRATIONS");
>>     }
>>
>> That didn't help. So I had to manually set tcp_workers=32 and now
>> it works fine. Not sure what's going on here...
>>
>> Thanks and best regards,
>> Yury.
>>
>>
>> On Thu, Sep 15, 2022 at 4:02 PM Bogdan-Andrei Iancu
>> <bogdan at opensips.org> wrote:
>>
>> I'm glad it helped. Please keep me posted on whether the
>> auto-scaling fix holds.
>>
>> Best regards,
>>
>> Bogdan-Andrei Iancu
>>
>> On 9/14/22 10:10 PM, Yury Kirsanov wrote:
>>> Hi Bogdan,
>>> Sorry to email directly to you again, but just wanted to say
>>> a huge thank you for all your great work in supporting
>>> OpenSIPS and its users!
>>>
>>> After adjusting TCP parameters my OpenSIPS server can handle
>>> restarts easily without any issues, even though I'm
>>> currently dropping all the caches and dialogs and everything
>>> and not using any rate-limit iptables rules.
>>>
>>> Also, I've enabled the autoscaler and it seems to work great
>>> so far. In the screenshot below you can see 79 processes
>>> before the restart, then the restart, after which the number
>>> of processes immediately dropped to a very low number, even
>>> though there is still some load on the active processes:
>>>
>>> image.png
>>>
>>> All the SIP devices were able to reconnect successfully and
>>> seem to be stable at this stage! No more memory leaks!
>>> Thanks again!
>>>
>>> Best regards,
>>> Yury.
>>>
>>> On Wed, Sep 14, 2022 at 10:58 PM Bogdan-Andrei Iancu
>>> <bogdan at opensips.org> wrote:
>>>
>>> Hi Yury,
>>>
>>> You need to check the TCP settings and make sure your
>>> OpenSIPS will (1) not try to perform a TCP connect against
>>> destinations known to be unable to accept one (like TCP/WS
>>> endpoints behind NAT) - see the tcp_no_new_conn_bflag
>>> [1] - and (2) not block for a long time while attempting a
>>> connect - see the tcp_connect_timeout [2] or consider
>>> enabling async [3].
>>>
>>> [1] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag
>>> [2] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout
>>> [3] https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992
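>>>
>>> A minimal sketch of how these three options might be combined,
>>> under stated assumptions: the flag name TCP_NO_CONNECT, the
>>> timeout value and the REGISTER condition are illustrative
>>> placeholders, not taken from a tested setup, and the timeout is
>>> assumed to be in milliseconds as described in [2]:
>>>
>>> ```
>>> # [1] do not open new TCP connections towards flagged branches
>>> tcp_no_new_conn_bflag = TCP_NO_CONNECT
>>>
>>> # [2] upper bound for a blocking connect attempt (illustrative value)
>>> tcp_connect_timeout = 200
>>>
>>> # [3] non-blocking connect/write for plain TCP
>>> modparam("proto_tcp", "tcp_async", 1)
>>>
>>> route {
>>>     # Hypothetical condition: treat every non-UDP registration as
>>>     # "reuse the existing connection only, never connect back".
>>>     if (is_method("REGISTER") && $socket_in(proto) != "udp")
>>>         setbflag("TCP_NO_CONNECT");
>>>     # ... rest of the routing logic ...
>>> }
>>> ```
>>>
>>> The branch flag has to be set before the contact is saved, so
>>> that it is stored with the registration and restored at lookup.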
>>>
>>> Regards,
>>>
>>> Bogdan-Andrei Iancu
>>>
>>> On 9/13/22 12:01 PM, Yury Kirsanov wrote:
>>>> Hi Bogdan,
>>>> Thanks for this update, but it looks like I can't check the
>>>> autoscaler because of this first issue with the blocking
>>>> TCP connect. Is there a way to resolve it? Am I doing
>>>> something wrong? Or is it something in the OpenSIPS code?
>>>> And yes, you're right: as soon as I restart OpenSIPS with
>>>> a lot of SIP devices trying to connect to it, it goes
>>>> crazy, starts consuming memory and stops forwarding
>>>> packets, sitting there at 100% load until it runs out of
>>>> memory and segfaults. Sometimes I can't even restart it to
>>>> bring it back to a normal state; it just loops into the
>>>> same crash whatever I try.
>>>>
>>>> I've compiled OpenSIPS 3.3.1 with your patch and was
>>>> able to start it but not sure, maybe I was just lucky
>>>> this time.
>>>>
>>>> What should I do? Thanks!
>>>>
>>>> Best regards,
>>>> Yury.
>>>>
>>>> On Tue, 13 Sept 2022, 18:56 Bogdan-Andrei Iancu,
>>>> <bogdan at opensips.org> wrote:
>>>>
>>>> Hi Yury,
>>>>
>>>> It looks like you have multiple overlapping issues
>>>> here. The traps you sent have nothing to do
>>>> with the auto-scaling, but with a blocking TCP
>>>> connect for SIP - most of the procs are blocked
>>>> in a sync TCP connect.
>>>>
>>>> Regards,
>>>>
>>>> Bogdan-Andrei Iancu
>>>>
>>>> On 9/12/22 4:39 PM, Yury Kirsanov wrote:
>>>>> Hi Bogdan,
>>>>> I've applied the patch (I had to find where to apply
>>>>> it manually for 3.2.8 downloaded from the web page:
>>>>> line 1568 instead of 1652) and restarted the
>>>>> server with only about 300-350 SIP devices, and
>>>>> immediately ran into the same issue. I'm attaching two
>>>>> GDB dumps taken several minutes apart.
>>>>> Autoscaling was OFF this time; please see my
>>>>> previous message, as for some reason I'm currently
>>>>> experiencing lockups even when it's off :(
>>>>>
>>>>> Best regards,
>>>>> Yury.
>>>>>
>>>>> On Mon, Sep 12, 2022 at 7:48 PM Bogdan-Andrei
>>>>> Iancu <bogdan at opensips.org> wrote:
>>>>>
>>>>> Hi Yury,
>>>>>
>>>>> Could you give this patch a try? It should fix
>>>>> the blocking you are experiencing (it should apply
>>>>> to 3.2 too).
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Bogdan-Andrei Iancu
>>>>>
>>>>> On 9/7/22 2:54 PM, Bogdan-Andrei Iancu wrote:
>>>>>> Hi Yury,
>>>>>>
>>>>>> Thanks for the detailed info here - let me
>>>>>> review some code and run some tests, as
>>>>>> at this point I have a good idea of the
>>>>>> direction to dig into.
>>>>>>
>>>>>> I will update here.
>>>>>>
>>>>>> Best regards,
>>>>>> Bogdan-Andrei Iancu
>>>>>>
>>>>>> On 9/6/22 11:24 AM, Yury Kirsanov wrote:
>>>>>>> Hi Bogdan,
>>>>>>> Yes, I'm listening on all types of sockets
>>>>>>> including UDP, TCP and TLS on the outside
>>>>>>> public interface and then forward traffic
>>>>>>> into internal LAN via UDP only.
>>>>>>>
>>>>>>> Previously it was getting stuck quite
>>>>>>> easily, now I had to wait for a while before
>>>>>>> this actually happened. I've routed part of
>>>>>>> my customers to this server to obtain this
>>>>>>> result so I will have to do that again.
>>>>>>>
>>>>>>> As soon as I see one of the processes stuck
>>>>>>> I'll run the trap command and send you all
>>>>>>> the details, including process load, ps
>>>>>>> output and so on.
>>>>>>>
>>>>>>> For now I had to switch autoscaling off and
>>>>>>> just create many listeners. Do I understand
>>>>>>> correctly that I need to restart OpenSIPS in
>>>>>>> order to apply autoscaling profiles and
>>>>>>> reload-routes is not sufficient?
>>>>>>>
>>>>>>> Also, do I need separate UDP profiles for the
>>>>>>> public and private interfaces? And do I need
>>>>>>> to apply the autoscaling profile just to a
>>>>>>> socket, or do I need to specify udp_workers or
>>>>>>> tcp_workers with the autoscaler too?
>>>>>>>
>>>>>>> Thanks and best regards,
>>>>>>> Yury.
>>>>>>>
>>>>>>> On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei
>>>>>>> Iancu, <bogdan at opensips.org> wrote:
>>>>>>>
>>>>>>> Hi Yury,
>>>>>>>
>>>>>>> Thanks for the info. I see that the
>>>>>>> stuck process (24) is an auto-scaled
>>>>>>> one (based on its id). Do you have SIP
>>>>>>> traffic going from UDP to TCP, or are
>>>>>>> you doing some HEP capturing for SIP?
>>>>>>> I saw a recent similar report where a
>>>>>>> UDP auto-scaled worker got stuck when
>>>>>>> trying to communicate with the TCP
>>>>>>> main/manager process (in order to
>>>>>>> handle a TCP operation).
>>>>>>>
>>>>>>> BTW, any chance you could run
>>>>>>> "opensips-cli -x trap" when you have
>>>>>>> that stuck process, just to see where
>>>>>>> it is stuck? And is it hard to
>>>>>>> reproduce? I may ask you to extract
>>>>>>> some information from the running
>>>>>>> process.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Bogdan-Andrei Iancu
>>>>>>>
>>>>>>> On 9/3/22 6:54 PM, Yury Kirsanov wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 103945 bytes
Desc: not available
URL: <http://lists.opensips.org/pipermail/users/attachments/20221017/c2345388/attachment-0001.png>