[OpenSIPS-Users] Autoscaler in 3.2.x
Bogdan-Andrei Iancu
bogdan at opensips.org
Mon Sep 12 09:53:41 UTC 2022
Hi Yury,
Maybe you can get a trap output while the procs are at 100% and before
everything dies?
Best regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
https://www.opensips.org/events/Summit-2022Athens/
On 9/12/22 11:12 AM, Yury Kirsanov wrote:
> Hi Bogdan,
> We've run into another issue. This time I was restarting the OpenSIPS
> server during busy hours, when about 2500 SIP devices were registering
> and making calls (the number of dialogs was only around 100-200, but
> there were a lot of packets), and I was unable to restart OpenSIPS
> successfully. Some processes got stuck at 100% load almost
> immediately, then started to consume more and more memory. After
> eating up all the memory they died, and OpenSIPS stopped processing
> SIP packets.
>
> I believe it's similar to the autoscaler issue. In this case I only
> had 16 UDP workers and 16 TCP workers, and it took OpenSIPS longer to
> run into the issue. When I had the autoscaler on, it wasn't able to
> open that many processes at once, so the currently active ones were
> getting stuck very fast and the crash was happening almost
> immediately.
>
> I'm running a localhost REDIS cache that stores where to proxy each
> SIP packet to. If there's no record for a SIP device, I query a REST
> server and cache its response. The REST server load was no more than
> 25% during the restart, when all SIP devices were urgently trying to
> re-connect to OpenSIPS, so I don't think they're an issue.
>
> I'm using async REST calls and believe there should be no issues with
> my configuration script, even though it runs a lot of nested routes
> due to the async REST requests. Hopefully I didn't forget an 'exit'
> statement anywhere, but if that were the case the OpenSIPS service
> would be locking up at any time.
>
> OpenSIPS itself is running as a virtual machine on a VMware host, and
> I could see it consuming up to 100% CPU of a 40-core host when it was
> locking up. VMware readiness for the VM was also spiking to 1500ms
> during these lock-ups, meaning that the VM was waiting for some cores
> to free up before it could get CPU time.
>
> The only way out of this situation for me was to run multiple OpenSIPS
> VMs and spread the load between them. No matter what I tried, I wasn't
> able to get OpenSIPS running fine again, even though it had been
> working perfectly for more than a week in this configuration and
> under the same load. Until then, though, I had only been starting or
> restarting it during night hours, when there were no active calls.
>
> I'm happy to share my configuration file with you privately if required.
>
> Hope this helps!
>
> Thanks and best regards,
> Yury.
>
> On Wed, Sep 7, 2022 at 9:54 PM Bogdan-Andrei Iancu
> <bogdan at opensips.org> wrote:
>
> Hi Yury,
>
> Thanks for the detailed info here - let me do a review of some code
> and run some tests, as at this point I have a good idea of the
> direction to dig into.
>
> I will update here.
>
> Best regards,
>
> Bogdan-Andrei Iancu
>
> On 9/6/22 11:24 AM, Yury Kirsanov wrote:
>> Hi Bogdan,
>> Yes, I'm listening on all types of sockets, including UDP, TCP and
>> TLS, on the outside public interface, and then forwarding traffic
>> into the internal LAN via UDP only.
>>
>> Previously it was getting stuck quite easily, now I had to wait
>> for a while before this actually happened. I've routed part of my
>> customers to this server to obtain this result so I will have to
>> do that again.
>>
>> As soon as I see one of the processes stuck I'll do the trap
>> command and send you all the details, including process load, ps
>> output and so on.
>>
>> For now I had to switch autoscaling off and just create many
>> listeners. Do I understand correctly that I need to restart
>> OpenSIPS in order to apply autoscaling profiles, and that
>> reload-routes is not sufficient?
>>
>> Also, do I need separate UDP profiles for the public and private
>> interfaces? And do I need to apply an autoscaling profile just to a
>> socket, or do I need to specify udp_workers or tcp_workers with the
>> autoscaler too?
>>
>> Thanks and best regards,
>> Yury.
>>
>> On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei Iancu,
>> <bogdan at opensips.org> wrote:
>>
>> Hi Yury,
>>
>> Thanks for the info. I see that the stuck process (24) is an
>> auto-scaled one (based on its id). Do you have SIP traffic going
>> from UDP to TCP, or are you doing some HEP capturing for SIP? I saw
>> a recent similar report where a UDP auto-scaled worker got
>> stuck when trying to do some communication with the TCP
>> main/manager process (in order to handle a TCP operation).
>>
>> BTW, any chance you could do an "opensips-cli -x trap" when you
>> have that stuck process, just to see where it is stuck? And is it
>> hard to reproduce? I may ask you to extract some
>> information from the running process...
>>
>> Regards,
>>
>> Bogdan-Andrei Iancu
>>
>> On 9/3/22 6:54 PM, Yury Kirsanov wrote:
>>
>