<div dir="ltr">Hi Bogdan,<div>Looks like my problem was quite complex, as I had the following issues:</div><div><br></div><div>1. tcp_async was off</div><div>2. TCP timeouts were set very high</div><div><br></div><div>I first tried just enabling tcp_async, and that didn't help - after a restart and the resulting TCP SYN storm, OpenSIPS started to consume memory and the processes got locked up again. Then I started to tune other parameters. Here's how it was before:</div><div><br></div><div># Proto TCP<br>loadmodule "proto_tcp.so"<br>modparam("proto_tcp", "tcp_async", 1)<br>modparam("proto_tcp", "tcp_send_timeout", 5000)<br>modparam("proto_tcp", "tcp_async_local_connect_timeout", 5000)<br>modparam("proto_tcp", "tcp_async_local_write_timeout", 5000)<br>modparam("proto_tcp", "tcp_max_msg_chunks", 16)<br><br></div><div>I had a very high tcp_send_timeout because some of our customers connect from across the globe and have high latency. Of course that latency is not 5 seconds, but I set it that high just to make sure they would be able to connect. I have now ended up with this config:</div><div><br></div><div># Proto TCP<br>loadmodule "proto_tcp.so"<br>modparam("proto_tcp", "tcp_async", 1)<br>modparam("proto_tcp", "tcp_send_timeout", 1000)<br>modparam("proto_tcp", "tcp_async_local_connect_timeout", 500)<br>modparam("proto_tcp", "tcp_async_local_write_timeout", 500)<br>modparam("proto_tcp", "tcp_max_msg_chunks", 16)<br>modparam("proto_tcp", "tcp_parallel_handling", 1)<br></div><div><br></div><div>And it looks like OpenSIPS is now able to survive restarts!</div><div><br></div><div>One more thing I tried earlier was to rate-limit TCP connections in iptables - that also helped, even with my incorrect configuration and blocking TCP mode. I rate-limited TCP SYN packets on my public interface, on the TCP ports that go to OpenSIPS, using the iptables rate-limit module at 10 packets per second with a burst of 50 packets. This can be adjusted as required depending on the rate of new connections. 
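<br><br>A minimal sketch of what such rules could look like, assuming the iptables limit match, a public interface named eth0, and SIP over TCP/TLS on ports 5060 and 5061. The interface name and port numbers are placeholders and are not taken from the setup described here:<br>
<pre>
# Accept new TCP connections (SYN packets) to the assumed SIP ports
# at up to 10 per second, with an initial burst of 50.
iptables -A INPUT -i eth0 -p tcp --syn -m multiport --dports 5060,5061 -m limit --limit 10/second --limit-burst 50 -j ACCEPT
# Drop any SYN packets to those ports that exceed the limit above.
iptables -A INPUT -i eth0 -p tcp --syn -m multiport --dports 5060,5061 -j DROP
</pre>
Note that the limit match counts all matching packets together, not per source address, so the rate and burst values would need to be sized for the total number of devices reconnecting after a restart.<br><br>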
Hope this helps anyone who runs into the same trouble!</div><div><br></div><div>I will continue monitoring our OpenSIPS instances, and if everything works fine after a restart I will enable the auto-scaler to test it with the new patch.</div><div><br></div><div>Thanks a lot for your help, Bogdan, that's much appreciated!</div><div><br></div><div>Best regards,</div><div>Yury.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 15, 2022 at 1:22 AM Yury Kirsanov <<a href="mailto:y.kirsanov@gmail.com">y.kirsanov@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Bogdan,<div>Thanks for your answer, I've checked my configs and yes, for some reason I had tcp_async off!!! I will definitely switch it on for now and then give it a try!!! Can't believe I missed that one!!!</div><div><br></div><div>Best regards,</div><div>Yury.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 14, 2022 at 10:58 PM Bogdan-Andrei Iancu <<a href="mailto:bogdan@opensips.org" target="_blank">bogdan@opensips.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<font face="monospace">Hi Yury,<br>
<br>
You need to check the TCP settings and make sure your OpenSIPS
will (1) not try to perform a TCP connect against destinations known
not to be able to accept one (like TCP/WS endpoints behind NAT) - see
tcp_no_new_conn_bflag [1] - or (2) not block for a long time
while attempting a connect - see tcp_connect_timeout [2] or
consider enabling async [3].<br>
<br>
[1]
<a href="https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag" target="_blank">https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag</a><br>
[2]
<a href="https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout" target="_blank">https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout</a><br>
[3]
<a href="https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992" target="_blank">https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992</a><br>
<br>
Regards,<br>
</font>
<pre cols="72">Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
<a href="https://www.opensips-solutions.com" target="_blank">https://www.opensips-solutions.com</a>
OpenSIPS Summit 27-30 Sept 2022, Athens
<a href="https://www.opensips.org/events/Summit-2022Athens/" target="_blank">https://www.opensips.org/events/Summit-2022Athens/</a></pre>
<div>On 9/13/22 12:01 PM, Yury Kirsanov
wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Hi Bogdan,
<div dir="auto">Thanks for this update, but it looks like I
can't check the autoscaler because of this first issue with
the blocking TCP connect. Is there a way to resolve it? Am I doing
something wrong? Or is it something to do with the OpenSIPS
code? And yes, you're right: as soon as I restart OpenSIPS
while a lot of SIP devices are trying to connect to it, it goes
crazy, starts to consume memory and stops forwarding packets,
sitting there at 100% load until it runs out of memory and
segfaults. Sometimes I can't even restart it to bring it back to a
normal working state; it just loops into the same crash whatever
I try to do.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I've compiled OpenSIPS 3.3.1 with your patch and
was able to start it, but I'm not sure; maybe I was just lucky this
time.</div>
<div dir="auto"><br>
</div>
<div dir="auto">What should I do? Thanks!</div>
<div dir="auto"><br>
</div>
<div dir="auto">Best regards,</div>
<div dir="auto">Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 13 Sept 2022, 18:56
Bogdan-Andrei Iancu, <<a href="mailto:bogdan@opensips.org" target="_blank">bogdan@opensips.org</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> Hi Yury,<br>
<br>
it looks like you have multiple overlapping issues here.
The traps you sent have nothing to do with
auto-scaling, but with a blocking TCP connect for SIP - most
of the procs get blocked in a sync TCP connect.<br>
<br>
Regards,<br>
<div>On 9/12/22 4:39 PM, Yury Kirsanov wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Bogdan,
<div>I've applied the patch (for 3.2.8 downloaded from the web
page I had to find manually where to apply it: line
1568 instead of 1652), restarted the server with
only about 300-350 SIP devices, and immediately ran
into the same issue. I'm attaching two GDB dumps taken
several minutes apart. Autoscaling was
OFF this time; please see my previous message, as
for some reason I'm currently experiencing lockups even when
it's off :(</div>
</div>
</blockquote>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Best regards,</div>
<div>Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Sep 12, 2022
at 7:48 PM Bogdan-Andrei Iancu <<a href="mailto:bogdan@opensips.org" rel="noreferrer" target="_blank">bogdan@opensips.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <font face="monospace">Hi Yury,<br>
<br>
Could you give this patch a try? It should fix the
blocking you are experiencing (it should apply to 3.2
too).<br>
<br>
Best regards,<br>
</font>
<div>On 9/7/22 2:54 PM, Bogdan-Andrei Iancu wrote:<br>
</div>
<blockquote type="cite"> <font face="monospace">Hi
Yury,<br>
<br>
Thanks for the detailed info here - let me
review some code and run some tests, as at
this point I have a good idea of the direction
to dig into.<br>
<br>
I will update here.<br>
<br>
Best regards,<br>
</font>
<div>On 9/6/22 11:24 AM, Yury Kirsanov wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Hi Bogdan,
<div dir="auto">Yes, I'm listening on all
types of sockets, including UDP, TCP and TLS,
on the outside public interface and then
forwarding traffic into the internal LAN via UDP
only.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Previously it was getting
stuck quite easily; this time I had to wait
a while before it actually happened. I
routed some of my customers to this server
to obtain this result, so I will have to do
that again.</div>
<div dir="auto"><br>
</div>
<div dir="auto">As soon as I see one of the
processes stuck, I'll run the trap command
and send you all the details, including
process load, ps output and so on.</div>
<div dir="auto"><br>
</div>
<div dir="auto">For now I had to switch
autoscaling off and just create many
listeners. Do I understand correctly that I
need to restart OpenSIPS in order to apply
autoscaling profiles, and that reload-routes is
not sufficient?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Also, do I need separate UDP
profiles for public and private interfaces?
And do I need to apply the autoscaling profile
just to a socket, or do I need to specify udp_workers or
tcp_workers with the autoscaler too?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Thanks and best regards,</div>
<div dir="auto">Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 6
Sept 2022, 18:18 Bogdan-Andrei Iancu, <<a href="mailto:bogdan@opensips.org" rel="noreferrer" target="_blank">bogdan@opensips.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <font face="monospace">Hi Yury,<br>
<br>
Thanks for the info. I see that the
stuck process (24) is an auto-scaled
one (based on its id). Do you have SIP
traffic going from UDP to TCP, or are you doing some
HEP capturing for SIP? I saw a recent
similar report where a UDP auto-scaled
worker got stuck when trying to
communicate with the TCP main/manager
process (in order to handle a TCP
operation).<br>
<br>
BTW, any chance you could run "opensips-cli -x
trap" when you have that stuck process,
just to see where it is stuck? And is it
hard to reproduce? I may ask you to
extract some information from the
running process...<br>
<br>
Regards,<br>
</font>
<div>On 9/3/22 6:54 PM, Yury Kirsanov
wrote:<br>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
Users mailing list
<a href="mailto:Users@lists.opensips.org" rel="noreferrer" target="_blank">Users@lists.opensips.org</a>
<a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a>
</pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote></div>
</blockquote></div>