<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<font face="monospace">Hi,<br>
<br>
So even with auto-scaling disabled, you still get the TCP-related
issues after a while? Do you use TLS in async mode? If yes, try
turning that off.<br>
<br>
Regards,<br>
</font>
<pre class="moz-signature" cols="72">Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
<a class="moz-txt-link-freetext" href="https://www.opensips-solutions.com">https://www.opensips-solutions.com</a>
OpenSIPS Summit 27-30 Sept 2022, Athens
<a class="moz-txt-link-freetext" href="https://www.opensips.org/events/Summit-2022Athens/">https://www.opensips.org/events/Summit-2022Athens/</a></pre>
<div class="moz-cite-prefix">On 10/12/22 1:36 AM, Yury Kirsanov
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAD1_sevbTqEU4KnCxMou2y3EXm7eW4xY4GeZhqJhLhwYhXYHTw@mail.gmail.com">
<div dir="ltr">Hi Bogdan,
<div>Yes, if I enable the autoscaler I immediately run into all
sorts of issues with TCP. When it's off, I only hit this
issue from time to time, and I have to restart OpenSIPS in that
case even though it's still working: some of the processes
lock up and consume 100% CPU, but overall the system continues
to service requests.</div>
<div><br>
</div>
<div><a href="https://github.com/OpenSIPS/opensips/issues/2921"
moz-do-not-send="true">https://github.com/OpenSIPS/opensips/issues/2921</a><br>
</div>
<div><br>
</div>
<div>Best regards,</div>
<div>Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Oct 11, 2022 at 10:59
PM Bogdan-Andrei Iancu &lt;<a
href="mailto:bogdan@opensips.org" moz-do-not-send="true">bogdan@opensips.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <font face="monospace">Hi Yury,<br>
<br>
Is this still an issue?<br>
<br>
Regards,<br>
</font>
<div>On 9/15/22 5:26 PM, Yury Kirsanov wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Bogdan,
<div>Looks like I'm running into some issues with TCP
and autoscaling again. After a good start, within
about 5-10 minutes of an OpenSIPS restart, and even
with the rate limiter enabled in iptables, I'm getting
a lot of these errors:</div>
<div><br>
</div>
<div>Sep 16 00:20:56 ERROR:core:send_fd: sendmsg would
block on 683: Resource temporarily unavailable<br>
Sep 16 00:20:56 ERROR:core:send2worker: send_fd failed<br>
Sep 16 00:20:56 ERROR:core:handle_new_connect: no TCP
workers available<br>
</div>
<div><br>
</div>
<div>And the number of registered users starts to drop.</div>
<div><br>
</div>
<div>I've tried to change my autoscaler profile to be a
bit more aggressive:</div>
<div><br>
</div>
<div>auto_scaling_profile = PROFILE_TCP<br>
scale up to 32 on 20% for 4 cycles within 5<br>
scale down to 4 on 10% for 10 cycles<br>
</div>
<div><br>
</div>
<div>But that didn't help. Current TCP settings:</div>
<div><br>
</div>
<div>tcp_accept_aliases=0<br>
tcp_keepalive=1<br>
tcp_connect_timeout=1500<br>
tcp_keepinterval = 10<br>
tcp_keepidle = 10<br>
tcp_max_msg_time = 10<br>
tcp_workers = 4 use_auto_scaling_profile PROFILE_TCP<br>
tcp_max_connections = 4096<br>
</div>
<div><br>
</div>
<div># Proto TCP<br>
loadmodule "proto_tcp.so"<br>
modparam("proto_tcp", "tcp_async", 1)<br>
modparam("proto_tcp", "tcp_send_timeout", 1000)<br>
modparam("proto_tcp",
"tcp_async_local_connect_timeout", 500)<br>
modparam("proto_tcp", "tcp_async_local_write_timeout",
500)<br>
modparam("proto_tcp", "tcp_max_msg_chunks", 16)<br>
modparam("proto_tcp", "tcp_parallel_handling", 1)<br>
</div>
<div><br>
</div>
<div>I'm also setting the TCP persistent flag before
mid_registrar_save (not sure which one to use, setflag
or setbflag, so I'm doing both):</div>
<div><br>
</div>
<div>modparam("mid_registrar", "tcp_persistent_flag",
"TCP_PERSIST_REGISTRATIONS")<br>
</div>
<div><br>
</div>
<div> if (is_method("REGISTER"))<br>
if ($socket_in(proto)!="udp")<br>
{<br>
setflag("TCP_PERSIST_REGISTRATIONS");<br>
setbflag("TCP_PERSIST_REGISTRATIONS");<br>
}<br>
<br>
</div>
<div>That didn't help. So I had to manually set
tcp_workers=32 and now it works fine. Not sure what's
going on here...</div>
<div><br>
</div>
<div>Thanks and best regards,</div>
<div>Yury.</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Sep 15, 2022
at 4:02 PM Bogdan-Andrei Iancu &lt;<a
href="mailto:bogdan@opensips.org" target="_blank"
moz-do-not-send="true">bogdan@opensips.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <font face="monospace">I'm glad it helped. Please keep
me posted on whether the auto-scaling fix holds.<br>
<br>
Best regards,<br>
</font>
<div>On 9/14/22 10:10 PM, Yury Kirsanov wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Bogdan,
<div>Sorry to email directly to you again, but
just wanted to say a huge thank you for all
your great work in supporting OpenSIPS and its
users!</div>
<div><br>
</div>
<div>After adjusting TCP parameters my OpenSIPS
server can handle restarts easily without any
issues, even though I'm currently dropping all
the caches and dialogs and everything and not
using any rate-limit iptables rules.</div>
<div><br>
</div>
<div>Also, I've enabled the autoscaler and it
seems to work great so far. Please see this
screenshot: there were 79 processes before
the restart, and after the restart the number of
processes immediately dropped to a very low
number, even though it now keeps some load on
the active processes:</div>
<div><br>
</div>
<div><img
src="cid:part8.F8877FA8.772ED159@opensips.org"
alt="image.png" style="margin-right: 25px;"
class=""><br>
</div>
<div><br>
</div>
<div>All the SIP devices were able to reconnect
successfully and seem to be stable at this
stage! No more memory leaks! Thanks again!</div>
<div><br>
</div>
<div>Best regards,</div>
<div>Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Sep
14, 2022 at 10:58 PM Bogdan-Andrei Iancu &lt;<a
href="mailto:bogdan@opensips.org"
target="_blank" moz-do-not-send="true">bogdan@opensips.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <font face="monospace">Hi Yury,<br>
<br>
You need to check the TCP settings and make
sure your OpenSIPS will (1) not try to
perform a TCP connect against destinations
known to be unable to accept one (like
TCP/WS endpoints behind NAT) - see
tcp_no_new_conn_bflag [1] - and (2) not
block for a long time while attempting a
connect - see tcp_connect_timeout [2],
or consider enabling async [3].<br>
<br>
[1] <a
href="https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag"
target="_blank" moz-do-not-send="true">https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag</a><br>
[2] <a
href="https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout"
target="_blank" moz-do-not-send="true">https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout</a><br>
[3] <a
href="https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992"
target="_blank" moz-do-not-send="true">https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992</a><br>
<br>
Regards,<br>
</font>
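<div>A minimal sketch of how the three suggestions above might look together in an OpenSIPS 3.2 configuration. The flag name TCP_NO_CONNECT, the 100 ms timeout and the placement of setbflag() are illustrative assumptions, not values taken from this thread; tune them to your own setup:</div>
<pre># core parameters
tcp_no_new_conn_bflag = TCP_NO_CONNECT   # [1] branch flag: only reuse an existing TCP connection
tcp_connect_timeout = 100                # [2] milliseconds allowed for a blocking connect attempt

# [3] async (non-blocking) connect and write for plain TCP
loadmodule "proto_tcp.so"
modparam("proto_tcp", "tcp_async", 1)

route {
    # assumed context: the request is about to be relayed to a TCP/WS contact behind NAT
    setbflag("TCP_NO_CONNECT");
    # relay as usual; with the flag set, a missing connection yields a send error
    # instead of a new (possibly blocking) connect attempt
}</pre>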
<div>On 9/13/22 12:01 PM, Yury Kirsanov
wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Hi Bogdan,
<div dir="auto">Thanks for this update,
but it looks like I can't check the
autoscaler because of this first issue
with the blocking TCP connect. Is there a
way to resolve it? Am I doing
something wrong, or is it something
to do with the OpenSIPS code? And yes,
you're right: as soon as I restart
OpenSIPS with a lot of SIP devices
trying to connect to it, it goes
crazy, starts consuming memory and
stops forwarding packets, sitting there
at 100% load until it runs out of
memory and segfaults. Sometimes I
can't even restart it to get back to a
normal working state; it just
loops into the same crash whatever I try
to do.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I've compiled OpenSIPS
3.3.1 with your patch and was able to
start it but not sure, maybe I was
just lucky this time.</div>
<div dir="auto"><br>
</div>
<div dir="auto">What should I do?
Thanks!</div>
<div dir="auto"><br>
</div>
<div dir="auto">Best regards,</div>
<div dir="auto">Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Tue, 13 Sept 2022, 18:56 Bogdan-Andrei
Iancu, &lt;<a
href="mailto:bogdan@opensips.org"
target="_blank"
moz-do-not-send="true">bogdan@opensips.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> Hi Yury,<br>
<br>
It looks like you have multiple
issues overlapping here. The traps
you sent have nothing to do
with auto-scaling, but with a
blocking TCP connect for SIP - most
of the procs get blocked in a sync
TCP connect.<br>
<br>
Regards,<br>
<div>On 9/12/22 4:39 PM, Yury
Kirsanov wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Bogdan,
<div>I've applied the patch (I had
to find where to apply it
manually for 3.2.8 downloaded
from the web page: line 1568
instead of 1652) and restarted
the server with only about
300-350 SIP devices, and
immediately ran into the same
issue. I'm attaching two GDB
dumps taken several
minutes apart.
Autoscaling was OFF this time; please
see my previous message, as
for some reason I'm currently
experiencing lockups even when
it's off :(</div>
</div>
</blockquote>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>Best regards,</div>
<div>Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr"
class="gmail_attr">On Mon, Sep
12, 2022 at 7:48 PM
Bogdan-Andrei Iancu &lt;<a
href="mailto:bogdan@opensips.org"
rel="noreferrer"
target="_blank"
moz-do-not-send="true">bogdan@opensips.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <font face="monospace">Hi
Yury,<br>
<br>
Could you give this patch
a try? It should fix the
blocking you are experiencing
(it should apply to 3.2
too).<br>
<br>
Best regards,<br>
</font>
<div>On 9/7/22 2:54 PM,
Bogdan-Andrei Iancu wrote:<br>
</div>
<blockquote type="cite"> <font
face="monospace">Hi
Yury,<br>
<br>
Thanks for the detailed
info here - let me
review some code and
run some tests, as at
this point I have a good
idea of the direction to
dig into.<br>
<br>
I will update here.<br>
<br>
Best regards,<br>
</font>
<div>On 9/6/22 11:24 AM,
Yury Kirsanov wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">Hi
Bogdan,
<div dir="auto">Yes,
I'm listening on all
types of sockets
including UDP, TCP
and TLS on the
outside public
interface and then
forward traffic into
internal LAN via UDP
only.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Previously
it was getting stuck
quite easily, now I
had to wait for a
while before this
actually happened.
I've routed part of
my customers to this
server to obtain
this result so I
will have to do that
again.</div>
<div dir="auto"><br>
</div>
<div dir="auto">As
soon as I see one of
the processes stuck
I'll run the trap
command and send you
all the details,
including process
load, ps output and
so on.</div>
<div dir="auto"><br>
</div>
<div dir="auto">For
now I had to switch
autoscaling off and
just create many
listeners. Do I
understand correctly
that I need to
restart OpenSIPS in
order to apply
autoscaling profiles
and reload-routes is
not sufficient?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Also,
do I need separate
UDP profiles for
public and private
interfaces? And do I
need to apply the
autoscaling profile
just to a socket, or
do I need to specify
udp_workers or tcp_workers
with the autoscaler too?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Thanks
and best regards,</div>
<div dir="auto">Yury.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr"
class="gmail_attr">On
Tue, 6 Sept 2022,
18:18 Bogdan-Andrei
Iancu, &lt;<a
href="mailto:bogdan@opensips.org"
rel="noreferrer"
target="_blank"
moz-do-not-send="true">bogdan@opensips.org</a>&gt;
wrote:<br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div> <font
face="monospace">Hi
Yury,<br>
<br>
Thanks for the
info. I see that
the stuck
process (24) is
an auto-scaled
one (based on
its id). Do you
have SIP traffic
going from UDP to TCP,
or are you doing some
HEP capturing
for SIP? I saw
a recent similar
report where an
auto-scaled UDP
worker got stuck
when trying to
communicate
with the TCP
main/manager
process (in
order to handle
a TCP
operation).<br>
<br>
BTW, any chance
you could run
"opensips-cli -x
trap" when you
have that stuck
process, just to
see where it is
stuck? And is it
hard to
reproduce? I
may ask you to
extract some
information from
the running
process.<br>
<br>
Regards,<br>
</font>
<div>On 9/3/22
6:54 PM, Yury
Kirsanov wrote:<br>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>