<div dir="ltr">Hi Bogdan,<div>It looks like my problem was quite complex, as I had the following issues:</div><div><br></div><div>1. tcp_async was off</div><div>2. TCP timeouts were set very high</div><div><br></div><div>I first tried just enabling tcp_async, but that didn't help: after a restart and the resulting TCP SYN storm, OpenSIPS started to consume memory and its processes locked up again. Then I started tuning other parameters. Here's how the config looked before the tuning:</div><div><br></div><div># Proto TCP<br>loadmodule "proto_tcp.so"<br>modparam("proto_tcp", "tcp_async", 1)<br>modparam("proto_tcp", "tcp_send_timeout", 5000)<br>modparam("proto_tcp", "tcp_async_local_connect_timeout", 5000)<br>modparam("proto_tcp", "tcp_async_local_write_timeout", 5000)<br>modparam("proto_tcp", "tcp_max_msg_chunks", 16)<br><br></div><div>I had a very high tcp_send_timeout because some of our customers connect from across the globe and have high latency. Of course their latency is nowhere near 5 seconds, but I set it that high just to make sure they would be able to connect. I ended up with this config:</div><div><br></div><div># Proto TCP<br>loadmodule "proto_tcp.so"<br>modparam("proto_tcp", "tcp_async", 1)<br>modparam("proto_tcp", "tcp_send_timeout", 1000)<br>modparam("proto_tcp", "tcp_async_local_connect_timeout", 500)<br>modparam("proto_tcp", "tcp_async_local_write_timeout", 500)<br>modparam("proto_tcp", "tcp_max_msg_chunks", 16)<br>modparam("proto_tcp", "tcp_parallel_handling", 1)<br></div><div><br></div><div>And it looks like OpenSIPS is now able to survive restarts!</div><div><br></div><div>One more thing I tried earlier was rate-limiting TCP connections with iptables, and that helped even with my incorrect configuration and blocking TCP mode. I rate-limited TCP SYN packets on the public interface, for the TCP ports that go to OpenSIPS, using the iptables rate-limit module at 10 packets per second with a burst of 50 packets. These values can be adjusted depending on the expected rate of new connections.</div><div>
Hope this helps anyone who runs into the same trouble!</div><div><br></div><div>I will continue monitoring our OpenSIPS instances, and if everything works fine after restarts, I will enable the auto-scaler to test it with the new patch.</div><div><br></div><div>Thanks a lot for your help, Bogdan, that's much appreciated!</div><div><br></div><div>Best regards,</div><div>Yury.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Sep 15, 2022 at 1:22 AM Yury Kirsanov <<a href="mailto:y.kirsanov@gmail.com">y.kirsanov@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Bogdan,<div>Thanks for your answer. I've checked my configs and yes, for some reason I had tcp_async off! I will definitely switch it on and give it a try. Can't believe I missed that one!</div><div><br></div><div>Best regards,</div><div>Yury.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 14, 2022 at 10:58 PM Bogdan-Andrei Iancu <<a href="mailto:bogdan@opensips.org" target="_blank">bogdan@opensips.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <font face="monospace">Hi Yury,<br>
      <br>
      You need to check your TCP settings and make sure your OpenSIPS
      will (1) not try to open a TCP connection to a destination known
      to be unable to accept it (like TCP/WS endpoints behind NAT; see
      tcp_no_new_conn_bflag [1]), and (2) not block for a long time
      while attempting a connect (see tcp_connect_timeout [2], or
      consider enabling async mode [3]).<br>
      <br>
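      A minimal sketch of the three settings above, assuming OpenSIPS 3.2
      script syntax. The flag name TCP_NO_CONNECT, the 100 ms timeout and
      the route name to_nat_contact are illustrative assumptions, not
      values from this thread, so please verify them against [1], [2] and
      [3] before use:<br>
      <br>
      # core: abort a blocking TCP connect after 100 ms (illustrative value)<br>
      tcp_connect_timeout = 100<br>
      # core: named branch flag that forbids opening new TCP connections<br>
      tcp_no_new_conn_bflag = TCP_NO_CONNECT<br>
      <br>
      loadmodule "tm.so"<br>
      loadmodule "proto_tcp.so"<br>
      # proto_tcp: do TCP connects and writes asynchronously instead of blocking<br>
      modparam("proto_tcp", "tcp_async", 1)<br>
      <br>
      # hypothetical route: use it when the target is a TCP/WS client<br>
      # known to be behind NAT, so only an existing connection is reused<br>
      route[to_nat_contact] {<br>
      &nbsp;&nbsp;&nbsp;&nbsp;setbflag(TCP_NO_CONNECT);<br>
      &nbsp;&nbsp;&nbsp;&nbsp;t_relay();<br>
      }<br>
      <br>
      With the flag set on a branch, relaying to that contact should fail
      with a send error when no TCP connection to it already exists,
      rather than tying up a worker in a connect attempt.<br>
      <br>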
      [1]
<a href="https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag" target="_blank">https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag</a><br>
      [2]
<a href="https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout" target="_blank">https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout</a><br>
      [3]
      <a href="https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992" target="_blank">https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992</a><br>
      <br>
      Regards,<br>
    </font>
    <pre cols="72">Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  <a href="https://www.opensips-solutions.com" target="_blank">https://www.opensips-solutions.com</a>
OpenSIPS Summit 27-30 Sept 2022, Athens
  <a href="https://www.opensips.org/events/Summit-2022Athens/" target="_blank">https://www.opensips.org/events/Summit-2022Athens/</a></pre>
    <div>On 9/13/22 12:01 PM, Yury Kirsanov
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">Hi Bogdan,
        <div dir="auto">Thanks for this update, but it looks like I
          can't test the autoscaler because of this first issue with the
          blocking TCP connect. Is there a way to resolve it? Am I doing
          something wrong, or is it something in the OpenSIPS code? And
          yes, you're right: as soon as I restart OpenSIPS with a lot of
          SIP devices trying to connect to it, it goes crazy, starts
          consuming memory and stops forwarding packets, sitting at
          100% load until it runs out of memory and segfaults.
          Sometimes I can't even restart it back into a normal working
          state; it just loops into the same crash whatever I try.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">I've compiled OpenSIPS 3.3.1 with your patch and
          was able to start it, but I'm not sure; maybe I was just lucky
          this time.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">What should I do? Thanks!</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Best regards,</div>
        <div dir="auto">Yury.</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, 13 Sept 2022, 18:56
          Bogdan-Andrei Iancu, <<a href="mailto:bogdan@opensips.org" target="_blank">bogdan@opensips.org</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div> Hi Yury,<br>
            <br>
            it looks like you have multiple overlapping issues here.
            The traps you sent have nothing to do with the
            auto-scaling, but with a blocking TCP connect for SIP: most
            of the processes are blocked in a synchronous TCP connect.<br>
            <br>
            Regards,<br>
            <div>On 9/12/22 4:39 PM, Yury Kirsanov wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">Hi Bogdan,
                <div>I've applied the patch (for 3.2.8 downloaded
                  from the web page I had to apply it manually, at
                  line 1568 instead of 1652) and restarted the server
                  with only about 300-350 SIP devices, and immediately
                  ran into the same issue. I'm attaching two GDB dumps
                  taken several minutes apart. Autoscaling was OFF
                  this time; please see my previous message, as for
                  some reason I'm currently experiencing lockups even
                  when it's off :(</div>
              </div>
            </blockquote>
            <br>
            <blockquote type="cite">
              <div dir="ltr">
                <div>Best regards,</div>
                <div>Yury.</div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Mon, Sep 12, 2022
                  at 7:48 PM Bogdan-Andrei Iancu <<a href="mailto:bogdan@opensips.org" rel="noreferrer" target="_blank">bogdan@opensips.org</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div> <font face="monospace">Hi Yury,<br>
                      <br>
                      Could you give this patch a try? It should fix the
                      blocking you are experiencing (it should also
                      apply to 3.2).<br>
                      <br>
                      Best regards,<br>
                    </font>
                    <div>On 9/7/22 2:54 PM, Bogdan-Andrei Iancu wrote:<br>
                    </div>
                    <blockquote type="cite"> <font face="monospace">Hi
                        Yury,<br>
                        <br>
                        Thanks for the detailed info here. Let me
                        review some code and run some tests, as at
                        this point I have a good idea of which
                        direction to dig in.<br>
                        <br>
                        I will update here.<br>
                        <br>
                        Best regards,<br>
                      </font>
                      <div>On 9/6/22 11:24 AM, Yury Kirsanov wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="auto">Hi Bogdan,
                          <div dir="auto">Yes, I'm listening on all
                            socket types (UDP, TCP and TLS) on the
                            outside public interface, and then forwarding
                            traffic into the internal LAN via UDP
                            only.</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">Previously it was getting
                            stuck quite easily; now I had to wait a
                            while before it actually happened. I had
                            routed some of my customers to this server
                            to obtain this result, so I will have to do
                            that again.</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">As soon as I see one of the
                            processes stuck, I'll run the trap command
                            and send you all the details, including
                            process load, ps output and so on.</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">For now I had to switch
                            autoscaling off and just create many
                            listeners. Do I understand correctly that I
                            need to restart OpenSIPS to apply
                            autoscaling profiles, and that
                            reload-routes is not sufficient?</div>
                          <div dir="auto"><br>
                          </div>
                          <div dir="auto">Also, do I need separate UDP
                            profiles for the public and private
                            interfaces? And do I need to apply the
                            autoscaling profile just to a socket, or do
                            I need to specify it on udp_workers or
                            tcp_workers too?</div>
                          <div dir="auto"><br>
                          </div>
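                          <div dir="auto">A minimal sketch of how an
                            auto-scaling profile can be attached, assuming
                            OpenSIPS 3.2 syntax. The profile names, the
                            203.0.113.10 address, the worker counts and the
                            thresholds are placeholders, and the sketch does
                            not settle whether the public and private
                            interfaces need separate profiles:</div>
                          <div dir="auto"># hypothetical profiles, thresholds for illustration only<br>auto_scaling_profile = PROFILE_UDP<br>&nbsp;&nbsp;&nbsp;&nbsp;scale up to 8 on 70% for 4 cycles within 5<br>&nbsp;&nbsp;&nbsp;&nbsp;scale down to 2 on 18% for 10 cycles<br><br>auto_scaling_profile = PROFILE_TCP<br>&nbsp;&nbsp;&nbsp;&nbsp;scale up to 8 on 70% for 4 cycles within 5<br>&nbsp;&nbsp;&nbsp;&nbsp;scale down to 2 on 18% for 10 cycles<br><br># UDP workers belong to each UDP socket, so the profile goes on the socket<br>socket = udp:203.0.113.10:5060 use_workers 2 use_auto_scaling_profile PROFILE_UDP<br><br># TCP workers are one pool shared by all TCP sockets, so the profile goes on tcp_workers<br>tcp_workers = 2 use_auto_scaling_profile PROFILE_TCP</div>
                          <div dir="auto"><br>
                          </div>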
                          <div dir="auto">Thanks and best regards,</div>
                          <div dir="auto">Yury.</div>
                        </div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Tue, 6
                            Sept 2022, 18:18 Bogdan-Andrei Iancu, <<a href="mailto:bogdan@opensips.org" rel="noreferrer" target="_blank">bogdan@opensips.org</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div> <font face="monospace">Hi Yury,<br>
                                <br>
                                Thanks for the info. I see that the
                                stuck process (24) is an auto-scaled
                                one (based on its ID). Do you have SIP
                                traffic going from UDP to TCP, or are
                                you doing some HEP capturing for SIP? I
                                saw a recent similar report where an
                                auto-scaled UDP worker got stuck while
                                trying to communicate with the TCP
                                main/manager process (in order to handle
                                a TCP operation).<br>
                                <br>
                                BTW, any chance you could run
                                "opensips-cli -x trap" while you have that
                                stuck process, just to see where it is
                                stuck? And is it hard to reproduce? I may
                                ask you to extract some information from
                                the running process.<br>
                                <br>
                                Regards,<br>
                              </font>
                              <div>On 9/3/22 6:54 PM, Yury Kirsanov
                                wrote:<br>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                      </blockquote>
                      <br>
                      <br>
                      <fieldset></fieldset>
                      <pre>_______________________________________________
Users mailing list
<a href="mailto:Users@lists.opensips.org" rel="noreferrer" target="_blank">Users@lists.opensips.org</a>
<a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a>
</pre>
                    </blockquote>
                    <br>
                  </div>
                </blockquote>
              </div>
            </blockquote>
            <br>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </div>

</blockquote></div>
</blockquote></div>