<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<tt>Hi Ben,<br>
<br>
Thank you for the info.<br>
<br>
It looks like the processes get stuck on a HEP-related internal
lock. Do you see any HEP-related errors in your logs prior to the
deadlock?<br>
<br>
Also, as a PoC, could you disable HEP tracing to see if the
problem goes away?<br>
<br>
Thanks,<br>
<br>
</tt>
<pre class="moz-signature" cols="72">Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
<a class="moz-txt-link-freetext" href="http://www.opensips-solutions.com">http://www.opensips-solutions.com</a>
OpenSIPS Bootcamp 2018
<a class="moz-txt-link-freetext" href="http://opensips.org/training/OpenSIPS_Bootcamp_2018/">http://opensips.org/training/OpenSIPS_Bootcamp_2018/</a>
</pre>
<div class="moz-cite-prefix">On 10/24/2018 10:18 PM, Ben Newlin
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:DA4917E0-F007-4BCB-AF66-ACDC70BA819F@genesys.com">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
tt
{mso-style-priority:99;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle22
{mso-style-type:personal;
font-family:"Calibri",sans-serif;}
span.EmailStyle23
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Bogdan,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I have run
the command, but the output was too large for pastebin, so I
have sent it to you directly.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Ben
Newlin </span>
<span style="font-size:11.0pt"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="color:black">From: </span></b><span
style="color:black">Bogdan-Andrei Iancu
<a class="moz-txt-link-rfc2396E" href="mailto:bogdan@opensips.org">&lt;bogdan@opensips.org&gt;</a><br>
<b>Date: </b>Wednesday, October 24, 2018 at 5:17 AM<br>
<b>To: </b>OpenSIPS users mailling list
<a class="moz-txt-link-rfc2396E" href="mailto:users@lists.opensips.org">&lt;users@lists.opensips.org&gt;</a>, Ben Newlin
<a class="moz-txt-link-rfc2396E" href="mailto:Ben.Newlin@genesys.com">&lt;Ben.Newlin@genesys.com&gt;</a><br>
<b>Subject: </b>Re: [OpenSIPS-Users] CPU 100% with TCP<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><tt><span style="font-size:10.0pt">Hi Ben,</span></tt><span
style="font-size:10.0pt;font-family:'Courier New'"><br>
<br>
<tt>Could you run "opensipsctl trap" ?</tt><br>
<br>
<tt>Regards,</tt><br>
<br>
</span><o:p></o:p></p>
<pre>Bogdan-Andrei Iancu<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>OpenSIPS Founder and Developer<o:p></o:p></pre>
<pre> <a href="http://www.opensips-solutions.com" moz-do-not-send="true">http://www.opensips-solutions.com</a><o:p></o:p></pre>
<pre>OpenSIPS Bootcamp 2018<o:p></o:p></pre>
<pre> <a href="http://opensips.org/training/OpenSIPS_Bootcamp_2018/" moz-do-not-send="true">http://opensips.org/training/OpenSIPS_Bootcamp_2018/</a><o:p></o:p></pre>
<div>
<p class="MsoNormal">On 10/24/2018 12:56 AM, Ben Newlin wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We have
implemented TCP recently and are performing
TCP&lt;-&gt;UDP translation on one of our proxy types.
This proxy only exists for that purpose; there are no DB
queries, REST calls, or anything like that. It is designed
to be very fast and high throughput.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Recently
we have found that when the remote endpoint of a TCP
connection is lost (i.e. the server goes down) while OpenSIPS
is under moderate load, OpenSIPS quickly reaches 100% CPU and
becomes unresponsive. When this occurs, the “top” command
shows that between 30% and 90% of CPU time is spent in System
(kernel) space, and each OpenSIPS TCP process shows many times
its normal CPU usage. We are running OpenSIPS 2.4.2 on Amazon Linux.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I obtained
as much information as I could using ps, strace, and gdb
here:
<a href="https://pastebin.com/JP3DnCqs"
moz-do-not-send="true">
https://pastebin.com/JP3DnCqs</a>. We can reproduce the
failure consistently by removing a server during call
traffic.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">A few
things I noticed:</span><o:p></o:p></p>
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph"
style="margin-left:0in;mso-list:l0 level1 lfo3"><span
style="font-size:11.0pt">The number of running threads
reported by OpenSIPS doesn’t align with our
configuration, copied here:</span>
<o:p></o:p></li>
</ul>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<pre>####### Global Parameters #########

children=32

#// Allow 503 to pass back to Control
disable_503_translation=yes

#// Even though we are not receiving HEP,
#// this listener is required by OpenSIPS
#// in order to use the proto_hep module. :/
listen=hep_tcp:10.32.40.245:9061 use_children 1

#// Configure the listeners
listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX

#// Transaction Module
loadmodule "tm.so"
modparam("tm", "restart_fr_on_each_reply", 0)
modparam("tm", "timer_partitions", 8)
modparam("tm", "onreply_avp_mode", 1)
modparam("tm", "wt_timer", 10)</pre>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt">According
to the documentation, if “tcp_children” is not set then the
value of “children” will be used [1], but we have set
“children” to 32 and only have the default 8 TCP
processes. Also, we appear to have only 1 timer process,
although we have set the number of timer partitions to 8.</span><o:p></o:p></p>
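<p class="MsoListParagraph"><span style="font-size:11.0pt">A
minimal sketch of one way to rule out the fallback behaviour,
assuming “tcp_children” can simply be set explicitly next to
“children”. The value 32 is only an assumption that mirrors
our “children” setting, not a tested recommendation:</span><o:p></o:p></p>
<pre>children=32

#// Assumption: set the TCP worker count explicitly instead of
#// relying on the documented fallback to "children"
tcp_children=32</pre>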
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph"
style="margin-left:0in;mso-list:l0 level1 lfo3"><span
style="font-size:11.0pt">The server that was terminated
was using TCP connections exclusively, but all of the
CPU usage seems to be in the UDP threads. The one I looked at
appeared to be handling a CANCEL for one of the calls
that was active and was attempting to send it out via
TCP. I’m not sure why it would be trying to relay the
CANCEL, as no 100 Trying had been received from the
server. I have noticed that in 2.x OpenSIPS will now
send CANCELs for transactions even when 100 Trying was
not received. Is that intentional? RFC 3261 states that
no CANCEL should be sent unless a provisional response
has been received.</span>
<o:p></o:p></li>
</ul>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Any
assistance with this would be appreciated.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[1] - <a
href="http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66"
moz-do-not-send="true">
http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66</a></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;color:black">Ben Newlin </span>
<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><br>
<br>
<br>
<o:p></o:p></span></p>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>Users mailing list<o:p></o:p></pre>
<pre><a href="mailto:Users@lists.opensips.org" moz-do-not-send="true">Users@lists.opensips.org</a><o:p></o:p></pre>
<pre><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users" moz-do-not-send="true">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a><o:p></o:p></pre>
</blockquote>
<p class="MsoNormal"><span style="font-size:11.0pt"><br>
<br>
<o:p></o:p></span></p>
</div>
</blockquote>
<br>
</body>
</html>