<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:758982242;
mso-list-type:hybrid;
mso-list-template-ids:-1638617650 -1137403934 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;
mso-fareast-font-family:Calibri;
mso-bidi-font-family:"Times New Roman";}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1
{mso-list-id:1913540724;
mso-list-type:hybrid;
mso-list-template-ids:1544867480 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l1:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">We have implemented TCP recently and are performing TCP<->UDP translation on one of our proxy types. This proxy only exists for that purpose; there are no DB queries, REST calls, or anything like that. It
is designed to be very fast and high throughput.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Recently we have found that when the remote endpoint of a TCP connection is lost, i.e. the server goes down, while under moderate load OpenSIPS quickly reaches 100% CPU and becomes unresponsive. When this
occurs, the “top” command shows that between 30-90% CPU is in System (kernel) space, and each OpenSIPS TCP process shows many times the normal CPU. We are running OpenSIPS 2.4.2 on Amazon Linux.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I obtained as much information as I could using ps, strace, and gdb here: https://pastebin.com/JP3DnCqs. We can reproduce the failure consistently by removing a server during call traffic.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">A few things I noticed:<o:p></o:p></span></p>
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo2"><span style="font-size:11.0pt">The number of running threads reported by OpenSIPS doesn’t align with our configuration, copied here:<o:p></o:p></span></li></ul>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">####### Global Parameters #########<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">children=32<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">#// Allow 503 to pass back to Control<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">disable_503_translation=yes<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">#// Even though we are not receiving HEP,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">#// this listener is required by OpenSIPS<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">#// in order to use the proto_hep module. :/<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">listen=hep_tcp:10.32.40.245:9061 use_children 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">#// Configure the listeners<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">#// Transaction Module<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">loadmodule "tm.so"<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">modparam("tm", "restart_fr_on_each_reply", 0)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">modparam("tm", "timer_partitions", 8)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">modparam("tm", "onreply_avp_mode", 1)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">modparam("tm", "wt_timer", 10)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoListParagraph"><span style="font-size:11.0pt">According to the documentation if “tcp_children” is not set then the value of “children” will be used [1], but we have set “children” to 32 and only have the default 8 TCP processes. Also we appear
to only have 1 timer process, although we have set the number of timer partitions to 8.<o:p></o:p></span></p>
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo2"><span style="font-size:11.0pt">The server that is terminated was using TCP connections exclusively, but all of the CPU seems to be in the UDP threads. The one I looked at appeared
to be handling a CANCEL to one of the calls that was active and was attempting to send it out via TCP. I’m not sure why it would be trying to relay the CANCEL as no 100 Trying had been received from the server. I have noticed that in 2.x OpenSIPS will now
send CANCELs for transactions even when 100 Trying was not received. Is that intentional? RFC 3261 states that no CANCEL should be sent unless a provisional response has been received.<o:p></o:p></span></li></ul>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Any assistance with this would be appreciated.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[1] - <a href="http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66">
http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Ben Newlin </span>
<o:p></o:p></p>
</div>
</body>
</html>