<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Yes, we are bridging between TCP and
UDP. There are no custom scripts being used, just the core
OpenSIPS scripting, with OpenSIPS acting just as a proxy. We had
another event occur last evening and managed to capture some
interesting information that suggests the problem is related to
blocking tcp connections:<br>
<br>
aaa@ip-10-72-7-129:/dir$ grep "ERROR:core:tcp_blocking_connect:
timeout 10 s elapsed from 10 s" opensips.log | sed 's/:.. ip.*//'|
uniq -c<br>
2 Nov 14 06:50<br>
2 Nov 14 06:53<br>
2 Nov 14 06:55<br>
1 Nov 14 06:56<br>
2 Nov 14 06:57<br>
2 Nov 14 07:00<br>
....<br>
2 Nov 14 13:53<br>
3 Nov 14 13:54<br>
2 Nov 14 13:55<br>
<font color="#ff0000"><font color="#000000"> 23 Nov 14 13:56<br>
29 Nov 14 13:57<br>
30 Nov 14 13:58<br>
30 Nov 14 13:59</font><br>
<br>
<font color="#000000">While I'm not familiar with the inner
workings of OpenSIPS, it seems like there is something
happening that may be causing all of the child processes to
block while trying to establish outbound TCP connections.
Unfortunately the "tcp_blocking_connect: timeout" error (and
the subsequent "tcp_blocking_connect failed" error) doesn't
indicate the address that caused the failure, and there's no
way that I can see to correlate the error back to a request
being processed. But I suspect it's a case where a client
connected via TCP, registered with our registrar (where the
path to the client, including the address of the TCP
connection endpoint on OpenSIPS that has an ephemeral port, is
recorded), but when we go to send a bunch of requests to that
client it is no longer there and it causes all of the opensips
child processes to effectively block trying to establish a
connection to an address it will never be able to connect to.
In the end 477 Send Failed is returned in those cases, but I
think the child trying to establish the connection is blocked
until the connect times out, and things start piling up behind
it, leading to a vicious circle.<br>
<br>
Assuming all of the analysis is correct, can we prevent
OpenSIPS from spending time trying to establish a connection
that can never be established?<br>
<br>
Thanks,<br>
<br>
Gavin</font><br>
<br>
</font>On 15/11/2013 7:00 AM, <a class="moz-txt-link-abbreviated" href="mailto:users-request@lists.opensips.org">users-request@lists.opensips.org</a>
wrote:<br>
</div>
<blockquote
cite="mid:mailman.3.1384513202.28824.users@lists.opensips.org"
type="cite">
<pre wrap="">Seems opensips is acting as transport bridge between udp and tcp, so,
do the SIP replies actually arrive at same interface (transport + ip +
port) from where the request was sent, and opensips is listening to that
interface? Can you share SIP trace with us?
Also, are you using any custom application, e.g. perl / lua script etc.
in opensips dial plan, that might be blocking the opensips thread that
is managing this transaction. As far as i remember opensips 1.8.x does
not have async processing, therefore, if any script or module command
that consumes time e.g. custom sql query, external script or shell
command etc. would block opensips and any responses received during that
time are likely to be ignored.
Thank you.
On 2013-11-14 16:05, Gavin Murphy wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi all,
We're seeing a possible issue in OpenSIPS related to the timely
processing of replies. For example, OpenSIPS receives a REGISTER and
passes it on to our application server (which is the registrar). The
registrar receives it within the same second and generates the
response (a 401 initially) within 0.003 seconds. However OpenSIPS
doesn't appear to receive that message, and half a second after the
first REGISTER is sent, there is a retransmission after 0.5 second,
which is again received by our registrar and a response is generated
very quickly. But still the 401 doesn't get received and/or processed
by OpenSIPS. The retransmission happens again a few times as per RFC
3261. Eventually, almost 30 seconds later OpenSIPS logs that the
reply
to the REGISTER has been received.
Based on the evidence it seems that there is no problem with OpenSIPS
sending the request and it being received by our registrar. There
also
doesn't appear to be any issues with the retransmission by OpenSIPS
at
the right intervals, nor does the registrar appear to be introducing
any delays. There is not much other traffic going on at the same
time,
but there are other REGISTERs that are getting the same result.
Here are some logs from OpenSIPS:
Initial relay to our registrar:
Nov 1 19:14:40 ip-10-72-7-129 rcs-opensips[13839]: RCS INFO:
ROUTE[2] (Inbound and Outbound) - REGISTER REQUEST relayed. Exiting.
Nov 1 19:14:40 ip-10-72-7-129 rcs-opensips[13839]: DBG:tm:t_unref:
UNREF_UNSAFE: [0x7f8f8b8d6518] after is 0
Nov 1 19:14:40 ip-10-72-7-129 rcs-opensips[13839]:
DBG:core:destroy_avp_list: destroying list (nil)
Retransmissions:
Nov 1 19:14:40 ip-10-72-7-129 rcs-opensips[13836]:
DBG:tm:retransmission_handler: retransmission_handler : request
resending (t=0x7f8f8b8d6518, REGISTER ... )
Nov 1 19:14:41 ip-10-72-7-129 rcs-opensips[13836]:
DBG:tm:retransmission_handler: retransmission_handler : request
resending (t=0x7f8f8b8d6518, REGISTER ... )
Nov 1 19:14:43 ip-10-72-7-129 rcs-opensips[13836]:
DBG:tm:retransmission_handler: retransmission_handler : request
resending (t=0x7f8f8b8d6518, REGISTER ... )
Nov 1 19:14:53 ip-10-72-7-129 rcs-opensips[13836]:
DBG:tm:retransmission_handler: retransmission_handler : request
resending (t=0x7f8f8b8d6518, REGISTER ... )
Nov 1 19:15:04 ip-10-72-7-129 rcs-opensips[13836]:
DBG:tm:retransmission_handler: retransmission_handler : request
resending (t=0x7f8f8b8d6518, REGISTER ... )
Reply "received" (IP,s TNs, and domains anonymized):
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]:
DBG:tm:t_reply_matching: REF_UNSAFE:[0x7f8f8b8d6518] after is 1
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]:
DBG:tm:t_reply_matching: reply matched (T=0x7f8f8b8d6518)!
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]: DBG:tm:t_check:
end=0x7f8f8b8d6518
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]:
DBG:tm:reply_received: org. status uas=0, uac[0]=0 local=0
is_invite=0)
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]: RCS INFO:
PROCESSING ONREPLY ROUTE[2] (Reply for Inbound Request)
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]: RCS DEBUG:
ONREPLY_ROUTE[1] (Reply for Inbound Request) - Dump Request Info.
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]: RCS DEBUG:
============= DUMP REPLY =============
Nov 1 19:15:08 ip-10-72-7-129 rcs-opensips[13831]: RCS DEBUG: SIP
message buffer:#012SIP/2.0 401 Unauthorized#015#012Via: SIP/2.0/UDP
x.x.x.x:6000;branch=z9hG4bKf439.62091482.0;i=c04c3,SIP/2.0/TCP
x.x.x.x:40042;branch=z9hG4bK1490773660;received=x.x.x.x;rport=40042#015#012Record-Route:
<a class="moz-txt-link-rfc2396E" href="sip:x.x.x.x:6031;lr"><sip:x.x.x.x:6031;lr></a>#015#012Call-ID:
6d76ce9e-ed19-9701-1e75-f070eacd3400#015#012From:
<a class="moz-txt-link-rfc2396E" href="sip:+xxxxxxxxxxx@domain.net"><sip:+xxxxxxxxxxx@domain.net></a>;tag=330928025#015#012To:
<a class="moz-txt-link-rfc2396E" href="sip:+xxxxxxxxxxx@domain.net"><sip:+xxxxxxxxxxx@domain.net></a>;tag=Jmuh3YOvEmCA#015#012CSeq: 221716357
REGISTER#015#012Contact: <a class="moz-txt-link-rfc2396E" href="sip:x.x.x.x:6031"><sip:x.x.x.x:6031></a>#015#012Require:
gruu#015#012WWW-Authenticate: Digest
realm="domain.net",nonce="5273fda02d788dccba9301c8e1f68f078f8e95e5",qop="auth",opaque="004533235332435434ffac663e",algorithm=MD5#015#012Content-Length:
0#015#012P-Associated-URI:
<a class="moz-txt-link-rfc2396E" href="sip:+xxxxxxxxxx@domain.net"><sip:+xxxxxxxxxx@domain.net></a>#015#012P-Preferred-Identity:
<a class="moz-txt-link-rfc2396E" href="sip:+xxxxxxxxxx@domain.net"><sip:+xxxxxxxxxx@domain.net></a>#015#012P-Access-Network-Info:
ADSL;utran-cell-id-3gpp=00000000#015#012Privacy: none#015#012#015#012
On the registrar the logs appear as follows:
[2013-Nov-01 19:14:40.185708] [14002] [6-inf] ===RECV==> REGISTER
(6d76ce9e-ed19-9701-1e75-f070eacd3400:+xxxxxxxxxx => +xxxxxxxxxx)
[2013-Nov-01 19:14:40.188331] [14002] [6-inf] <==SENT=== 401
Unauthorized (6d76ce9e-ed19-9701-1e75-f070eacd3400:+xxxxxxxxxx =>
+xxxxxxxxxx)
[2013-Nov-01 19:14:40.721011] [14002] [6-inf] ===RECV==> REGISTER
(6d76ce9e-ed19-9701-1e75-f070eacd3400:+xxxxxxxxxx => +xxxxxxxxxx)
[2013-Nov-01 19:14:40.721162] [14002] [6-inf] <==SENT=== 401
Unauthorized (6d76ce9e-ed19-9701-1e75-f070eacd3400:+xxxxxxxxxx =>
+xxxxxxxxxx)
[2013-Nov-01 19:14:41.723160] [14002] [6-inf] ===RECV==> REGISTER
(6d76ce9e-ed19-9701-1e75-f070eacd3400:+xxxxxxxxxx => +xxxxxxxxxx)
[2013-Nov-01 19:14:41.723262] [14002] [6-inf] <==SENT=== 401
Unauthorized (6d76ce9e-ed19-9701-1e75-f070eacd3400:+xxxxxxxxxx =>
+xxxxxxxxxx)
etc....
This "bogging down" seems to happen on a regular basis, and the only
way we are able to resolve it at the moment is to restart OpenSIPS.
Until it is restarted all of the registrations are essentially
failing. I believe it is v1.8.2 that we are running.
Anyone have any thoughts as to what might be happening?
Thanks,
Gavin
</pre>
</blockquote>
<pre wrap="">
</pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Gavin Murphy
Vice President & CTO | <a class="moz-txt-link-abbreviated" href="http://www.newpace.ca">www.newpace.ca</a> - Real Technology Solutions
(e) <a class="moz-txt-link-abbreviated" href="mailto:gavin.murphy@newpace.ca">gavin.murphy@newpace.ca</a>
(w) +1.902.406.8375 x 1002
(m) +1.902.401.9445
(f) +1.902.406.8377</pre>
</body>
</html>