<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
<tt>Hi Gavin,<br>
<br>
I see, no registration. As an exercise, increase
tcp_connection_lifetime to 7200 (2 h), just to rule out the
possibility of connections timing out.<br>
</tt><tt><br>
Are you saying that running a constant load of 50K TCP connections
for a long time does not result in any TCP errors?<br>
<br>
Regarding the processes: yes, it looks like TCP main is the one
carrying the extra load. This process is responsible for managing
the TCP connections. It does not accept, read or write anything
itself; it detects events on the TCP sockets and dispatches
them to the TCP worker processes.<br>
<br>
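As a rough illustration of that split (not the actual OpenSIPS code,
which is C and uses separate processes), here is a minimal Python
sketch: a main loop only detects readable sockets and hands each one
to a worker, and the worker does the read. The names worker,
dispatch_once and demo are made up for this example, and threads
stand in for the TCP worker processes.<br>
<br>
</tt>
<pre>
```python
import queue
import selectors
import socket
import threading


def worker(jobs, results):
    """Worker: performs the actual read on sockets handed over by the main loop."""
    while True:
        conn = jobs.get()
        if conn is None:  # shutdown marker
            break
        results.put(conn.recv(4096))


def dispatch_once(selector, worker_queues, next_worker, timeout=1.0):
    """One main-loop step: detect readable sockets, hand each to a worker.

    Nothing is read here. Returns the index of the next worker to use.
    """
    for key, _events in selector.select(timeout):
        selector.unregister(key.fileobj)  # the chosen worker owns it from now on
        worker_queues[next_worker].put(key.fileobj)
        next_worker = (next_worker + 1) % len(worker_queues)
    return next_worker


def demo(n_conns=4, n_workers=2):
    """Simulate n_conns clients with socket pairs and return what workers read."""
    selector = selectors.DefaultSelector()
    results = queue.Queue()
    worker_queues = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(q, results))
               for q in worker_queues]
    for thread in threads:
        thread.start()

    sockets = []
    for i in range(n_conns):
        client, server_side = socket.socketpair()
        client.sendall(b"msg-%d" % i)  # pending data makes server_side readable
        selector.register(server_side, selectors.EVENT_READ)
        sockets += [client, server_side]

    next_worker = 0
    while selector.get_map():  # until every socket has been dispatched
        next_worker = dispatch_once(selector, worker_queues, next_worker)

    received = [results.get(timeout=5) for _ in range(n_conns)]
    for jobs in worker_queues:
        jobs.put(None)
    for thread in threads:
        thread.join()
    for sock in sockets:
        sock.close()
    selector.close()
    return received
```
</pre>
<tt>Calling demo() returns the four payloads; each one is read by a
worker and none by the dispatching loop.<br>
<br>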
Do you have a test suite or a similar tool for generating the traffic
of 50K clients?<br>
<br>
Regards,<br>
</tt>
<pre class="moz-signature" cols="72">Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
<a class="moz-txt-link-freetext" href="http://www.opensips-solutions.com">http://www.opensips-solutions.com</a></pre>
<br>
On 04/30/2013 10:35 PM, Gavin Murphy wrote:
<blockquote cite="mid:51801CFE.1040207@newpace.ca" type="cite">
<div class="moz-cite-prefix">The tcp_persistent_flag isn't set as
that appears to be for the registrar module, which we aren't
using. We're passing REGISTERs through to our own registrar.<br>
<br>
Here is a snapshot of a test currently being run with 50K
concurrent TCP "clients" (doesn't show all of the opensips
processes). This level of traffic is not generating any
TCP-related errors in opensips.<br>
<br>
<pre>  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 3411 rcsuser  20  0 6516m 3.1g 3.1g R   54 39.5 73:14.06 opensips
 3376 rcsuser  20  0 6516m 221m 219m S   11  2.8 14:07.50 opensips
 3375 rcsuser  20  0 6516m 221m 219m S   10  2.8 13:57.23 opensips
 3373 rcsuser  20  0 6516m 221m 219m S    9  2.8 14:10.93 opensips
 3374 rcsuser  20  0 6516m 221m 219m S    9  2.8 14:04.26 opensips
 3377 rcsuser  20  0 6516m 1608  200 S    0  0.0  0:01.44 opensips
 3379 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.52 opensips
 3380 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.65 opensips
 3381 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.38 opensips
 3382 rcsuser  20  0 6516m  47m  39m S    0  0.6  0:14.56 opensips
 3385 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.52 opensips
 3386 rcsuser  20  0 6516m  49m  41m S    0  0.6  0:14.67 opensips
 3390 rcsuser  20  0 6516m  49m  41m S    0  0.6  0:14.50 opensips
 3394 rcsuser  20  0 6516m  47m  39m S    0  0.6  0:14.42 opensips
 3395 rcsuser  20  0 6516m  47m  39m S    0  0.6  0:14.44 opensips
 3396 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.72 opensips
 3401 rcsuser  20  0 6516m  50m  42m S    0  0.6  0:14.72 opensips
 3402 rcsuser  20  0 6516m  50m  42m S    0  0.6  0:14.75 opensips
 3403 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.78 opensips
 3404 rcsuser  20  0 6516m  48m  40m S    0  0.6  0:14.60 opensips
 3408 rcsuser  20  0 6516m  50m  42m S    0  0.6  0:14.49 opensips
 3409 rcsuser  20  0 6516m  50m  42m S    0  0.6  0:14.75 opensips
 3410 rcsuser  20  0 6516m  50m  42m S    0  0.6  0:14.61 opensips
</pre>
<br>
And the results from the fifo command:<br>
<br>
Process:: ID=0 PID=3367 Type=attendant<br>
Process:: ID=1 PID=3368 Type=MI FIFO<br>
Process:: ID=2 PID=3369 Type=SIP receiver udp:127.0.0.1:9050<br>
Process:: ID=3 PID=3370 Type=SIP receiver udp:127.0.0.1:9050<br>
Process:: ID=4 PID=3371 Type=SIP receiver udp:127.0.0.1:9050<br>
Process:: ID=5 PID=3372 Type=SIP receiver udp:127.0.0.1:9050<br>
Process:: ID=6 PID=3373 Type=SIP receiver
udp:192.168.38.175:9050<br>
Process:: ID=7 PID=3374 Type=SIP receiver
udp:192.168.38.175:9050<br>
Process:: ID=8 PID=3375 Type=SIP receiver
udp:192.168.38.175:9050<br>
Process:: ID=9 PID=3376 Type=SIP receiver
udp:192.168.38.175:9050<br>
Process:: ID=10 PID=3377 Type=time_keeper<br>
Process:: ID=11 PID=3378 Type=timer<br>
Process:: ID=12 PID=3379 Type=TCP receiver<br>
Process:: ID=13 PID=3380 Type=TCP receiver<br>
Process:: ID=14 PID=3381 Type=TCP receiver<br>
Process:: ID=15 PID=3382 Type=TCP receiver<br>
Process:: ID=16 PID=3383 Type=TCP receiver<br>
Process:: ID=17 PID=3384 Type=TCP receiver<br>
Process:: ID=18 PID=3385 Type=TCP receiver<br>
Process:: ID=19 PID=3386 Type=TCP receiver<br>
Process:: ID=20 PID=3387 Type=TCP receiver<br>
Process:: ID=21 PID=3388 Type=TCP receiver<br>
Process:: ID=22 PID=3389 Type=TCP receiver<br>
Process:: ID=23 PID=3390 Type=TCP receiver<br>
Process:: ID=24 PID=3391 Type=TCP receiver<br>
Process:: ID=25 PID=3392 Type=TCP receiver<br>
Process:: ID=26 PID=3393 Type=TCP receiver<br>
Process:: ID=27 PID=3394 Type=TCP receiver<br>
Process:: ID=28 PID=3395 Type=TCP receiver<br>
Process:: ID=29 PID=3396 Type=TCP receiver<br>
Process:: ID=30 PID=3397 Type=TCP receiver<br>
Process:: ID=31 PID=3398 Type=TCP receiver<br>
Process:: ID=32 PID=3399 Type=TCP receiver<br>
Process:: ID=33 PID=3400 Type=TCP receiver<br>
Process:: ID=34 PID=3401 Type=TCP receiver<br>
Process:: ID=35 PID=3402 Type=TCP receiver<br>
Process:: ID=36 PID=3403 Type=TCP receiver<br>
Process:: ID=37 PID=3404 Type=TCP receiver<br>
Process:: ID=38 PID=3405 Type=TCP receiver<br>
Process:: ID=39 PID=3406 Type=TCP receiver<br>
Process:: ID=40 PID=3407 Type=TCP receiver<br>
Process:: ID=41 PID=3408 Type=TCP receiver<br>
Process:: ID=42 PID=3409 Type=TCP receiver<br>
Process:: ID=43 PID=3410 Type=TCP receiver<br>
Process:: ID=44 PID=3411 Type=TCP main<br>
<br>
So is it a correct assumption that the "TCP main" type is
responsible for accepting the initial connection and handing it
off to one of the "TCP receiver" types? Is that why it uses the
most CPU and memory resources? If so, is it just memory and CPU
that are limiting factors in terms of how many connections we
can get established concurrently?<br>
<br>
Gavin<br>
<br>
On 29/04/2013 9:48 AM, Bogdan-Andrei Iancu wrote:<br>
</div>
<blockquote cite="mid:517E6C25.8040508@opensips.org" type="cite">Hello
Gavin, <br>
<br>
The errors you get indicate that OpenSIPS is trying to open a
TCP connection to a destination which does not accept it. Based
on your description, I would say there is no need for OpenSIPS
to open TCP connections - they will be opened by the clients when
registering. <br>
<br>
Ruling out the scenario of misrouting, the only explanation
would be that the TCP connections expire (timeout without
traffic) long before the corresponding registration - so you end
up with a registration (in usrloc) which has no TCP connection
towards the actual device. Are you using the tcp_persistent_flag? <br>
<a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://www.opensips.org/html/docs/modules/1.9.x/registrar.html#id250105">http://www.opensips.org/html/docs/modules/1.9.x/registrar.html#id250105</a>
<br>
<br>
About the load on the processes, you can run "opensipsctl fifo
ps" to get the listing of the processes and their descriptions -
you can then correlate it with the top output to see which process
is burning CPU. <br>
<br>
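A small sketch of that correlation, assuming the "opensipsctl fifo
ps" listing uses lines of the form "Process:: ID=n PID=n Type=..."
and that top prints its default column order (PID USER PR NI VIRT
RES SHR S %CPU %MEM TIME+ COMMAND). The function names are
hypothetical helpers, not part of OpenSIPS. <br>
<br>
<pre>
```python
import re

# One line of the process listing, e.g. "Process:: ID=44 PID=3411 Type=TCP main"
PS_LINE = re.compile(r"Process::\s+ID=(\d+)\s+PID=(\d+)\s+Type=(.+)")


def parse_fifo_ps(text):
    """Map PID to process type from the process listing."""
    types = {}
    for line in text.splitlines():
        match = PS_LINE.search(line)
        if match:
            types[int(match.group(2))] = match.group(3).strip()
    return types


def parse_top(text):
    """Map PID to %CPU from top rows in the default column order."""
    cpu = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric PID; %CPU is the ninth column.
        if len(fields) >= 12 and fields[0].isdigit():
            cpu[int(fields[0])] = float(fields[8])
    return cpu


def busiest(ps_text, top_text, limit=3):
    """Return (pid, cpu, type) tuples, highest CPU first."""
    types = parse_fifo_ps(ps_text)
    cpu = parse_top(top_text)
    ranked = sorted(cpu.items(), key=lambda item: item[1], reverse=True)
    return [(pid, load, types.get(pid, "unknown")) for pid, load in ranked[:limit]]
```
</pre>
With the two outputs saved as text, busiest(ps_text, top_text, 1)
gives the PID, %CPU and process type of the busiest process. <br>
<br>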
Regards, <br>
<br>
Bogdan-Andrei Iancu <br>
OpenSIPS Founder and Developer <br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://www.opensips-solutions.com">http://www.opensips-solutions.com</a>
<br>
<br>
<br>
On 04/26/2013 05:44 PM, Gavin Murphy wrote: <br>
<blockquote type="cite">We're trying to load up opensips with as
many TCP connections as we possibly can. So far we've got it
to about 82K, but failures start occurring at that point. We
have 8 GB of RAM allocated to the server as a whole (is that
enough? We don't appear to be exhausting it). We've set the
following parameters for OpenSIPS: <br>
<br>
tcp_children=32 <br>
tcp_max_connections=250000 <br>
tcp_connection_lifetime=610 <br>
tcp_keepalive=1 <br>
tcp_keepcount=3 <br>
tcp_keepidle=300 <br>
tcp_keepinterval=300 <br>
<br>
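For reference, a sketch of what the keepalive values above would
mean at socket level, assuming the tcp_keep* parameters map onto
the standard Linux keepalive socket options (an assumption, not
checked against the OpenSIPS source). The helper names are
hypothetical. <br>
<br>
<pre>
```python
import socket


def dead_peer_detection_seconds(keepidle, keepinterval, keepcount):
    """Approximate time for kernel keepalive to give up on a silent peer:
    the idle period plus one interval per unanswered probe."""
    return keepidle + keepinterval * keepcount


def enable_keepalive(sock, keepidle, keepinterval, keepcount):
    """Apply keepalive settings to a socket (TCP_KEEP* are the Linux names)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", keepidle),
                        ("TCP_KEEPINTVL", keepinterval),
                        ("TCP_KEEPCNT", keepcount)):
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```
</pre>
With idle=300, interval=300 and count=3 the estimate is 1200 s,
roughly twice tcp_connection_lifetime=610. Whether keepalive probes
count as traffic for that lifetime is not assumed here. <br>
<br>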
We have also set ulimit -n 1024000 and ulimit -s 768. <br>
<br>
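A quick way to double-check the descriptor limit from inside a
process, sketched in Python. A limit set with ulimit applies to
processes started from that shell, so it is worth confirming what
the running process actually got. The function names and the
reserve of 1024 descriptors are illustrative assumptions. <br>
<br>
<pre>
```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fd_headroom(target_connections, soft_limit, reserved=1024):
    """Descriptors left after target_connections sockets plus a reserve.

    reserved is an illustrative allowance for logs, pipes and listeners.
    A negative result means the limit is too low for the target.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return float("inf")
    return soft_limit - target_connections - reserved
```
</pre>
For example, fd_headroom(250000, 1024000) is positive, so by this
estimate the descriptor limit is not what caps the test at around
80K connections. <br>
<br>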
The scenario is that our load driver establishes "client"
connections to OpenSIPS via TCP, and sends REGISTERs over
those connections. While the REGISTERs come in over TCP, they
are sent out to our registrar via UDP. Around the point where
we get to the 40K connection mark we start seeing the
following in the logs: <br>
<br>
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
ERROR:core:tcp_blocking_connect: poll error: flags 1c <br>
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR
(111) Connection refused <br>
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
ERROR:core:tcpconn_connect: tcp_blocking_connect failed <br>
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
ERROR:core:tcp_send: connect failed <br>
Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
ERROR:tm:msg_send: tcp_send failed <br>
<br>
It almost appears as though opensips is trying to establish a
connection somewhere and is being refused. Except that it
shouldn't be trying to establish any, unless it's for internal
purposes. Unfortunately the logs don't say which connection
it is trying to establish. <br>
<br>
One other thing that appears puzzling: it seems that one of
the opensips processes is bearing most of the brunt. I am
assuming that it's the instance that is actually accepting the
connections, and that the subsequent (low) amount of traffic
is then handed off to the children. But if that's the case, it
also means that it's handling a lot of the workload, and I was
hoping that it would be more evenly distributed. <br>
<br>
Here is a snapshot of the opensips processes in top: <br>
<br>
<pre>  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
27577 rcsuser  20  0 6516m 2.5g 2.5g R   76 31.9  8:15.26 opensips
27542 rcsuser  20  0 6516m 181m 180m S   16  2.3  0:54.60 opensips
27541 rcsuser  20  0 6516m 182m 180m S   14  2.3  0:54.47 opensips
27539 rcsuser  20  0 6516m 182m 180m S   13  2.3  0:53.75 opensips
27540 rcsuser  20  0 6516m 182m 180m S   11  2.3  0:53.64 opensips
27545 rcsuser  20  0 6516m  37m  29m S    0  0.5  0:01.03 opensips
27551 rcsuser  20  0 6516m  35m  27m S    0  0.4  0:00.94 opensips
27553 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.95 opensips
27555 rcsuser  20  0 6516m  37m  29m S    0  0.5  0:00.99 opensips
27557 rcsuser  20  0 6516m  35m  27m S    0  0.4  0:00.92 opensips
27558 rcsuser  20  0 6516m  35m  27m S    0  0.4  0:00.90 opensips
27560 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.98 opensips
27563 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.94 opensips
27564 rcsuser  20  0 6516m  36m  27m S    0  0.5  0:00.93 opensips
27565 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.93 opensips
27567 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.95 opensips
27575 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.95 opensips
27576 rcsuser  20  0 6516m  36m  28m S    0  0.5  0:00.98 opensips
</pre>
<br>
So basically what I'm looking for is some help on getting the
operating system and opensips tuned to the point where we can
get substantially more than 80K connections. Or am I asking
for too much? <br>
<br>
Thanks, <br>
<br>
Gavin <br>
</blockquote>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<table id="signature" style="font-size: 85%; color: rgb(61, 35,
22);" border="0" cellpadding="1" cellspacing="0">
<tbody>
<tr>
<td rowspan="2" valign="middle" align="center"><img
moz-do-not-send="false"
src="cid:part1.01060200.00010405@opensips.org"
alt="NewPace Logo" height="50" width="50"></td>
<td rowspan="6" width="6px"><br>
</td>
<td><br>
</td>
<td rowspan="6" width="6px"><br>
</td>
<td><font style="font-weight: bold; font-size: 110%;">Gavin
Murphy</font></td>
</tr>
<tr>
<td rowspan="6" style="font-weight: bold; font-size:
100%;" width="1px" bgcolor="#a8cf38"><br>
</td>
<td>Vice President &amp; CTO, NewPace</td>
</tr>
<tr>
<td align="right">phone</td>
<td>+1 (902) 406–8375 x1002</td>
</tr>
<tr>
<td align="right">email</td>
<td><a moz-do-not-send="true"
href="mailto:gavin.murphy@newpace.com"
style="text-decoration: none; color: rgb(61, 35, 22);">gavin.murphy@newpace.com</a></td>
</tr>
<tr>
<td align="right"><a moz-do-not-send="true"
href="aim:GoIm?screenname=gavin.murphy@newpace.com"
style="text-decoration: none; color: rgb(61, 35, 22);">aim</a></td>
<td><a moz-do-not-send="true"
href="aim:GoIm?screenname=gavin.murphy@newpace.com"
style="text-decoration: none; color: rgb(61, 35, 22);">gavin.murphy</a>@newpace.com</td>
</tr>
</tbody>
</table>
</div>
</blockquote>
</body>
</html>