[OpenSIPS-Users] Tuning for maximum number of TCP connections

Bogdan-Andrei Iancu bogdan at opensips.org
Thu May 9 12:41:35 CEST 2013


Hi Gavin,

I see, no registration... As an exercise, increase
tcp_connection_lifetime to 7200 (2 h), just to rule out the possibility
of connections timing out.
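
To make the reasoning concrete, here is a small Python sketch (not
OpenSIPS code). It assumes the lifetime behaves as an inactivity timeout
that only SIP traffic on the connection refreshes. The helper name and
the 3600 s re-REGISTER interval are made up for illustration, so plug in
your real refresh interval.

```python
def survives_idle_gap(lifetime_s: int, traffic_interval_s: int) -> bool:
    """Return True if traffic arrives before the inactivity timeout fires.

    Assumption: a connection is closed once it has been idle for
    lifetime_s seconds, and each SIP message resets that timer.
    """
    return traffic_interval_s < lifetime_s


# Hypothetical gap between two REGISTERs on the same connection.
REREGISTER_INTERVAL_S = 3600

for lifetime in (610, 7200):  # current value vs. the value suggested above
    kept = survives_idle_gap(lifetime, REREGISTER_INTERVAL_S)
    verdict = "kept" if kept else "closed"
    print(f"tcp_connection_lifetime={lifetime}: connection {verdict} between REGISTERs")
```

With these assumed numbers, 610 lets the connection close before the
next REGISTER arrives, while 7200 keeps it open.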

Are you saying that running a constant load of 50K TCP connections (for
a long time) does not result in any TCP errors?

Now, regarding the processes: yes, it looks like the TCP main is the one
with the extra load. This process is responsible for managing the TCP
connections. It does not accept, read, or write anything itself, but it
detects events on the TCP sockets and dispatches them to the TCP worker
processes.
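
As a rough illustration of that split, here is a single-process Python
sketch (not OpenSIPS code, which uses separate processes). The names
make_worker, dispatch_once and demo are invented for the example, and
socket pairs stand in for client TCP connections. The dispatcher only
waits for readiness events and hands each ready socket to a worker in
round-robin order; only the workers read.

```python
import selectors
import socket
from itertools import cycle


def make_worker(name, log):
    """Return a worker callable that does the actual read on a ready socket."""
    def worker(conn):
        data = conn.recv(4096)
        log.append((name, data))
    return worker


def dispatch_once(selector, workers, timeout=1.0):
    """Detect readable sockets and pass each one to the next worker in turn.

    The dispatcher never reads or writes the sockets itself.
    Returns the number of sockets handed off.
    """
    ready = selector.select(timeout)
    for key, _mask in ready:
        next(workers)(key.fileobj)
    return len(ready)


def demo():
    log = []
    workers = cycle([make_worker("worker-0", log), make_worker("worker-1", log)])
    selector = selectors.DefaultSelector()
    # Three fake connections; only the first two will send anything.
    pairs = [socket.socketpair() for _ in range(3)]
    try:
        for _client, server_side in pairs:
            selector.register(server_side, selectors.EVENT_READ)
        pairs[0][0].sendall(b"REGISTER a")
        pairs[1][0].sendall(b"REGISTER b")
        handed = 0
        while handed < 2:  # keep dispatching until both events were seen
            handed += dispatch_once(selector, workers)
    finally:
        selector.close()
        for client, server_side in pairs:
            client.close()
            server_side.close()
    return handed, log


if __name__ == "__main__":
    print(demo())
```

The idle third socket never reaches a worker, while every event on the
busy ones passes through the single dispatcher. That is consistent with
one process carrying the extra load as the number of connections grows.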

Do you have a test suite or similar that can generate the traffic
corresponding to 50K clients?

Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com


On 04/30/2013 10:35 PM, Gavin Murphy wrote:
> The tcp_persistent_flag isn't set as that appears to be for the
> registrar module, which we aren't using. We're passing REGISTERs
> through to our own registrar.
>
> Here is a snapshot of a test currently being run with 50K concurrent
> TCP "clients" (doesn't show all of the opensips processes). This level
> of traffic is not generating any TCP-related errors in opensips.
>
>  3411 rcsuser   20   0 6516m 3.1g 3.1g R   54 39.5  73:14.06 opensips
>  3376 rcsuser   20   0 6516m 221m 219m S   11  2.8  14:07.50 opensips
>  3375 rcsuser   20   0 6516m 221m 219m S   10  2.8  13:57.23 opensips
>  3373 rcsuser   20   0 6516m 221m 219m S    9  2.8  14:10.93 opensips
>  3374 rcsuser   20   0 6516m 221m 219m S    9  2.8  14:04.26 opensips
>  3377 rcsuser   20   0 6516m 1608  200 S    0  0.0   0:01.44 opensips
>  3379 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.52 opensips
>  3380 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.65 opensips
>  3381 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.38 opensips
>  3382 rcsuser   20   0 6516m  47m  39m S    0  0.6   0:14.56 opensips
>  3385 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.52 opensips
>  3386 rcsuser   20   0 6516m  49m  41m S    0  0.6   0:14.67 opensips
>  3390 rcsuser   20   0 6516m  49m  41m S    0  0.6   0:14.50 opensips
>  3394 rcsuser   20   0 6516m  47m  39m S    0  0.6   0:14.42 opensips
>  3395 rcsuser   20   0 6516m  47m  39m S    0  0.6   0:14.44 opensips
>  3396 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.72 opensips
>  3401 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.72 opensips
>  3402 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.75 opensips
>  3403 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.78 opensips
>  3404 rcsuser   20   0 6516m  48m  40m S    0  0.6   0:14.60 opensips
>  3408 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.49 opensips
>  3409 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.75 opensips
>  3410 rcsuser   20   0 6516m  50m  42m S    0  0.6   0:14.61 opensips
>
> And the results from the fifo command:
>
> Process::  ID=0 PID=3367 Type=attendant
> Process::  ID=1 PID=3368 Type=MI FIFO
> Process::  ID=2 PID=3369 Type=SIP receiver udp:127.0.0.1:9050
> Process::  ID=3 PID=3370 Type=SIP receiver udp:127.0.0.1:9050
> Process::  ID=4 PID=3371 Type=SIP receiver udp:127.0.0.1:9050
> Process::  ID=5 PID=3372 Type=SIP receiver udp:127.0.0.1:9050
> Process::  ID=6 PID=3373 Type=SIP receiver udp:192.168.38.175:9050
> Process::  ID=7 PID=3374 Type=SIP receiver udp:192.168.38.175:9050
> Process::  ID=8 PID=3375 Type=SIP receiver udp:192.168.38.175:9050
> Process::  ID=9 PID=3376 Type=SIP receiver udp:192.168.38.175:9050
> Process::  ID=10 PID=3377 Type=time_keeper
> Process::  ID=11 PID=3378 Type=timer
> Process::  ID=12 PID=3379 Type=TCP receiver
> Process::  ID=13 PID=3380 Type=TCP receiver
> Process::  ID=14 PID=3381 Type=TCP receiver
> Process::  ID=15 PID=3382 Type=TCP receiver
> Process::  ID=16 PID=3383 Type=TCP receiver
> Process::  ID=17 PID=3384 Type=TCP receiver
> Process::  ID=18 PID=3385 Type=TCP receiver
> Process::  ID=19 PID=3386 Type=TCP receiver
> Process::  ID=20 PID=3387 Type=TCP receiver
> Process::  ID=21 PID=3388 Type=TCP receiver
> Process::  ID=22 PID=3389 Type=TCP receiver
> Process::  ID=23 PID=3390 Type=TCP receiver
> Process::  ID=24 PID=3391 Type=TCP receiver
> Process::  ID=25 PID=3392 Type=TCP receiver
> Process::  ID=26 PID=3393 Type=TCP receiver
> Process::  ID=27 PID=3394 Type=TCP receiver
> Process::  ID=28 PID=3395 Type=TCP receiver
> Process::  ID=29 PID=3396 Type=TCP receiver
> Process::  ID=30 PID=3397 Type=TCP receiver
> Process::  ID=31 PID=3398 Type=TCP receiver
> Process::  ID=32 PID=3399 Type=TCP receiver
> Process::  ID=33 PID=3400 Type=TCP receiver
> Process::  ID=34 PID=3401 Type=TCP receiver
> Process::  ID=35 PID=3402 Type=TCP receiver
> Process::  ID=36 PID=3403 Type=TCP receiver
> Process::  ID=37 PID=3404 Type=TCP receiver
> Process::  ID=38 PID=3405 Type=TCP receiver
> Process::  ID=39 PID=3406 Type=TCP receiver
> Process::  ID=40 PID=3407 Type=TCP receiver
> Process::  ID=41 PID=3408 Type=TCP receiver
> Process::  ID=42 PID=3409 Type=TCP receiver
> Process::  ID=43 PID=3410 Type=TCP receiver
> Process::  ID=44 PID=3411 Type=TCP main
>
> So is it a correct assumption that the "TCP main" type is responsible
> for accepting the initial connection and handing it off to one of the
> "TCP receiver" types? Is that why it uses the most CPU and memory
> resources? If so, is it just memory and CPU that are limiting factors
> in terms of how many connections we can get established concurrently?
>
> Gavin
>
> On 29/04/2013 9:48 AM, Bogdan-Andrei Iancu wrote:
>> Hello Gavin,
>>
>> The errors you get indicate that OpenSIPS is trying to open a TCP
>> connection to a destination which does not accept it. Based on your
>> description, I would say there is no need for OpenSIPS to open TCP
>> connections - they will be opened by the clients when registering.
>>
>> Ruling out the scenario of a misrouting, the only explanation would
>> be that the TCP connections expire (timeout without traffic) long
>> before the corresponding registration - so you end up with a
>> registration (in usrloc) which has no TCP conn towards the actual
>> device. Are you using the tcp_persistent_flag?
>>
>> http://www.opensips.org/html/docs/modules/1.9.x/registrar.html#id250105
>>
>> About the load on the processes, you can do "opensipsctl fifo ps" to
>> get the listing of the processes and their description - you could
>> correlate it with the top output to see which process is burning CPU.
>>
>> Regards,
>>
>> Bogdan-Andrei Iancu
>> OpenSIPS Founder and Developer
>> http://www.opensips-solutions.com
>>
>>
>> On 04/26/2013 05:44 PM, Gavin Murphy wrote:
>>> We're trying to load up opensips with as many TCP connections as we
>>> possibly can. So far we've got it to about 82K, but failures start
>>> occurring at that point. We have 8 GB of RAM allocated to the server
>>> as a whole (is that enough? we don't appear to be exhausting it).
>>> We've set the following parameters for OpenSIPS:
>>>
>>> tcp_children=32
>>> tcp_max_connections=250000
>>> tcp_connection_lifetime=610
>>> tcp_keepalive=1
>>> tcp_keepcount=3
>>> tcp_keepidle=300
>>> tcp_keepinterval=300
>>>
>>> We have also set ulimit -n 1024000 and ulimit -s 768.
>>>
>>> The scenario is that our load driver establishes "client"
>>> connections to OpenSIPS via TCP, and sends REGISTERs over those
>>> connections. While the REGISTERs come in over TCP, they are sent out
>>> to our registrar via UDP. Around the point where we get to the 40K
>>> connection mark we start seeing the following in the logs:
>>>
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcp_blocking_connect: poll error: flags 1c
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111)
>>> Connection refused
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcpconn_connect: tcp_blocking_connect failed
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcp_send: connect failed
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:tm:msg_send: tcp_send failed
>>>
>>> It almost appears as though opensips is trying to establish a
>>> connection somewhere and is being refused. Except that it shouldn't
>>> be trying to establish any, unless it's for internal purposes.
>>> Unfortunately the logs aren't clear on that point (in terms of what
>>> connection is trying to be established).
>>>
>>> One other thing that appears puzzling: it seems that one of the
>>> opensips processes is bearing most of the brunt. I am assuming that
>>> it's the instance that is actually accepting the connections, and
>>> that the subsequent (low) amount of traffic is then handed off to
>>> the children. But if that's the case, it also means that it's
>>> handling a lot of the workload, and I was hoping that it would be
>>> more evenly distributed.
>>>
>>> Here is a snapshot of the opensips processes in top:
>>>
>>> 27577 rcsuser   20   0 6516m 2.5g 2.5g R   76 31.9   8:15.26 opensips
>>> 27542 rcsuser   20   0 6516m 181m 180m S   16  2.3   0:54.60 opensips
>>> 27541 rcsuser   20   0 6516m 182m 180m S   14  2.3   0:54.47 opensips
>>> 27539 rcsuser   20   0 6516m 182m 180m S   13  2.3   0:53.75 opensips
>>> 27540 rcsuser   20   0 6516m 182m 180m S   11  2.3   0:53.64 opensips
>>> 27545 rcsuser   20   0 6516m  37m  29m S    0  0.5   0:01.03 opensips
>>> 27551 rcsuser   20   0 6516m  35m  27m S    0  0.4   0:00.94 opensips
>>> 27553 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.95 opensips
>>> 27555 rcsuser   20   0 6516m  37m  29m S    0  0.5   0:00.99 opensips
>>> 27557 rcsuser   20   0 6516m  35m  27m S    0  0.4   0:00.92 opensips
>>> 27558 rcsuser   20   0 6516m  35m  27m S    0  0.4   0:00.90 opensips
>>> 27560 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.98 opensips
>>> 27563 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.94 opensips
>>> 27564 rcsuser   20   0 6516m  36m  27m S    0  0.5   0:00.93 opensips
>>> 27565 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.93 opensips
>>> 27567 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.95 opensips
>>> 27575 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.95 opensips
>>> 27576 rcsuser   20   0 6516m  36m  28m S    0  0.5   0:00.98 opensips
>>>
>>> So basically what I'm looking for is some help on getting the
>>> operating system and opensips tuned to the point where we can get
>>> substantially more than 80K connections. Or am I asking for too much?
>>>
>>> Thanks,
>>>
>>> Gavin
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opensips.org
>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>
>
>
> -- 
> Gavin Murphy
> Vice President & CTO, NewPace
> phone: +1 (902) 406-8375 x1002
> email: gavin.murphy at newpace.com
>