[OpenSIPS-Users] Tuning for maximum number of TCP connections
Bogdan-Andrei Iancu
bogdan at opensips.org
Thu May 9 12:41:35 CEST 2013
Hi Gavin,
I see, so OpenSIPS is not handling the registrations itself. As an
exercise, increase tcp_connection_lifetime to 7200 (2 h), just to rule
out the possibility of connections timing out.
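
To illustrate why a longer lifetime rules out the timeout scenario, here is a minimal Python sketch. It assumes the lifetime behaves as an idle timeout that is refreshed by traffic on the connection; the 3600 s refresh interval in the usage note is a hypothetical value (the thread does not say how often the clients send traffic), and connection_open_at is a made-up helper, not an OpenSIPS function.

```python
def connection_open_at(t, activity_times, lifetime):
    """Return True if a connection with the given traffic timestamps is
    still open at time t, assuming it is closed once it has been idle for
    more than `lifetime` seconds (all values in seconds)."""
    deadline = None
    for ts in sorted(activity_times):
        if ts > t:
            break  # traffic after t cannot affect the state at t
        if deadline is not None and ts > deadline:
            return False  # already closed before this traffic arrived
        deadline = ts + lifetime  # traffic refreshes the idle timer
    return deadline is not None and t <= deadline
```

With traffic at 0 s and 3600 s (hypothetical), connection_open_at(3700, [0, 3600], 610) is False while connection_open_at(3700, [0, 3600], 7200) is True: with the shorter lifetime the connection is gone long before the next message arrives.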
Are you saying that running a constant load of 50K TCP connections for a
long time does not result in any TCP errors?
Now, regarding the processes: yes, it looks like TCP main is the one
carrying the extra load. This process is responsible for managing the
TCP connections. It does not accept, read or write anything itself; it
detects events on the TCP sockets and dispatches them to the TCP worker
processes.
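
The model described above (one process watching the sockets, workers doing the I/O) can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the OpenSIPS code: it uses threads and in-memory queues where OpenSIPS uses separate processes, and the names worker, run_dispatcher and demo are made up for the example.

```python
import queue
import selectors
import socket
import threading


def worker(inbox, results):
    """Worker: read from every connection the dispatcher hands over."""
    while True:
        conn = inbox.get()
        if conn is None:  # sentinel: no more work
            break
        results.put(conn.recv(4096))


def run_dispatcher(connections, n_workers=2):
    """Dispatcher: only waits for read events and hands the ready
    connections to the workers round-robin; it never reads itself.
    Assumes every connection eventually has data, otherwise it loops."""
    sel = selectors.DefaultSelector()
    for conn in connections:
        sel.register(conn, selectors.EVENT_READ)

    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, results)) for q in inboxes]
    for t in threads:
        t.start()

    pending, turn = len(connections), 0
    while pending:
        for key, _events in sel.select(timeout=1.0):
            sel.unregister(key.fileobj)  # the worker owns it from now on
            inboxes[turn % n_workers].put(key.fileobj)
            turn += 1
            pending -= 1

    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    sel.close()
    return sorted(results.get() for _ in range(len(connections)))


def demo(n=4):
    """Feed n local socket pairs through the dispatcher."""
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(b"msg-%d" % i)
    out = run_dispatcher([server for _client, server in pairs])
    for client, server in pairs:
        client.close()
        server.close()
    return out
```

Even in this toy version every event on every connection passes through the single dispatcher, which is consistent with one process carrying most of the load while the workers stay mostly idle.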
Do you have a test suite or something similar to help generate the
traffic corresponding to 50K clients?
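
For reference, a bare-bones connection generator could look like the Python sketch below. It only opens and holds plain TCP connections and sends no SIP traffic, so it is a starting point rather than a replacement for a real load driver. The function names and the batch size are assumptions made for the example; self_check just exercises the code against a throwaway local server.

```python
import asyncio


async def open_connections(host, port, count, batch=500):
    """Open `count` TCP connections in batches and return the
    (reader, writer) pairs that were established successfully."""
    conns = []
    for start in range(0, count, batch):
        size = min(batch, count - start)
        results = await asyncio.gather(
            *(asyncio.open_connection(host, port) for _ in range(size)),
            return_exceptions=True,  # keep going if some connects fail
        )
        conns.extend(r for r in results if not isinstance(r, BaseException))
    return conns


async def close_connections(conns):
    """Close every connection and wait for the transports to shut down."""
    for _reader, writer in conns:
        writer.close()
    await asyncio.gather(
        *(writer.wait_closed() for _reader, writer in conns),
        return_exceptions=True,
    )


async def self_check(count=20):
    """Open `count` connections to a local test server; return how many
    were established."""
    async def on_client(reader, writer):
        await reader.read()  # hold the connection until the client closes
        writer.close()

    server = await asyncio.start_server(on_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    conns = await open_connections("127.0.0.1", port, count, batch=8)
    opened = len(conns)
    await close_connections(conns)
    server.close()
    return opened
```

Comparing the length of the list returned by open_connections with the requested count shows how many connects failed. Note that the number of connections from one source IP to a single destination address and port is bounded by the local port range, and the client needs a file-descriptor limit (ulimit -n) at least as high as the target connection count.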
Regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com
On 04/30/2013 10:35 PM, Gavin Murphy wrote:
> The tcp_persistent_flag isn't set as that appears to be for the
> registrar module, which we aren't using. We're passing REGISTERs
> through to our own registrar.
>
> Here is a snapshot of a test currently being run with 50K concurrent
> TCP "clients" (doesn't show all of the opensips processes). This level
> of traffic is not generating any TCP-related errors in opensips.
>
>   PID USER    PR NI VIRT  RES  SHR  S %CPU %MEM TIME+ COMMAND
> 3411 rcsuser 20 0 6516m 3.1g 3.1g R 54 39.5 73:14.06 opensips
> 3376 rcsuser 20 0 6516m 221m 219m S 11 2.8 14:07.50 opensips
> 3375 rcsuser 20 0 6516m 221m 219m S 10 2.8 13:57.23 opensips
> 3373 rcsuser 20 0 6516m 221m 219m S 9 2.8 14:10.93 opensips
> 3374 rcsuser 20 0 6516m 221m 219m S 9 2.8 14:04.26 opensips
> 3377 rcsuser 20 0 6516m 1608 200 S 0 0.0 0:01.44 opensips
> 3379 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.52 opensips
> 3380 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.65 opensips
> 3381 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.38 opensips
> 3382 rcsuser 20 0 6516m 47m 39m S 0 0.6 0:14.56 opensips
> 3385 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.52 opensips
> 3386 rcsuser 20 0 6516m 49m 41m S 0 0.6 0:14.67 opensips
> 3390 rcsuser 20 0 6516m 49m 41m S 0 0.6 0:14.50 opensips
> 3394 rcsuser 20 0 6516m 47m 39m S 0 0.6 0:14.42 opensips
> 3395 rcsuser 20 0 6516m 47m 39m S 0 0.6 0:14.44 opensips
> 3396 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.72 opensips
> 3401 rcsuser 20 0 6516m 50m 42m S 0 0.6 0:14.72 opensips
> 3402 rcsuser 20 0 6516m 50m 42m S 0 0.6 0:14.75 opensips
> 3403 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.78 opensips
> 3404 rcsuser 20 0 6516m 48m 40m S 0 0.6 0:14.60 opensips
> 3408 rcsuser 20 0 6516m 50m 42m S 0 0.6 0:14.49 opensips
> 3409 rcsuser 20 0 6516m 50m 42m S 0 0.6 0:14.75 opensips
> 3410 rcsuser 20 0 6516m 50m 42m S 0 0.6 0:14.61 opensips
>
> And the results from the fifo command:
>
> Process:: ID=0 PID=3367 Type=attendant
> Process:: ID=1 PID=3368 Type=MI FIFO
> Process:: ID=2 PID=3369 Type=SIP receiver udp:127.0.0.1:9050
> Process:: ID=3 PID=3370 Type=SIP receiver udp:127.0.0.1:9050
> Process:: ID=4 PID=3371 Type=SIP receiver udp:127.0.0.1:9050
> Process:: ID=5 PID=3372 Type=SIP receiver udp:127.0.0.1:9050
> Process:: ID=6 PID=3373 Type=SIP receiver udp:192.168.38.175:9050
> Process:: ID=7 PID=3374 Type=SIP receiver udp:192.168.38.175:9050
> Process:: ID=8 PID=3375 Type=SIP receiver udp:192.168.38.175:9050
> Process:: ID=9 PID=3376 Type=SIP receiver udp:192.168.38.175:9050
> Process:: ID=10 PID=3377 Type=time_keeper
> Process:: ID=11 PID=3378 Type=timer
> Process:: ID=12 PID=3379 Type=TCP receiver
> Process:: ID=13 PID=3380 Type=TCP receiver
> Process:: ID=14 PID=3381 Type=TCP receiver
> Process:: ID=15 PID=3382 Type=TCP receiver
> Process:: ID=16 PID=3383 Type=TCP receiver
> Process:: ID=17 PID=3384 Type=TCP receiver
> Process:: ID=18 PID=3385 Type=TCP receiver
> Process:: ID=19 PID=3386 Type=TCP receiver
> Process:: ID=20 PID=3387 Type=TCP receiver
> Process:: ID=21 PID=3388 Type=TCP receiver
> Process:: ID=22 PID=3389 Type=TCP receiver
> Process:: ID=23 PID=3390 Type=TCP receiver
> Process:: ID=24 PID=3391 Type=TCP receiver
> Process:: ID=25 PID=3392 Type=TCP receiver
> Process:: ID=26 PID=3393 Type=TCP receiver
> Process:: ID=27 PID=3394 Type=TCP receiver
> Process:: ID=28 PID=3395 Type=TCP receiver
> Process:: ID=29 PID=3396 Type=TCP receiver
> Process:: ID=30 PID=3397 Type=TCP receiver
> Process:: ID=31 PID=3398 Type=TCP receiver
> Process:: ID=32 PID=3399 Type=TCP receiver
> Process:: ID=33 PID=3400 Type=TCP receiver
> Process:: ID=34 PID=3401 Type=TCP receiver
> Process:: ID=35 PID=3402 Type=TCP receiver
> Process:: ID=36 PID=3403 Type=TCP receiver
> Process:: ID=37 PID=3404 Type=TCP receiver
> Process:: ID=38 PID=3405 Type=TCP receiver
> Process:: ID=39 PID=3406 Type=TCP receiver
> Process:: ID=40 PID=3407 Type=TCP receiver
> Process:: ID=41 PID=3408 Type=TCP receiver
> Process:: ID=42 PID=3409 Type=TCP receiver
> Process:: ID=43 PID=3410 Type=TCP receiver
> Process:: ID=44 PID=3411 Type=TCP main
>
> So is it a correct assumption that the "TCP main" type is responsible
> for accepting the initial connection and handing it off to one of the
> "TCP receiver" types? Is that why it uses the most CPU and memory
> resources? If so, is it just memory and CPU that are limiting factors
> in terms of how many connections we can get established concurrently?
>
> Gavin
>
> On 29/04/2013 9:48 AM, Bogdan-Andrei Iancu wrote:
>> Hello Gavin,
>>
>> The errors you get indicate that OpenSIPS is trying to open a TCP
>> connection to a destination which does not accept it. Based on your
>> description, I would say there is no need for OpenSIPS to open TCP
>> connections - they will be opened by the clients when registering.
>>
>> Ruling out the scenario of misrouting, the only explanation would
>> be that the TCP connections expire (timeout without traffic) long
>> before the corresponding registration - so you end up with a
>> registration (in usrloc) which has no TCP conn towards the actual
>> device. Are you using the tcp_persistent_flag?
>>
>> http://www.opensips.org/html/docs/modules/1.9.x/registrar.html#id250105
>>
>> About the load on the processes, you can do "opensipsctl fifo ps" to
>> get the listing of the processes and their description - you could
>> correlate it with the top output to see which process is burning CPU.
>>
>> Regards,
>>
>> Bogdan-Andrei Iancu
>> OpenSIPS Founder and Developer
>> http://www.opensips-solutions.com
>>
>>
>> On 04/26/2013 05:44 PM, Gavin Murphy wrote:
>>> We're trying to load up opensips with as many TCP connections as we
>>> possibly can. So far we've got it to about 82K, but failures start
>>> occurring at that point. We have 8 GB of RAM allocated to the server
>>> as a whole (is that enough? we don't appear to be exhausting it).
>>> We've set the following parameters for OpenSIPS:
>>>
>>> tcp_children=32
>>> tcp_max_connections=250000
>>> tcp_connection_lifetime=610
>>> tcp_keepalive=1
>>> tcp_keepcount=3
>>> tcp_keepidle=300
>>> tcp_keepinterval=300
>>>
>>> We have also set ulimit -n 1024000 and ulimit -s 768.
>>>
>>> The scenario is that our load driver establishes "client"
>>> connections to OpenSIPS via TCP, and sends REGISTERs over those
>>> connections. While the REGISTERs come in over TCP, they are sent out
>>> to our registrar via UDP. Around the point where we get to the 40K
>>> connection mark we start seeing the following in the logs:
>>>
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcp_blocking_connect: poll error: flags 1c
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111)
>>> Connection refused
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcpconn_connect: tcp_blocking_connect failed
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:core:tcp_send: connect failed
>>> Apr 25 12:28:19 blackmamba rcsuser-opensips[27540]:
>>> ERROR:tm:msg_send: tcp_send failed
>>>
>>> It almost appears as though opensips is trying to establish a
>>> connection somewhere and is being refused. Except that it shouldn't
>>> be trying to establish any, unless it's for internal purposes.
>>> Unfortunately the logs aren't clear on that point (in terms of what
>>> connection is trying to be established).
>>>
>>> One other thing that appears puzzling: it seems that one of the
>>> opensips processes is bearing most of the brunt. I am assuming that
>>> it's the instance that is actually accepting the connections, and
>>> that the subsequent (low) amount of traffic is then handed off to
>>> the children. But if that's the case, it also means that it's
>>> handling a lot of the workload, and I was hoping that it would be
>>> more evenly distributed.
>>>
>>> Here is a snapshot of the opensips processes in top:
>>>
>>>   PID USER    PR NI VIRT  RES  SHR  S %CPU %MEM TIME+ COMMAND
>>> 27577 rcsuser 20 0 6516m 2.5g 2.5g R 76 31.9 8:15.26 opensips
>>> 27542 rcsuser 20 0 6516m 181m 180m S 16 2.3 0:54.60 opensips
>>> 27541 rcsuser 20 0 6516m 182m 180m S 14 2.3 0:54.47 opensips
>>> 27539 rcsuser 20 0 6516m 182m 180m S 13 2.3 0:53.75 opensips
>>> 27540 rcsuser 20 0 6516m 182m 180m S 11 2.3 0:53.64 opensips
>>> 27545 rcsuser 20 0 6516m 37m 29m S 0 0.5 0:01.03 opensips
>>> 27551 rcsuser 20 0 6516m 35m 27m S 0 0.4 0:00.94 opensips
>>> 27553 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.95 opensips
>>> 27555 rcsuser 20 0 6516m 37m 29m S 0 0.5 0:00.99 opensips
>>> 27557 rcsuser 20 0 6516m 35m 27m S 0 0.4 0:00.92 opensips
>>> 27558 rcsuser 20 0 6516m 35m 27m S 0 0.4 0:00.90 opensips
>>> 27560 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.98 opensips
>>> 27563 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.94 opensips
>>> 27564 rcsuser 20 0 6516m 36m 27m S 0 0.5 0:00.93 opensips
>>> 27565 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.93 opensips
>>> 27567 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.95 opensips
>>> 27575 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.95 opensips
>>> 27576 rcsuser 20 0 6516m 36m 28m S 0 0.5 0:00.98 opensips
>>>
>>> So basically what I'm looking for is some help on getting the
>>> operating system and opensips tuned to the point where we can get
>>> substantially more than 80K connections. Or am I asking for too much?
>>>
>>> Thanks,
>>>
>>> Gavin
>>>
>
>
> --
> Gavin Murphy
> Vice President & CTO, NewPace
> phone +1 (902) 406-8375 x1002
> email gavin.murphy at newpace.com