[OpenSIPS-Users] Opensips TCP children deadlock

Bogdan-Andrei Iancu bogdan at opensips.org
Thu Aug 2 09:24:21 EDT 2018


Hi all,

For the sake of completeness, here is the commit fixing the issue:
https://github.com/OpenSIPS/opensips/commit/058cc22cb55dce9b890308b9f83a42a88691f2c8

Thank you Yuval for the report and for investigating this!

Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
   http://opensips.org/training/OpenSIPS_Bootcamp_2018/

On 07/12/2018 04:07 PM, Yuval Dinari via Users wrote:
> Hi,
> I have a scenario in which opensips gets into an unrecoverable bad 
> state: some of the TCP child processes are stuck waiting to acquire 
> a lock that they never get.
> The issue occurs in the following load test scenario:
>
>  1. About 25K clients register over TCP (it also happens with fewer)
>  2. All the TCP connections become unresponsive (by blocking outgoing
>     traffic on the test clients' machine)
>  3. INVITEs are sent for each of those clients, putting their
>     connections in retransmit mode
>  4. After a few minutes opensips gets into a bad state: some TCP
>     children run at 90-100% CPU, and no traffic is sent from the
>     machine (including OPTIONS pings)
>  5. After all the TCP connections die due to timeouts, opensips does
>     not recover; the symptoms above persist
>  6. After all the registered users are removed from the internal
>     table, there is still no change
>
> When attaching a debugger to the problematic processes (those with 
> high CPU usage), we see that they are all stuck trying to get a lock 
> which they never seem to get. Stack traces:
>
> #0  0x00007fd6b72d1bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
> #1  0x0000000000549e65 in get_lock (lock=<optimized out>) at net/proto_tcp/../../net/../fastlock.h:221
> #2  _tcp_write_on_socket (len=<optimized out>, buf=<optimized out>, fd=<optimized out>, c=<optimized out>) at net/proto_tcp/proto_tcp.c:724
> #3  proto_tcp_send (send_sock=0x7ffd8e12c140, buf=0x0, len=399, to=0x7fd5c7ccdcc0, id=1) at net/proto_tcp/proto_tcp.c:922
> #4  0x00007fd5a5cb7b30 in msg_send (msg=<optimized out>, len=<optimized out>, buf=<optimized out>, id=<optimized out>, to=<optimized out>, proto=<optimized out>, send_sock=0x7fd6a7208168) at ../../forward.h:123
> #5  send_pr_buffer (rb=0x7fd5c7ccdca0, buf=0x7fd6a76b4a50, len=0, ctx=0xffffffffffffffff) at t_funcs.c:66
>
> And:
>
> #0  0x00007fd6b72d1bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00000000005349b8 in get_lock (lock=<optimized out>) at net/../fastlock.h:221
> #2  handle_io (event_type=<optimized out>, idx=<optimized out>, fm=<optimized out>) at net/net_tcp_proc.c:210
> #3  io_wait_loop_epoll (repeat=287, t=<optimized out>, h=<optimized out>) at net/../io_wait_loop.h:280
>
> These traces look the same every time we attach.
> The machine opensips runs on has 4 CPUs.
> Thanks
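
Editorial note on the traces above: in both of them the top frames are
get_lock() in fastlock.h calling sched_yield(). That is consistent with a
spin-then-yield lock, where a waiter never sleeps on the lock and simply
retries. If the holder never releases, each waiter keeps retrying forever,
which would match the 90-100% CPU symptom. The following is only a minimal
sketch in C11 of that general pattern; it is not the OpenSIPS code, and every
demo_* name is hypothetical. The bounded variant is an assumption added here
purely to make a stuck lock observable without hanging.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spin lock; NOT the OpenSIPS fastlock type. */
typedef struct {
    atomic_flag held;
} demo_lock_t;

static void demo_lock_init(demo_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Unbounded acquire: retries until the flag is free, yielding the CPU
 * between attempts. If the owner never releases, this loop never ends
 * and the process stays runnable, so it shows up as high CPU usage. */
static void demo_get_lock(demo_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();
}

static void demo_release_lock(demo_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Bounded acquire (illustrative assumption): gives up after max_spins
 * failed attempts so that a caller can detect and report a stuck lock. */
static bool demo_try_get_lock(demo_lock_t *l, unsigned max_spins)
{
    assert(l != NULL);
    for (unsigned i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return true;   /* acquired */
        sched_yield();     /* let other processes run, then retry */
    }
    return false;          /* still held by someone else */
}
```

With this sketch, taking the lock once with demo_get_lock() and then calling
demo_try_get_lock() without an intervening demo_release_lock() returns false,
while the unbounded demo_get_lock() in the same situation would spin
indefinitely, much like the frames shown in the traces.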