[OpenSIPS-Devel] [opensips] TCP listeners occasionally get stuck and consume 100% CPU (#410)

Brian Tarricone notifications at github.com
Thu Feb 5 11:05:46 CET 2015


I've had problems in production where, after a while, some of the TCP listener opensips processes will get stuck and start consuming 100% CPU.  The only way to resolve it is to restart opensips.  This slowly starts to happen to each TCP listener until all are stuck and we stop serving requests.  Even while there are still a few functioning listeners, service is degraded and some clients to opensips times out.  I'm currently using the 1.11 branch at commit 17c5990ee35691451008380f13b2ffcbc59ccf79.  I also observed this behavior on 1.8, and was hoping upgrading to 1.11 would solve the issue.  Here's a typical stack trace while the process is stuck:

```
#0  0x00007f7d3596ee87 in sched_yield () from /lib64/libc.so.6
#1  0x00000000004cfa4b in get_lock (lock=0x7f7cb1e48390) at fastlock.h:182
#2  tcp_send (send_sock=0x7f7d334f5298, type=type at entry=3, 
    buf=buf at entry=0x7f7cba888c68 "CANCEL sip:[redacted]@[redacted];transport=TLS SIP/2.0\r\nVia: SIP/2.0/TLS [redacted]:5061;branch=z9hG4bK0d93.b6c7a621.0;i=dc65\r\nVia: SIP/2.0/TLS [redacted]:51370;r"..., len=len at entry=559, to=to at entry=0x7f7cb4d4ba50, id=id at entry=0)
    at tcp_main.c:1239
#3  0x00007f7d32419707 in msg_send (len=559, 
    buf=0x7f7cba888c68 "CANCEL sip:[redacted]@[redacted];transport=TLS SIP/2.0\r\nVia: SIP/2.0/TLS [redacted]:5061;branch=z9hG4bK0d93.b6c7a621.0;i=dc65\r\nVia: SIP/2.0/TLS [redacted]:51370;r"..., 
    id=0, to=0x7f7cb4d4ba50, proto=3, send_sock=<optimized out>)
    at ../../forward.h:128
#4  send_pr_buffer (rb=rb at entry=0x7f7cb4d4ba30, buf=0x7f7cba888c68, len=559)
    at t_funcs.c:69
#5  0x00007f7d3241eddf in t_forward_nonack (t=t at entry=0x7f7cb4d4b858, 
    p_msg=p_msg at entry=0x7f7d33cd8ed8, proxy=proxy at entry=0x0) at t_fwd.c:681
#6  0x00007f7d32419ccc in t_relay_to (p_msg=p_msg at entry=0x7f7d33cd8ed8, 
    proxy=proxy at entry=0x0, flags=flags at entry=0) at t_funcs.c:241
#7  0x00007f7d3243125d in w_t_relay (p_msg=0x7f7d33cd8ed8, 
    proxy=<optimized out>, flags=0x0) at tm.c:1206
#8  0x000000000041b071 in do_action (a=a at entry=0x7f7d3350b1b8, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1838
#9  0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#10 0x000000000041df58 in do_action (a=a at entry=0x7f7d3350c3e0, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1105
#11 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#12 0x000000000041df58 in do_action (a=a at entry=0x7f7d3350c978, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1105
#13 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#14 0x000000000041efbd in do_action (a=a at entry=0x7f7d3350ca58, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1111
#15 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#16 0x000000000041efbd in do_action (a=a at entry=0x7f7d3350cb38, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1111
#17 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#18 0x000000000041efbd in do_action (a=a at entry=0x7f7d3350cc18, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1111
#19 0x0000000000420a39 in run_action_list (a=<optimized out>, 
    msg=0x7f7d33cd8ed8) at action.c:171
#20 0x000000000041efbd in do_action (a=a at entry=0x7f7d3350ccf8, 
    msg=msg at entry=0x7f7d33cd8ed8) at action.c:1111
#21 0x0000000000420c24 in run_action_list (msg=<optimized out>, 
    a=<optimized out>) at action.c:171
#22 run_actions (msg=0x7f7d33cd8ed8, a=<optimized out>) at action.c:136
#23 run_top_route (a=<optimized out>, msg=msg at entry=0x7f7d33cd8ed8)
    at action.c:211
#24 0x0000000000489143 in receive_msg (
    buf=buf at entry=0x876b08 <current_req+8> "CANCEL sip:[redacted]@[redacted];transport=TLS SIP/2.0\r\nVia: SIP/2.0/TLS [redacted]:51370;rport;branch=z9hG4bKPj4rfzZOxA8KjWkF2jX1wz5YLxkLNUxPsK\r\nMax-Forwards: 69\r\nF"..., len=len at entry=423, rcv_info=rcv_info at entry=0x7fffc506a590)
    at receive.c:172
#25 0x00000000004d949f in tcp_read_req (con=con at entry=0x7f7cb2ff3bf8, 
    bytes_read=bytes_read at entry=0x7fffc506a670) at tcp_read.c:697
#26 0x00000000004da4e6 in handle_io (fm=0x7f7d3351f038, idx=idx at entry=-1, 
    event_type=1) at tcp_read.c:1033
#27 0x00000000004dc674 in io_wait_loop_epoll (h=<optimized out>, 
    t=<optimized out>, repeat=<optimized out>) at io_wait.h:845
#28 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1141
#29 0x00000000004d4de4 in tcp_init_children (
    chd_rank=chd_rank at entry=0x836a50 <chd_rank.22745>, 
    startup_done=startup_done at entry=0x0) at tcp_main.c:2373
#30 0x0000000000416e24 in main_loop () at main.c:1024
#31 main (argc=<optimized out>, argv=<optimized out>) at main.c:1634
```
We're using TLS here, but the same thing happens when using unencrypted TCP.

Unfortunately, I've been unable to reproduce this under controlled conditions; it just seems to happen randomly in production.

---
Reply to this email directly or view it on GitHub:
https://github.com/OpenSIPS/opensips/issues/410
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opensips.org/pipermail/devel/attachments/20150205/4b3449fb/attachment.htm>


More information about the Devel mailing list