[OpenSIPS-Devel] Crash in latest opensips

Dan Pascu dan at ag-projects.com
Mon Jul 1 07:20:39 EDT 2019


This crash happens with the latest git source as of today, at version b367b7c5d.

I had this crash happen twice in a few hours after running the same environment for more than a month without any change, so I guess it's not easy to reproduce.

The first time it crashed I got these entries in syslog for the process that crashed:

----------------------
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:send_fd: sendmsg failed on 172: Bad file descriptor
Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:send2worker: send_fd failed
Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:handle_tcpconn_ev: no TCP workers available
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] check failed after successful fd del (fd=221,flags=0, sflags=1) over map (fd=-1,type=0,data=(nil),flags=536870912) erase=1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] check failed after successful fd add (fd=222,type=19,data=0x7ff16c0da018,flags=1) already=0
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] check failed after successful fd del (fd=222,flags=16, sflags=3) over map (fd=-1,type=0,data=(nil),flags=0) erase=1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] check failed after failed fd add (fd=220,type=19,data=0x7ff16c1042a8,flags=1) already=-1
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] check failed after successful fd add (fd=222,type=19,data=0x7ff16c1042a8,flags=1) already=0
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] check failed after successful fd del (fd=222,flags=16, sflags=3) over map (fd=-1,type=0,data=(nil),flags=0) erase=1
Jun 30 02:29:14 node15 ./opensips[3573]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff16c1042a8), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process 3573 exited by a signal 6
----------------------

and the backtrace reads:

(gdb) bt
#0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff18de72535 in __GI_abort () at abort.c:79
#2  0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
    ptr=0x7ff16c1042a8) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=c at entry=0x7ff16c1042a8) at net/net_tcp.c:747
#5  0x000056070c4d221e in __tcpconn_lifetime (force=0) at net/net_tcp.c:1608
#6  tcp_main_server () at net/net_tcp.c:1695
#7  0x000056070c4dc41a in tcp_start_listener () at net/net_tcp.c:2114
#8  0x000056070bfb5eba in main_loop () at main.c:813
#9  main (argc=<optimized out>, argv=<optimized out>) at main.c:1480


At the same time a second process crashed with the following in syslog (this looks like the main process trying to do cleanup):

-----------------------
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process 3573 exited by a signal 6
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: core was generated
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: terminating due to SIGCHLD
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 1(3554) [JSON-RPC sender] terminated, still waiting for 15 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 2(3555) [MI Datagram] terminated, still waiting for 14 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 3(3556) [MI FIFO] terminated, still waiting for 13 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 4(3557) [time_keeper] terminated, still waiting for 12 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 5(3558) [timer] terminated, still waiting for 11 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 12(3565) [TCP receiver] terminated, still waiting for 10 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 13(3566) [TCP receiver] terminated, still waiting for 9 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 11(3564) [TCP receiver] terminated, still waiting for 8 more
Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 15(3570) [TCP receiver] terminated, still waiting for 7 more
Jun 30 02:29:19 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 9(3562) [SIP receiver udp:81.23.228.160:5060] terminated, still waiting for 6 more
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:core:cleanup: cleanup
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:reset_all_statements: resetting all statements on connection: (0x60d000005b40) 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:connect_with_retry: re-connected successful for 0x60b00000b9e0
Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:presence:destroy: destroy module ...
Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:event_jsonrpc:destroy: destroy module ...
Jun 30 02:29:21 node15 ./opensips[3552]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff16c10a6f0), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
-----------------------

and this backtrace:

(gdb) bt
#0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff18de72535 in __GI_abort () at abort.c:79
#2  0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
    ptr=0x7ff16c10a6f0) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=0x7ff16c10a6f0) at net/net_tcp.c:747
#5  0x000056070c4d5f12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
#6  tcp_destroy () at net/net_tcp.c:1811
#7  0x000056070c1e551a in cleanup (show_status=show_status at entry=1) at main.c:360
#8  0x000056070c1e69aa in shutdown_opensips (status=status at entry=134) at main.c:522
#9  0x000056070c1e7ca0 in handle_sigs () at main.c:605
#10 0x000056070bfb65fe in main_loop () at main.c:867
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480


The second time syslog didn't have much detail except the critical double free error. This time it also had 2 crashed processes. The first one had this:

------------------------
Jun 30 03:11:47 node15 ./opensips[18284]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff8ed303b38), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
------------------------

and backtrace:

(gdb) bt
#0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff90f1f2535 in __GI_abort () at abort.c:79
#2  0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
    ptr=0x7ff8ed303b38) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=c at entry=0x7ff8ed303b38) at net/net_tcp.c:747
#5  0x0000558467919d4a in tcpconn_destroy (tcpconn=0x7ff8ed303b38) at net/net_tcp.c:1005
#6  0x000055846792a707 in handle_tcp_worker (fd_i=<optimized out>, tcp_c=0x6160000039c8) at net/net_tcp.c:1335
#7  handle_io (fm=<optimized out>, idx=idx at entry=22, event_type=event_type at entry=1) at net/net_tcp.c:1525
#8  0x0000558467937831 in io_wait_loop_epoll (h=<optimized out>, t=5, repeat=0) at net/../io_wait_loop.h:280
#9  tcp_main_server () at net/net_tcp.c:1695
#10 0x000055846794241a in tcp_start_listener () at net/net_tcp.c:2114
#11 0x000055846741beba in main_loop () at main.c:813
#12 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480


and the main process had:

---------------------
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: child process 18284 exited by a signal 6
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: core was generated
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: terminating due to SIGCHLD
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 2(18258) [MI Datagram] terminated, still waiting for 15 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 1(18257) [JSON-RPC sender] terminated, still waiting for 14 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 12(18273) [TCP receiver] terminated, still waiting for 13 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 14(18278) [TCP receiver] terminated, still waiting for 12 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 3(18259) [MI FIFO] terminated, still waiting for 11 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 4(18260) [time_keeper] terminated, still waiting for 10 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 13(18276) [TCP receiver] terminated, still waiting for 9 more
Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 5(18261) [timer] terminated, still waiting for 8 more
Jun 30 03:11:54 node15 ./opensips[18255]: INFO:core:cleanup: cleanup
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:reset_all_statements: resetting all statements on connection: (0x60d000005b40) 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:connect_with_retry: re-connected successful for 0x60b00002fed0
Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:db_mysql_do_prepared_query: reconnected to mysql server -> re-init the statement
Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:presence:destroy: destroy module ...
Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:event_jsonrpc:destroy: destroy module ...
Jun 30 03:11:56 node15 ./opensips[18255]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff8ed31b098), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
---------------------

and backtrace:

(gdb) bt
#0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ff90f1f2535 in __GI_abort () at abort.c:79
#2  0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
    func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
#3  0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
    ptr=0x7ff8ed31b098) at net/../mem/shm_mem.h:448
#4  _tcpconn_rm (c=0x7ff8ed31b098) at net/net_tcp.c:747
#5  0x000055846793bf12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
#6  tcp_destroy () at net/net_tcp.c:1811
#7  0x000055846764b51a in cleanup (show_status=show_status at entry=1) at main.c:360
#8  0x000055846764c9aa in shutdown_opensips (status=status at entry=134) at main.c:522
#9  0x000055846764dca0 in handle_sigs () at main.c:605
#10 0x000055846741c5fe in main_loop () at main.c:867
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480

--
Dan







More information about the Devel mailing list