[OpenSIPS-Devel] Crash in latest opensips

Bogdan-Andrei Iancu bogdan at opensips.org
Mon Jul 8 07:45:12 EDT 2019


Hi Dan,

Please open a bug report on the GITHUB tracker.

Thanks & Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   https://www.opensips-solutions.com
OpenSIPS Summit 2019
   https://www.opensips.org/events/Summit-2019Amsterdam/

On 07/01/2019 02:20 PM, Dan Pascu wrote:
> This crash happens with the latest git source as of today, at version b367b7c5d.
>
> I had this crash happen twice in a few hours after running the same environment for more than a month without any change, so I guess it's not easy to reproduce.
>
> The first time it crashed I got these entries in syslog for the process that crashed:
>
> ----------------------
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:send_fd: sendmsg failed on 172: Bad file descriptor
> Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:send2worker: send_fd failed
> Jun 30 02:29:10 node15 ./opensips[3573]: ERROR:core:handle_tcpconn_ev: no TCP workers available
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] check failed after successful fd del (fd=221,flags=0, sflags=1) over map (fd=-1,type=0,data=(nil),flags=536870912) erase=1
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] check failed after successful fd add (fd=222,type=19,data=0x7ff16c0da018,flags=1) already=0
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] check failed after successful fd del (fd=222,flags=16, sflags=3) over map (fd=-1,type=0,data=(nil),flags=0) erase=1
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] check failed after failed fd add (fd=220,type=19,data=0x7ff16c1042a8,flags=1) already=-1
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_add: [TCP_main] check failed after successful fd add (fd=222,type=19,data=0x7ff16c1042a8,flags=1) already=0
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del:
> Jun 30 02:29:10 node15 ./opensips[3573]: CRITICAL:core:io_watch_del: [TCP_main] check failed after successful fd del (fd=222,flags=16, sflags=3) over map (fd=-1,type=0,data=(nil),flags=0) erase=1
> Jun 30 02:29:14 node15 ./opensips[3573]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff16c1042a8), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process 3573 exited by a signal 6
> ----------------------
>
> and the backtrace reads:
>
> (gdb) bt
> #0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ff18de72535 in __GI_abort () at abort.c:79
> #2  0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
>      func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
> #3  0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
>      ptr=0x7ff16c1042a8) at net/../mem/shm_mem.h:448
> #4  _tcpconn_rm (c=c at entry=0x7ff16c1042a8) at net/net_tcp.c:747
> #5  0x000056070c4d221e in __tcpconn_lifetime (force=0) at net/net_tcp.c:1608
> #6  tcp_main_server () at net/net_tcp.c:1695
> #7  0x000056070c4dc41a in tcp_start_listener () at net/net_tcp.c:2114
> #8  0x000056070bfb5eba in main_loop () at main.c:813
> #9  main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
>
>
> At the same time a second process crashed with the following in syslog (this looks like the main process trying to do cleanup):
>
> -----------------------
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: child process 3573 exited by a signal 6
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: core was generated
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:handle_sigs: terminating due to SIGCHLD
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 1(3554) [JSON-RPC sender] terminated, still waiting for 15 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 2(3555) [MI Datagram] terminated, still waiting for 14 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 3(3556) [MI FIFO] terminated, still waiting for 13 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 4(3557) [time_keeper] terminated, still waiting for 12 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 5(3558) [timer] terminated, still waiting for 11 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 12(3565) [TCP receiver] terminated, still waiting for 10 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 13(3566) [TCP receiver] terminated, still waiting for 9 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 11(3564) [TCP receiver] terminated, still waiting for 8 more
> Jun 30 02:29:16 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 15(3570) [TCP receiver] terminated, still waiting for 7 more
> Jun 30 02:29:19 node15 ./opensips[3552]: INFO:core:shutdown_opensips: process 9(3562) [SIP receiver udp:81.23.228.160:5060] terminated, still waiting for 6 more
> Jun 30 02:29:21 node15 ./opensips[3552]: INFO:core:cleanup: cleanup
> Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00000b9e0
> Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:reset_all_statements: resetting all statements on connection: (0x60d000005b40) 0x60b00000b9e0
> Jun 30 02:29:21 node15 ./opensips[3552]: INFO:db_mysql:connect_with_retry: re-connected successful for 0x60b00000b9e0
> Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:presence:destroy: destroy module ...
> Jun 30 02:29:21 node15 ./opensips[3552]: NOTICE:event_jsonrpc:destroy: destroy module ...
> Jun 30 02:29:21 node15 ./opensips[3552]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff16c10a6f0), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
> -----------------------
>
> and this backtrace:
>
> (gdb) bt
> #0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ff18de72535 in __GI_abort () at abort.c:79
> #2  0x000056070c29962f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
>      func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
> #3  0x000056070c4b34ee in _shm_free (file=0x56070c6485a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
>      ptr=0x7ff16c10a6f0) at net/../mem/shm_mem.h:448
> #4  _tcpconn_rm (c=0x7ff16c10a6f0) at net/net_tcp.c:747
> #5  0x000056070c4d5f12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
> #6  tcp_destroy () at net/net_tcp.c:1811
> #7  0x000056070c1e551a in cleanup (show_status=show_status at entry=1) at main.c:360
> #8  0x000056070c1e69aa in shutdown_opensips (status=status at entry=134) at main.c:522
> #9  0x000056070c1e7ca0 in handle_sigs () at main.c:605
> #10 0x000056070bfb65fe in main_loop () at main.c:867
> #11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
>
>
> The second time syslog didn't have much detail except the critical double free error. This time it also had 2 crashed processes. The first one had this:
>
> ------------------------
> Jun 30 03:11:47 node15 ./opensips[18284]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff8ed303b38), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
> ------------------------
>
> and backtrace:
>
> (gdb) bt
> #0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ff90f1f2535 in __GI_abort () at abort.c:79
> #2  0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
>      func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
> #3  0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
>      ptr=0x7ff8ed303b38) at net/../mem/shm_mem.h:448
> #4  _tcpconn_rm (c=c at entry=0x7ff8ed303b38) at net/net_tcp.c:747
> #5  0x0000558467919d4a in tcpconn_destroy (tcpconn=0x7ff8ed303b38) at net/net_tcp.c:1005
> #6  0x000055846792a707 in handle_tcp_worker (fd_i=<optimized out>, tcp_c=0x6160000039c8) at net/net_tcp.c:1335
> #7  handle_io (fm=<optimized out>, idx=idx at entry=22, event_type=event_type at entry=1) at net/net_tcp.c:1525
> #8  0x0000558467937831 in io_wait_loop_epoll (h=<optimized out>, t=5, repeat=0) at net/../io_wait_loop.h:280
> #9  tcp_main_server () at net/net_tcp.c:1695
> #10 0x000055846794241a in tcp_start_listener () at net/net_tcp.c:2114
> #11 0x000055846741beba in main_loop () at main.c:813
> #12 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
>
>
> and the main process had:
>
> ---------------------
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: child process 18284 exited by a signal 6
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: core was generated
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:handle_sigs: terminating due to SIGCHLD
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 2(18258) [MI Datagram] terminated, still waiting for 15 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 1(18257) [JSON-RPC sender] terminated, still waiting for 14 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 12(18273) [TCP receiver] terminated, still waiting for 13 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 14(18278) [TCP receiver] terminated, still waiting for 12 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 3(18259) [MI FIFO] terminated, still waiting for 11 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 4(18260) [time_keeper] terminated, still waiting for 10 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 13(18276) [TCP receiver] terminated, still waiting for 9 more
> Jun 30 03:11:49 node15 ./opensips[18255]: INFO:core:shutdown_opensips: process 5(18261) [timer] terminated, still waiting for 8 more
> Jun 30 03:11:54 node15 ./opensips[18255]: INFO:core:cleanup: cleanup
> Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:switch_state_to_disconnected: disconnect event for 0x60b00002fed0
> Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:reset_all_statements: resetting all statements on connection: (0x60d000005b40) 0x60b00002fed0
> Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:connect_with_retry: re-connected successful for 0x60b00002fed0
> Jun 30 03:11:56 node15 ./opensips[18255]: INFO:db_mysql:db_mysql_do_prepared_query: reconnected to mysql server -> re-init the statement
> Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:presence:destroy: destroy module ...
> Jun 30 03:11:56 node15 ./opensips[18255]: NOTICE:event_jsonrpc:destroy: destroy module ...
> Jun 30 03:11:56 node15 ./opensips[18255]: CRITICAL:core:fm_free_dbg: freeing already freed shm pointer (0x7ff8ed31b098), first free: net/net_tcp.c: _tcpconn_rm(747) - aborting!
> ---------------------
>
> and backtrace:
>
> (gdb) bt
> #0  __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ff90f1f2535 in __GI_abort () at abort.c:79
> #2  0x00005584676ff62f in fm_free_dbg (fm=<optimized out>, p=<optimized out>, file=<optimized out>,
>      func=<optimized out>, line=<optimized out>) at mem/f_malloc_dyn.h:221
> #3  0x00005584679194ee in _shm_free (file=0x558467aae5a0 "net/net_tcp.c", line=747, function=<synthetic pointer>,
>      ptr=0x7ff8ed31b098) at net/../mem/shm_mem.h:448
> #4  _tcpconn_rm (c=0x7ff8ed31b098) at net/net_tcp.c:747
> #5  0x000055846793bf12 in __tcpconn_lifetime (force=1) at net/net_tcp.c:1608
> #6  tcp_destroy () at net/net_tcp.c:1811
> #7  0x000055846764b51a in cleanup (show_status=show_status at entry=1) at main.c:360
> #8  0x000055846764c9aa in shutdown_opensips (status=status at entry=134) at main.c:522
> #9  0x000055846764dca0 in handle_sigs () at main.c:605
> #10 0x000055846741c5fe in main_loop () at main.c:867
> #11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1480
>
> --
> Dan
>
>
>
>
>
> _______________________________________________
> Devel mailing list
> Devel at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/devel




More information about the Devel mailing list