[OpenSIPS-Devel] [ opensips-Bugs-3390207 ] Segfault while freeing dialog

SourceForge.net noreply at sourceforge.net
Fri Aug 12 12:54:16 CEST 2011


Bugs item #3390207, was opened at 2011-08-12 00:25
Message generated for change (Comment added) made by shari_786pk
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3390207&group_id=232389

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: modules
Group: trunk
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Muhammad Shahzad (shari_786pk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Segfault while freeing dialog

Initial Comment:
Hi,

OpenSIPs crashes randomly while processing Bye reply, when destroying / freeing the associated dialog. So far, this has happened 4 times in last 2 days on one of my production servers. There are two core dumps generated upon each crash,

BT on first core dump gives following,

Core was generated by `/usr/local/sbin/opensips -P /var/run/opensips.pid -m 512 -u root -g root'.
Program terminated with signal 11, Segmentation fault.
#0  0x97761500 in ?? ()
(gdb) bt
#0  0x97761500 in ?? ()
#1  0xb723abd3 in free_dlg_dlg (dlg=<value optimized out>) at dlg_hash.c:163
#2  0xb7243f5f in destroy_dlg (dlg=0x979d04a0, cnt=1) at dlg_hash.c:225
#3  unref_dlg (dlg=0x979d04a0, cnt=1) at dlg_hash.c:742
#4  0xb7247a42 in dual_bye_event (dlg=0x979d04a0, req=<value optimized out>, extra_unref=1) at dlg_req_within.c:274
#5  0xb72491c0 in bye_reply_cb (t=0x976af43c, type=256, ps=0xbfddce54) at dlg_req_within.c:294
#6  0xb728e657 in run_trans_callbacks (type=256, trans=0x976af43c, req=0x0, rpl=0x82d241c, code=200) at t_hooks.c:212
#7  0xb72a6d93 in local_reply (t=0x976af43c, p_msg=0x82d241c, branch=0, msg_status=200, cancel_bitmap=0xbfddcf88) at t_reply.c:1358
#8  0xb72a82c2 in reply_received (p_msg=0x82d241c) at t_reply.c:1503
#9  0x0806c913 in forward_reply (msg=0x82d241c) at forward.c:568
#10 0x080a4d32 in receive_msg (
    buf=0x81d7480 "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 77.66.2.136;branch=z9hG4bK2aa.63be8597.0;received=77.66.2.136\r\nFrom: <sip:+642102908458 at 77.66.2.136>;tag=as2bde889d\r\nTo: <sip:01100165022863605 at 77.66.2.136:5060;user=p"..., len=428, rcv_info=<value optimized out>) at receive.c:203
#11 0x080fbd5e in udp_rcv_loop () at udp_server.c:419
#12 0x08074d28 in main_loop (argc=9, argv=0xbfddd2f4) at main.c:879
#13 main (argc=9, argv=0xbfddd2f4) at main.c:1497


BT on second core dump gives following,

Core was generated by `/usr/local/sbin/opensips -P /var/run/opensips.pid -m 512 -u root -g root'.
Program terminated with signal 6, Aborted.
#0  0xb78ab424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb78ab424 in __kernel_vsyscall ()
#1  0xb75c3751 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb75c6b82 in abort () from /lib/i686/cmov/libc.so.6
#3  0x08070d64 in sig_alarm_abort (signo=14) at main.c:435
#4  <signal handler called>
#5  get_lock (ticks=0, param=0x0) at ../../evi/../fastlock.h:178
#6  dialog_update_db (ticks=0, param=0x0) at dlg_db_handler.c:972
#7  0xb72249e1 in mod_destroy () at dialog.c:755
#8  0x080ce108 in destroy_modules () at sr_module.c:371
#9  0x08070858 in cleanup (show_status=1) at main.c:344
#10 0x080714a9 in handle_sigs () at main.c:545
#11 0x08075245 in main_loop (argc=9, argv=0xbfddd2f4) at main.c:990
#12 main (argc=9, argv=0xbfddd2f4) at main.c:1497

The most disturbing part of these crashes for me is that it causes all running calls to hang up. Previously in case of OpenSIPs crash running calls use to remain unaffected.

I am using SVN trunk revision 8215, and have create_dialog("BPp") on very first INVITE in my dial plan. 

Feel free to ask for more info as needed.

Thank you.


----------------------------------------------------------------------

>Comment By: Muhammad Shahzad (shari_786pk)
Date: 2011-08-12 15:54

Message:
well, as far as i remember 8201 never crashed on its own, however, i did
observed crash a couple of time while trying to stop the service (for
maintenance reasons), but ignored them. The back trace of those crashes
showed mysql update failed while sync dialog run time info to db. I didn't
saved those core dumps to can't give much details.

Anyways, my production server is now on 8201 and let see if it crashes, i
will send you its BT.

----------------------------------------------------------------------

Comment By: Vladut-Stefan Paiu (vladut-paiu)
Date: 2011-08-12 15:37

Message:
Hello again,

This second backtrace seems somehow corrupt. 

You are saying the 8201 is ok, no crashes as the reported ones ? While
8215 shows these problems ?

----------------------------------------------------------------------

Comment By: Muhammad Shahzad (shari_786pk)
Date: 2011-08-12 14:35

Message:
i updated the trunk to revision 8242, and now it crashes due to some memory
leak, may be its a different bug. Anyway, here is BT.

Core was generated by `/usr/local/sbin/opensips -P /var/run/opensips.pid
-m 512 -u root -g root'.
Program terminated with signal 6, Aborted.
#0  0xb7706424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7706424 in __kernel_vsyscall ()
#1  0xb741e751 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb7421b82 in abort () from /lib/i686/cmov/libc.so.6
#3  0x0810344e in qm_insert_free (qm=0xb70b0337, p=0x970dac6c,
file=0xb70b0337 "L:dialog:%s: extract_ftc_hdrs ok but no from extracted :
[%.*s]\n",
    func=0xb70b0dbb "g:%s: no more shm\n", line=171) at
mem/q_malloc.c:169
#4  qm_free (qm=0xb70b0337, p=0x970dac6c, file=0xb70b0337 "L:dialog:%s:
extract_ftc_hdrs ok but no from extracted : [%.*s]\n", func=0xb70b0dbb
"g:%s: no more shm\n",
    line=171) at mem/q_malloc.c:491
#5  0xb7095cf6 in dlg_onreply (dialog=0x800) at dlg_handlers.c:477
#6  unreference_dialog_create (dialog=0x800) at dlg_handlers.c:776
#7  0xb709ef5f in unref_dlg (dlg=0x970da634, cnt=1) at dlg_hash.c:739
#8  0xb70a2a42 in new_dlg_profile (profiles=0x1 <Address 0x1 out of
bounds>, has_value=3070347656) at dlg_profile.c:278
#9  add_profile_definitions (profiles=0x1 <Address 0x1 out of bounds>,
has_value=3070347656) at dlg_profile.c:168
#10 0xb70a41c0 in send_leg_msg (dlg=0x979541a8, method=0x100,
src_leg=-1078038796, dst_leg=0, hdrs=0x97369734, body=0x979ba664,
func=0x81a9e68 <debug>,
    param=0x972ac048, release=0) at dlg_req_within.c:494
#11 0xb70e9657 in run_trans_callbacks (type=256, trans=0x979541a8,
req=0x0, rpl=0xffffffff, code=408) at t_hooks.c:212
#12 0xb7101d93 in local_reply (t=0x979541a8, p_msg=0xffffffff, branch=0,
msg_status=408, cancel_bitmap=0xbfbe702c) at t_reply.c:1358
#13 0xb70ec0a3 in fake_reply (ticks=70, attr=0x0) at timer.c:253
#14 final_response_handler (ticks=70, attr=0x0) at timer.c:364
#15 timer_routine (ticks=70, attr=0x0) at timer.c:1005
#16 0x080e42fa in utimer_ticker () at timer.c:369
#17 run_timer_process () at timer.c:407
#18 start_timer_processes () at timer.c:522
#19 0x080751bd in main_loop (argc=9, argv=0xbfbe71e4) at main.c:936
#20 main (argc=9, argv=0xbfbe71e4) at main.c:1497

It has crashes 3 times in last six hours, so i am think of reverting it to
last stable revision 8201. :(


----------------------------------------------------------------------

Comment By: Muhammad Shahzad (shari_786pk)
Date: 2011-08-12 14:23

Message:
I am afraid i won't be able to do that since crashes are random and the
server where its happening have a lot of production traffic (1500+ sip
registrations, 300+ calls etc.) on it.

I do have the core dump though, but their sizes are around 500+MB.


----------------------------------------------------------------------

Comment By: Vladut-Stefan Paiu (vladut-paiu)
Date: 2011-08-12 13:27

Message:
Hello Muhammad,

Is it possible that you also attach an OpenSIPS log in full debug for a
call where this crash happens ? I have an idea about the crash, but I would
need a log in debug=4 to be sure.

Thanks and Regards,
Vlad

----------------------------------------------------------------------

Comment By: Muhammad Shahzad (shari_786pk)
Date: 2011-08-12 04:50

Message:
Another crash, this time i was watching opensips logs and got this
additional info,


Aug 11 23:46:05 nghw-2674 osip-service[1803]: CRITICAL:core:qm_free:
freeing already freed pointer, first free: dlg_hash.c: free_dlg_dlg(171) -
aborting 
Aug 11 23:46:21 nghw-2674 osip-service[1838]: CRITICAL:core:receive_fd:
EOF on 37 
Aug 11 23:46:21 nghw-2674 osip-service[1779]: NOTICE:presence:destroy:
destroy module ... 



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3390207&group_id=232389



More information about the Devel mailing list