[OpenSIPS-Devel] [ opensips-Bugs-3182321 ] segfault in f_malloc.c, chains from dialog module
SourceForge.net
noreply at sourceforge.net
Fri Mar 4 16:05:36 CET 2011
Bugs item #3182321, was opened at 2011-02-15 09:54
Message generated for change (Comment added) made by denodaeus
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3182321&group_id=232389
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: modules
Group: 1.6.x
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Robert Smith (denodaeus)
Assigned to: Bogdan-Andrei Iancu (bogdan_iancu)
Summary: segfault in f_malloc.c, chains from dialog module
Initial Comment:
We've had several crashes now, R7081 (this is pretty much 1.6.3 pulled the day of release with a couple of bug fix patches in textops and loadbalancer applied):
It seems to be fairly consistently happening around free_dlg_dlg in dlg_hash.c which is triggering it -- we have several cores showing this behavior.
What are the next steps for debugging this? This is on a production machine and we haven't been able to reproduce it elsewhere, so we are somewhat sensitive to load issues while trying to troubleshoot this.
Thanks.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004a2bcf in fm_insert_free (qm=0x2b875eba4000, p=<value optimized out>) at mem/f_malloc.c:155
155 (*f)->prev = &(frag->u.nxt_free);
(gdb) bt full
#0 0x00000000004a2bcf in fm_insert_free (qm=0x2b875eba4000, p=<value optimized out>) at mem/f_malloc.c:155
f = 0x2b875eba4070
hash = 3
#1 fm_free (qm=0x2b875eba4000, p=<value optimized out>) at mem/f_malloc.c:460
f = 0x2b875f41be40
n = <value optimized out>
__FUNCTION__ = "fm_free"
#2 0x00002b875e14dcde in free_dlg_dlg () at dlg_hash.c:168
i = 0
#3 destroy_dlg_table () at dlg_hash.c:234
dlg = 0x2b875f319a80
i = 5
#4 0x00002b875e13a651 in mod_destroy () at dialog.c:742
No locals.
#5 0x0000000000476ff4 in destroy_modules () at sr_module.c:370
t = 0x785328
foo = 0x785258
#6 0x00000000004252e0 in cleanup (show_status=1) at main.c:336
No locals.
#7 0x00000000004261fb in handle_sigs () at main.c:533
chld = 0
chld_status = 139
i = <value optimized out>
do_exit = 1
__FUNCTION__ = "handle_sigs"
#8 0x000000000042a5a9 in main_loop (argc=3, argv=0x4) at main.c:913
i = 4
pid = <value optimized out>
si = 0x0
startup_done = 0x0
chd_rank = 16
__FUNCTION__ = "main_loop"
#9 main (argc=3, argv=0x4) at main.c:1388
cfg_log_stderr = <value optimized out>
cfg_stream = 0x831e010
c = <value optimized out>
r = <value optimized out>
tmp = 0x4fc447 "H\215\005\262\333#"
tmp_len = <value optimized out>
port = <value optimized out>
proto = <value optimized out>
ret = <value optimized out>
seed = 586369539
rfd = <value optimized out>
__FUNCTION__ = "main"
(gdb)
(gdb) frame 0
#0 0x00000000004a2bcf in fm_insert_free (qm=0x2b875eba4000, p=<value optimized out>) at mem/f_malloc.c:155
155 (*f)->prev = &(frag->u.nxt_free);
(gdb) list
150
151 /*insert it here*/
152 frag->prev = f;
153 frag->u.nxt_free=*f;
154 if( *f )
155 (*f)->prev = &(frag->u.nxt_free);
156
157 *f=frag;
158 qm->free_hash[hash].no++;
159 free_plus(qm , frag->size);
(gdb) info locals
f = 0x2b875eba4070
hash = 3
(gdb) print *f
$1 = (struct fm_frag *) 0x2b865f235290
(gdb) print *(*f)
Cannot access memory at address 0x2b865f235290
(gdb) frame 1
#1 fm_free (qm=0x2b875eba4000, p=<value optimized out>) at mem/f_malloc.c:460
460 fm_insert_free(qm, f);
(gdb) print f
$2 = (struct fm_frag *) 0x2b875f41be40
(gdb) list
455 goto join;
456 }
457
458 no_join:
459
460 fm_insert_free(qm, f);
461 }
462
463
464 #ifdef DBG_F_MALLOC
(gdb)
And a second crash here a few hours after restarting from above:
Core was generated by `/usr/local/opensips/sbin/opensips -u opensips'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004a2bcf in fm_insert_free (qm=0x2aca30c14000, p=<value optimized out>) at mem/f_malloc.c:155
155 (*f)->prev = &(frag->u.nxt_free);
(gdb) bt full
#0 0x00000000004a2bcf in fm_insert_free (qm=0x2aca30c14000, p=<value optimized out>) at mem/f_malloc.c:155
f = 0x2aca30c14070
hash = 3
#1 fm_free (qm=0x2aca30c14000, p=<value optimized out>) at mem/f_malloc.c:460
f = 0x2aca315c1718
n = <value optimized out>
__FUNCTION__ = "fm_free"
#2 0x00002aca301bdcde in free_dlg_dlg () at dlg_hash.c:168
i = 0
#3 destroy_dlg_table () at dlg_hash.c:234
dlg = 0x2aca31692c70
i = 0
#4 0x00002aca301aa651 in mod_destroy () at dialog.c:742
No locals.
#5 0x0000000000476ff4 in destroy_modules () at sr_module.c:370
t = 0x785328
foo = 0x785258
#6 0x00000000004252e0 in cleanup (show_status=1) at main.c:336
No locals.
#7 0x00000000004261fb in handle_sigs () at main.c:533
chld = 0
chld_status = 139
i = <value optimized out>
do_exit = 1
__FUNCTION__ = "handle_sigs"
#8 0x000000000042a5a9 in main_loop (argc=3, argv=0x4) at main.c:913
i = 4
pid = <value optimized out>
si = 0x0
startup_done = 0x0
chd_rank = 16
__FUNCTION__ = "main_loop"
#9 main (argc=3, argv=0x4) at main.c:1388
cfg_log_stderr = <value optimized out>
cfg_stream = 0x14cfb010
c = <value optimized out>
r = <value optimized out>
tmp = 0x4fc447 "H\215\005\262\333#"
----------------------------------------------------------------------
Comment By: Robert Smith (denodaeus)
Date: 2011-03-04 10:05
Message:
(gdb) bt
#0 fm_remove_free (qm=0x2ade7185d000, p=0x2ade71fd2450, size=<value
optimized out>) at mem/f_malloc.c:175
#1 fm_realloc (qm=0x2ade7185d000, p=0x2ade71fd2450, size=<value optimized
out>) at mem/f_malloc.c:530
#2 0x00002ade70dee4b4 in shm_realloc (dlg=0x2ade722260c8,
tag=0x7fffbd4d4240, rr=0x7fffbd4d40f0, contact=0x7fffbd4d4100,
cseq=0x7fffbd4d4110, sock=0x784710) at ../../mem/shm_mem.h:212
#3 dlg_add_leg_info (dlg=0x2ade722260c8, tag=0x7fffbd4d4240,
rr=0x7fffbd4d40f0, contact=0x7fffbd4d4100, cseq=0x7fffbd4d4110,
sock=0x784710) at dlg_hash.c:309
#4 0x00002ade70de472c in init_leg_info (dlg=0x2ade722260c8, msg=0x7a3968,
t=<value optimized out>, tag=0x7fffbd4d4240) at dlg_handlers.c:242
#5 0x00002ade70de9a6b in push_reply_in_dialog (t=0x2ade7205c1d8,
type=<value optimized out>, param=<value optimized out>) at
dlg_handlers.c:297
#6 dlg_onreply (t=0x2ade7205c1d8, type=<value optimized out>,
param=<value optimized out>) at dlg_handlers.c:342
#7 0x00002ade6f4ed8db in run_trans_callbacks (type=16,
trans=0x2ade7205c1d8, req=0x2ade71ed8ac0, rpl=0x7a3968, code=<value
optimized out>) at t_hooks.c:208
#8 0x00002ade6f508107 in relay_reply (t=0x2ade7205c1d8, p_msg=0x7a3968,
branch=<value optimized out>, msg_status=183, cancel_bitmap=0x7fffbd4d44e8)
at t_reply.c:1147
#9 0x00002ade6f508990 in reply_received (p_msg=0x7a3968) at
t_reply.c:1494
#10 0x0000000000421e98 in forward_reply (msg=0x7a3968) at forward.c:561
#11 0x00000000004570c2 in receive_msg (
buf=0x758f60 "XXXXXXXXXXX"..., len=1208, rcv_info=0x7fffbd4d45d0) at
receive.c:200
#12 0x000000000049be14 in udp_rcv_loop () at udp_server.c:492
#13 0x000000000042a65d in main_loop (argc=3, argv=<value optimized out>)
at main.c:818
#14 main (argc=3, argv=<value optimized out>) at main.c:1388
In this circumstance, it looks like in dlg_add_leg_info, the callID in
dialog is getting overwritten:
print dlg->callid.s
$10 = 0x2ade72226178
"d71c112bbc80838a0410c89217c4968d at 10.2.1.68sip:6785008449 at 10.2.1.68sip:19186128830 at my.sipprovider.comgent:
OpenSIPS SIP (1.6.3-notls (x86_64/linux))\r\n\r\n\r\n\r\nngP\002"
----------------------------------------------------------------------
Comment By: Robert Smith (denodaeus)
Date: 2011-03-03 10:41
Message:
I have the full contents of memdump=2 in a logfile (it's quite large) that
I can send privately if it's needed. I don't know if this is providing a
lot of detail about the crash though, it's pretty ambiguously to the memory
manager.
----------------------------------------------------------------------
Comment By: Robert Smith (denodaeus)
Date: 2011-03-03 10:40
Message:
Putting memory support in, and only memdumping (level 2 as log level is
currently 3 and this is a production machine) is causing a sigabrt to
occur. The pertinent info:
/usr/local/opensips/sbin/opensips[27869]: CRITICAL:core:qm_debug_frag:
qm_*: prev. fragm. tail overwritten(c0c0c0c0,
abcdefe5)[0x2b71c0960470:0x2b71c09604a0]!
<aidanna> seems like the fragment is here:
2011-03-03T15:01:29.496041+00:00 /usr/local/opensips/sbin/opensips[27846]:
17533. N address=0x2b71c09604a0 frag=0x2b71c0960470 size=40 used=1
<aidanna> 2011-03-03T15:01:29.496049+00:00
/usr/local/opensips/sbin/opensips[27846]: alloc'd from proxy.c:
hostent_shm_cpy(83)
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2011-02-18 15:06
Message:
Hi Robert,
Thanks for the info. It seams to be a generic memory corruption which is a
a bit impossible to trace only from core file.
The best way to trace this is by enabling and running with the memory
debugger:
1. edit Makefile.defs file and remove from the DEFS string the F_MALLOC
define (delete "-DF_MALLOC" line) and add the DBG_QM_MALLOC define (insert
a "-DDBG_QM_MALLOC" line).
2. recompile everything
3. set memlog=6 and memdump=1 in your configuration script (to get the
memory dump).
Start opensips and if memory corruption is detected, opensips will stop
generating core file and a memory dump.
Regards,
Bogdan
----------------------------------------------------------------------
Comment By: Robert Smith (denodaeus)
Date: 2011-02-18 12:09
Message:
Hi bogdan -- new crash file is attached, uac_crash.corefile.txt.
Thanks,
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2011-02-18 09:53
Message:
Hi Robert,
to get both core file, you can just set the /proc/sys/kernel/core_pattern
to some name which will be unique per process.
Or just set /proc/sys/kernel/core_uses_pid to '1', so you'll get a
core.PID file for every process that crashed.
And post the backtrace of the first corefile.
Regards,
Bogdan
----------------------------------------------------------------------
Comment By: Robert Smith (denodaeus)
Date: 2011-02-16 04:47
Message:
The shutdown was a sig11 which caused the attendant process to exit on
sig15. I'll try to look at the root cause, but I originally thought this
was off of the sip receiver process.
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2011-02-15 15:59
Message:
Hi Robert,
The traces you posted show a crash in the shutdown procedure - can you
check in the opensips logs to see if the shutdown was a normal one or a
result of another crash? I try to exclude the possibility of a double
coredump (one of the opensips procs crashes a generates a cores -> entire
opensips starts to shutdown a crashes again -> second corefile,
overwritting the first one).
Regards,
Bogdan
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3182321&group_id=232389
More information about the Devel
mailing list