[OpenSIPS-Users] opensips tm timer core dump

Bogdan-Andrei Iancu bogdan at voice-system.ro
Mon Oct 25 14:27:48 CEST 2010


Hi Kennard,

You are perfectly right - there was a change  in moving the setT(cell) 
upper in the function (we needed it in the callbacks), and this bug 
looks like a side effect.

I will make the fix asap !

Thanks and regards,
Bogdan

Kennard_White at logitech.com wrote:
>
> Hi Bogdan,
>
> I believe I found the problem. When sip_msg_cloner() within 
> build_cell() fails due to out-of-mem, and dangling pointer to the cell 
> is left in the global transaction pointer. Later on the post_cb() code 
> attempts to clean this up, and "resurrects" the now-free memory, and 
> in particular puts it on a wait timer. My guess is that later on this 
> memory is allocated into a new transaction which eventually uses the 
> same wait timer link, and the two threads fight it out.
>
> Adding a set_t(0) fixes the problem. While looking, I believe I found 
> a memory leak in uac creation code that is also triggered by out of 
> memory conditions. Please see attached patch.
>
> The "problem" also manifest itself as a core dump with 
> insert_timer_unsafe. Once I disabled optimization, I saw multiple 
> variations within this function. The most common was that ptr==tl 
> after the search. The sequence of events for this is: new cell 
> created, stored into global T, free'd, added to wait-list by cleanup 
> code from global T, memory re-allocated into new cell, timer link 
> zero'd, and then added again to wait list. At least that is my best guess.
>
> Regards,
> Kennard
>
> /(See attached file: opensips-tm-cell.patch)/
>
> Inactive hide details for Bogdan-Andrei Iancu ---10/22/2010 07:02:36 
> AM---Hi Kennard, I suppose the bt is the same ? do you 
> stiBogdan-Andrei Iancu ---10/22/2010 07:02:36 AM---Hi Kennard, I 
> suppose the bt is the same ? do you still have the core file ?
>
> From: Bogdan-Andrei Iancu <bogdan at voice-system.ro>
> To: OpenSIPS users mailling list <users at lists.opensips.org>
> Date: 10/22/2010 07:02 AM
> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> Sent by: users-bounces at lists.opensips.org
>
> ------------------------------------------------------------------------
>
>
>
> Hi Kennard,
>
> I suppose the bt is the same ? do you still have the core file ?
>
> Regards,
> Bogdan
>
> Kennard_White at logitech.com wrote:
> >
> > Hi Bodgen,
> >
> > I replicated the error. Unfortunately the entire insert_timer_unsafe
> > and been in-lined and little is available:
> >
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x00007f8b8356c2c2 in insert_timer_unsafe (new_tl=0x7f8b7a54e310,
> > list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at timer.c:731
> > 731 timer.c: No such file or directory.
> > in timer.c
> > (gdb) print tl
> > $1 = <value optimized out>
> > (gdb) print *tl
> > Cannot access memory at address 0x0
> > (gdb) print ptr
> > $2 = <value optimized out>
> > (gdb) print *ptr
> > Cannot access memory at address 0x0
> > (gdb) print *new_tl
> > No symbol "new_tl" in current context.
> > (gdb) up
> > #1 set_1timer (new_tl=0x7f8b7a54e310, list_id=WT_TIMER_LIST,
> > ext_timeout=<value optimized out>) at timer.c:904
> > 904 in timer.c
> > (gdb) print *new_tl
> > $3 = {next_tl = 0x0, prev_tl = 0x0, ld_tl = 0x0, time_out = 0,
> > timer_list = 0x0, deleted = 0}
> > (gdb) print list
> > $4 = <value optimized out>
> > (gdb) print timeout
> > $5 = 32
> > (gdb) print new_tl
> > $6 = (struct timer_link *) 0x7f8b7a54e310
> >
> > I'll keep the core for a while -- please let me know if there is
> > anything else I can try.
> >
> > Thanks,
> > Kennard
> >
> > Inactive hide details for Bogdan-Andrei Iancu ---10/08/2010 04:40:47
> > AM---Hi Kennard, Ok, keep the core next time :)Bogdan-Andrei Iancu
> > ---10/08/2010 04:40:47 AM---Hi Kennard, Ok, keep the core next time :)
> >
> > From: Bogdan-Andrei Iancu <bogdan at voice-system.ro>
> > To: OpenSIPS users mailling list <users at lists.opensips.org>
> > Date: 10/08/2010 04:40 AM
> > Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> > Sent by: users-bounces at lists.opensips.org
> >
> > ------------------------------------------------------------------------
> >
> >
> >
> > Hi Kennard,
> >
> > Ok, keep the core next time :)
> >
> > Regards,
> > Bogdan
> >
> > Kennard_White at logitech.com wrote:
> > >
> > > Hi Bogden,
> > >
> > > Thanks for explaining the child processes involved -- I misunderstood
> > > what was happening.
> > >
> > > Unfortunately, I don't have the core anymore. My recollection is that
> > > I couldn't print anything useful due to compiler optimization. That
> > > said, this should re-create pretty easily, and I'll get more dumps
> > > next time it happens.
> > >
> > > Regards,
> > > Kennard
> > >
> > > Inactive hide details for Bogdan-Andrei Iancu ---10/05/2010 01:41:38
> > > AM---Hi Kennard, The core was generated by process 22255:Bogdan-Andrei
> > > Iancu ---10/05/2010 01:41:38 AM---Hi Kennard, The core was generated
> > > by process 22255:
> > >
> > > From: Bogdan-Andrei Iancu <bogdan at voice-system.ro>
> > > To: OpenSIPS users mailling list <users at lists.opensips.org>
> > > Date: 10/05/2010 01:41 AM
> > > Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> > > Sent by: users-bounces at lists.opensips.org
> > >
> > > 
> ------------------------------------------------------------------------
> > >
> > >
> > >
> > > Hi Kennard,
> > >
> > > The core was generated by process 22255:
> > >    [22238]: INFO:core:handle_sigs: child process 22255 exited by a
> > > signal 11
> > >
> > > and this process also reported mem problems:
> > >    [22255]: ERROR:tm:new_t: out of mem
> > >
> > > Can you print the "tl" or "ptr" variables in frame 0?
> > >
> > > Regards,
> > > Bogdan
> > >
> > > Kennard_White at logitech.com wrote:
> > > >
> > > > Running against opensips HEAD, I got a segfault in the tm timer 
> code.
> > > > I believe this is triggered by running out of shared memory.
> > > >
> > > >
> > > > The stack trace:
> > > >
> > > > (gdb) where
> > > > #0 0x00007fe8f8d96212 in insert_timer_unsafe (new_tl=0x7fe8f66337b0,
> > > > list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at
> > timer.c:731
> > > > #1 set_1timer (new_tl=0x7fe8f66337b0, list_id=WT_TIMER_LIST,
> > > > ext_timeout=<value optimized out>) at timer.c:904
> > > > #2 0x00007fe8f8d78ac8 in t_release_transaction 
> (trans=0x7fe8f6633730)
> > > > at t_funcs.c:122
> > > > #3 0x00007fe8f8d808e5 in t_unref (p_msg=<value optimized out>)
> > > > at t_lookup.c:1152
> > > > #4 0x0000000000483ae5 in exec_post_req_cb ()
> > > > #5 0x000000000046c1e4 in receive_msg ()
> > > > #6 0x00000000004bc77c in udp_rcv_loop ()
> > > > #7 0x000000000042de9c in main ()
> > > >
> > > > The offending code (I believe):
> > > > if (tl->time_out==ptr->time_out) {
> > > > tl->ld_tl = ptr->ld_tl
> > > > ptr->ld_tl = 0;
> > > > tl->ld_tl->ld_tl = tl; <-- SEG FAULT HERE (according to trace)
> > > > } else {
> > > > tl->ld_tl = tl;
> > > > }
> > > >
> > > > Unfortunately, due to optimization I cannot dump anything 
> useful, and
> > > > I'm not convinced the actual fault is on the line indicated. 
> Note that
> > > > the core dump is not one of the processes that reported out of 
> memory.
> > > > Maybe one of the other processes left the timer list in a corrupt
> > state?
> > > >
> > > > The log file:
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:
> > > > ERROR:tm:sip_msg_cloner: no more share memory
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:
> > > > ERROR:tm:new_t: out of mem
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:
> > > > ERROR:tm:t_newtran: new_t failed
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > > WARNING:core:fm_malloc: Not enough free memory, will atempt
> > > defragmenation
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > > ERROR:tm:sip_msg_cloner: no more share memory
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > > ERROR:tm:new_t: out of mem
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > > ERROR:tm:t_newtran: new_t failed
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:
> > > > INFO:core:handle_sigs: child process 22255 exited by a signal 11
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:
> > > > INFO:core:handle_sigs: core was generated
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:
> > > > INFO:core:handle_sigs: terminating due to SIGCHLD
> > > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22256]:
> > > > INFO:core:sig_usr: signal 15 received
> > > >
> > > >
> > ------------------------------------------------------------------------
> > > >
> > > > _______________________________________________
> > > > Users mailing list
> > > > Users at lists.opensips.org
> > > > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> > > >  
> > >
> > >
> > > --
> > > Bogdan-Andrei Iancu
> > > OpenSIPS Bootcamp
> > > 15 - 19 November 2010, Edison, New Jersey, USA
> > > www.voice-system.ro
> > >
> > >
> > > _______________________________________________
> > > Users mailing list
> > > Users at lists.opensips.org
> > > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> > >
> > > 
> ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > Users mailing list
> > > Users at lists.opensips.org
> > > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> > >  
> >
> >
> > --
> > Bogdan-Andrei Iancu
> > OpenSIPS Bootcamp
> > 15 - 19 November 2010, Edison, New Jersey, USA
> > www.voice-system.ro
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opensips.org
> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opensips.org
> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> >  
>
>
> -- 
> Bogdan-Andrei Iancu
> OpenSIPS Bootcamp
> 15 - 19 November 2010, Edison, New Jersey, USA
> www.voice-system.ro
>
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>   


-- 
Bogdan-Andrei Iancu
OpenSIPS Bootcamp
15 - 19 November 2010, Edison, New Jersey, USA
www.voice-system.ro




More information about the Users mailing list