<html><body>
<p>Hi Bogdan,<br>
<br>
I believe I found the problem. When sip_msg_cloner() within build_cell() fails due to out-of-mem, and dangling pointer to the cell is left in the global transaction pointer. Later on the post_cb() code attempts to clean this up, and "resurrects" the now-free memory, and in particular puts it on a wait timer. My guess is that later on this memory is allocated into a new transaction which eventually uses the same wait timer link, and the two threads fight it out.<br>
<br>
Adding a set_t(0) fixes the problem. While looking, I believe I found a memory leak in uac creation code that is also triggered by out of memory conditions. Please see attached patch.<br>
<br>
The "problem" also manifest itself as a core dump with insert_timer_unsafe. Once I disabled optimization, I saw multiple variations within this function. The most common was that ptr==tl after the search. The sequence of events for this is: new cell created, stored into global T, free'd, added to wait-list by cleanup code from global T, memory re-allocated into new cell, timer link zero'd, and then added again to wait list. At least that is my best guess.<br>
<br>
Regards,<br>
Kennard<br>
<br>
<i>(See attached file: opensips-tm-cell.patch)</i><br>
<br>
<img width="16" height="16" src="cid:2__=07BBFD57DFEB81D98f9e8a93df9@logitech.com" border="0" alt="Inactive hide details for Bogdan-Andrei Iancu ---10/22/2010 07:02:36 AM---Hi Kennard, I suppose the bt is the same ? do you sti"><font color="#424282">Bogdan-Andrei Iancu ---10/22/2010 07:02:36 AM---Hi Kennard, I suppose the bt is the same ? do you still have the core file ?</font><br>
<br>
<font size="2" color="#5F5F5F">From:        </font><font size="2">Bogdan-Andrei Iancu <bogdan@voice-system.ro></font><br>
<font size="2" color="#5F5F5F">To:        </font><font size="2">OpenSIPS users mailling list <users@lists.opensips.org></font><br>
<font size="2" color="#5F5F5F">Date:        </font><font size="2">10/22/2010 07:02 AM</font><br>
<font size="2" color="#5F5F5F">Subject:        </font><font size="2">Re: [OpenSIPS-Users] opensips tm timer core dump</font><br>
<font size="2" color="#5F5F5F">Sent by:        </font><font size="2">users-bounces@lists.opensips.org</font><br>
<hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br>
<br>
<br>
<tt>Hi Kennard,<br>
<br>
I suppose the bt is the same ? do you still have the core file ?<br>
<br>
Regards,<br>
Bogdan<br>
<br>
Kennard_White@logitech.com wrote:<br>
><br>
> Hi Bodgen,<br>
><br>
> I replicated the error. Unfortunately the entire insert_timer_unsafe <br>
> and been in-lined and little is available:<br>
><br>
> Program terminated with signal 11, Segmentation fault.<br>
> #0 0x00007f8b8356c2c2 in insert_timer_unsafe (new_tl=0x7f8b7a54e310,<br>
> list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at timer.c:731<br>
> 731 timer.c: No such file or directory.<br>
> in timer.c<br>
> (gdb) print tl<br>
> $1 = <value optimized out><br>
> (gdb) print *tl<br>
> Cannot access memory at address 0x0<br>
> (gdb) print ptr<br>
> $2 = <value optimized out><br>
> (gdb) print *ptr<br>
> Cannot access memory at address 0x0<br>
> (gdb) print *new_tl<br>
> No symbol "new_tl" in current context.<br>
> (gdb) up<br>
> #1 set_1timer (new_tl=0x7f8b7a54e310, list_id=WT_TIMER_LIST,<br>
> ext_timeout=<value optimized out>) at timer.c:904<br>
> 904 in timer.c<br>
> (gdb) print *new_tl<br>
> $3 = {next_tl = 0x0, prev_tl = 0x0, ld_tl = 0x0, time_out = 0,<br>
> timer_list = 0x0, deleted = 0}<br>
> (gdb) print list<br>
> $4 = <value optimized out><br>
> (gdb) print timeout<br>
> $5 = 32<br>
> (gdb) print new_tl<br>
> $6 = (struct timer_link *) 0x7f8b7a54e310<br>
><br>
> I'll keep the core for a while -- please let me know if there is <br>
> anything else I can try.<br>
><br>
> Thanks,<br>
> Kennard<br>
><br>
> Inactive hide details for Bogdan-Andrei Iancu ---10/08/2010 04:40:47 <br>
> AM---Hi Kennard, Ok, keep the core next time :)Bogdan-Andrei Iancu <br>
> ---10/08/2010 04:40:47 AM---Hi Kennard, Ok, keep the core next time :)<br>
><br>
> From: Bogdan-Andrei Iancu <bogdan@voice-system.ro><br>
> To: OpenSIPS users mailling list <users@lists.opensips.org><br>
> Date: 10/08/2010 04:40 AM<br>
> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump<br>
> Sent by: users-bounces@lists.opensips.org<br>
><br>
> ------------------------------------------------------------------------<br>
><br>
><br>
><br>
> Hi Kennard,<br>
><br>
> Ok, keep the core next time :)<br>
><br>
> Regards,<br>
> Bogdan<br>
><br>
> Kennard_White@logitech.com wrote:<br>
> ><br>
> > Hi Bogden,<br>
> ><br>
> > Thanks for explaining the child processes involved -- I misunderstood<br>
> > what was happening.<br>
> ><br>
> > Unfortunately, I don't have the core anymore. My recollection is that<br>
> > I couldn't print anything useful due to compiler optimization. That<br>
> > said, this should re-create pretty easily, and I'll get more dumps<br>
> > next time it happens.<br>
> ><br>
> > Regards,<br>
> > Kennard<br>
> ><br>
> > Inactive hide details for Bogdan-Andrei Iancu ---10/05/2010 01:41:38<br>
> > AM---Hi Kennard, The core was generated by process 22255:Bogdan-Andrei<br>
> > Iancu ---10/05/2010 01:41:38 AM---Hi Kennard, The core was generated<br>
> > by process 22255:<br>
> ><br>
> > From: Bogdan-Andrei Iancu <bogdan@voice-system.ro><br>
> > To: OpenSIPS users mailling list <users@lists.opensips.org><br>
> > Date: 10/05/2010 01:41 AM<br>
> > Subject: Re: [OpenSIPS-Users] opensips tm timer core dump<br>
> > Sent by: users-bounces@lists.opensips.org<br>
> ><br>
> > ------------------------------------------------------------------------<br>
> ><br>
> ><br>
> ><br>
> > Hi Kennard,<br>
> ><br>
> > The core was generated by process 22255:<br>
> > [22238]: INFO:core:handle_sigs: child process 22255 exited by a<br>
> > signal 11<br>
> ><br>
> > and this process also reported mem problems:<br>
> > [22255]: ERROR:tm:new_t: out of mem<br>
> ><br>
> > Can you print the "tl" or "ptr" variables in frame 0?<br>
> ><br>
> > Regards,<br>
> > Bogdan<br>
> ><br>
> > Kennard_White@logitech.com wrote:<br>
> > ><br>
> > > Running against opensips HEAD, I got a segfault in the tm timer code.<br>
> > > I believe this is triggered by running out of shared memory.<br>
> > ><br>
> > ><br>
> > > The stack trace:<br>
> > ><br>
> > > (gdb) where<br>
> > > #0 0x00007fe8f8d96212 in insert_timer_unsafe (new_tl=0x7fe8f66337b0,<br>
> > > list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at <br>
> timer.c:731<br>
> > > #1 set_1timer (new_tl=0x7fe8f66337b0, list_id=WT_TIMER_LIST,<br>
> > > ext_timeout=<value optimized out>) at timer.c:904<br>
> > > #2 0x00007fe8f8d78ac8 in t_release_transaction (trans=0x7fe8f6633730)<br>
> > > at t_funcs.c:122<br>
> > > #3 0x00007fe8f8d808e5 in t_unref (p_msg=<value optimized out>)<br>
> > > at t_lookup.c:1152<br>
> > > #4 0x0000000000483ae5 in exec_post_req_cb ()<br>
> > > #5 0x000000000046c1e4 in receive_msg ()<br>
> > > #6 0x00000000004bc77c in udp_rcv_loop ()<br>
> > > #7 0x000000000042de9c in main ()<br>
> > ><br>
> > > The offending code (I believe):<br>
> > > if (tl->time_out==ptr->time_out) {<br>
> > > tl->ld_tl = ptr->ld_tl<br>
> > > ptr->ld_tl = 0;<br>
> > > tl->ld_tl->ld_tl = tl; <-- SEG FAULT HERE (according to trace)<br>
> > > } else {<br>
> > > tl->ld_tl = tl;<br>
> > > }<br>
> > ><br>
> > > Unfortunately, due to optimization I cannot dump anything useful, and<br>
> > > I'm not convinced the actual fault is on the line indicated. Note that<br>
> > > the core dump is not one of the processes that reported out of memory.<br>
> > > Maybe one of the other processes left the timer list in a corrupt <br>
> state?<br>
> > ><br>
> > > The log file:<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:<br>
> > > ERROR:tm:sip_msg_cloner: no more share memory<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:<br>
> > > ERROR:tm:new_t: out of mem<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:<br>
> > > ERROR:tm:t_newtran: new_t failed<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:<br>
> > > WARNING:core:fm_malloc: Not enough free memory, will atempt<br>
> > defragmenation<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:<br>
> > > ERROR:tm:sip_msg_cloner: no more share memory<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:<br>
> > > ERROR:tm:new_t: out of mem<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:<br>
> > > ERROR:tm:t_newtran: new_t failed<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:<br>
> > > INFO:core:handle_sigs: child process 22255 exited by a signal 11<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:<br>
> > > INFO:core:handle_sigs: core was generated<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:<br>
> > > INFO:core:handle_sigs: terminating due to SIGCHLD<br>
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22256]:<br>
> > > INFO:core:sig_usr: signal 15 received<br>
> > ><br>
> > > <br>
> ------------------------------------------------------------------------<br>
> > ><br>
> > > _______________________________________________<br>
> > > Users mailing list<br>
> > > Users@lists.opensips.org<br>
> > > </tt><tt><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a></tt><tt><br>
> > > <br>
> ><br>
> ><br>
> > --<br>
> > Bogdan-Andrei Iancu<br>
> > OpenSIPS Bootcamp<br>
> > 15 - 19 November 2010, Edison, New Jersey, USA<br>
> > </tt><tt>www.voice-system.ro</tt><tt><br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > Users mailing list<br>
> > Users@lists.opensips.org<br>
> > </tt><tt><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a></tt><tt><br>
> ><br>
> > ------------------------------------------------------------------------<br>
> ><br>
> > _______________________________________________<br>
> > Users mailing list<br>
> > Users@lists.opensips.org<br>
> > </tt><tt><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a></tt><tt><br>
> > <br>
><br>
><br>
> -- <br>
> Bogdan-Andrei Iancu<br>
> OpenSIPS Bootcamp<br>
> 15 - 19 November 2010, Edison, New Jersey, USA<br>
> </tt><tt>www.voice-system.ro</tt><tt><br>
><br>
><br>
> _______________________________________________<br>
> Users mailing list<br>
> Users@lists.opensips.org<br>
> </tt><tt><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a></tt><tt><br>
><br>
> ------------------------------------------------------------------------<br>
><br>
> _______________________________________________<br>
> Users mailing list<br>
> Users@lists.opensips.org<br>
> </tt><tt><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a></tt><tt><br>
> <br>
<br>
<br>
-- <br>
Bogdan-Andrei Iancu<br>
OpenSIPS Bootcamp<br>
15 - 19 November 2010, Edison, New Jersey, USA<br>
</tt><tt>www.voice-system.ro</tt><tt><br>
<br>
<br>
_______________________________________________<br>
Users mailing list<br>
Users@lists.opensips.org<br>
</tt><tt><a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a></tt><tt><br>
</tt><br>
</body></html>