[OpenSIPS-Devel] Crash in OpenSIPS 2.1.4 (tm?)

Tue Mar 7 03:20:20 EST 2017

Hi, Nick!

I see you are using pthreads locks, correct? Can you switch the locking 
mechanism to FUTEX and check if it behaves differently?
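
As background for that question, here is a minimal standalone sketch (not OpenSIPS code) that checks whether pthread mutexes placed in shared memory work across forked processes on a given platform. It assumes the mutex must be created with the PTHREAD_PROCESS_SHARED attribute to be usable that way. Whether the 2.1.4 pthread lock backend does this on FreeBSD has not been established in this thread. The names shared_state and run_shared_counter are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test structure: a mutex and the counter it protects. */
struct shared_state {
    pthread_mutex_t lock;
    long counter;
};

/*
 * Fork nproc children that each increment a shared counter iters times
 * under a process-shared pthread mutex. Returns the final counter value,
 * or -1 if the shared mapping or the mutex could not be set up.
 */
long run_shared_counter(int nproc, int iters)
{
    struct shared_state *st;
    pthread_mutexattr_t attr;
    long result;
    int i, j, rc;

    /* Anonymous shared mapping, inherited by every child after fork(). */
    st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    st->counter = 0;

    rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        /* Without this attribute, cross-process use is undefined. */
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(&st->lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        fprintf(stderr, "mutex setup failed: %s\n", strerror(rc));
        munmap(st, sizeof(*st));
        return -1;
    }

    for (i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* stop spawning on fork failure */
        if (pid == 0) {
            for (j = 0; j < iters; j++) {
                /* Check the return code instead of assuming success. */
                if (pthread_mutex_lock(&st->lock) != 0)
                    _exit(1);
                st->counter++;
                pthread_mutex_unlock(&st->lock);
            }
            _exit(0);
        }
    }

    /* Reap all children before reading the result. */
    while (wait(NULL) > 0)
        ;

    result = st->counter;
    pthread_mutex_destroy(&st->lock);
    munmap(st, sizeof(*st));
    return result;
}
```

If run_shared_counter(4, 1000) returns anything other than 4000, or a child aborts inside the lock call, that would point at the platform's pthread locks rather than at the tm module.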

Best regards,

Răzvan Crainea
OpenSIPS Solutions
www.opensips-solutions.com

On 03/06/2017 06:59 PM, Maxim Sobolev wrote:
> The code leading to the crash is the following:
>
>         /* lock reply processing to determine how to proceed reliably */
>         LOCK_REPLIES( t );
>
> We also see a lot of messages like this before the crash:
>
> Mar  6 16:41:17 jenv_1 /usr/local/sbin/opensips[27453]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 15365200 ms (now 15365390 ms), it may overlap..
...
> Mar  6 16:41:18 jenv_1 /usr/local/sbin/opensips[27453]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 15366070 ms (now 15366160 ms), it may overlap..
>
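
The two timestamps in each warning above can be subtracted to see how far behind the timer was when the message was logged: 15365390 - 15365200 = 190 ms for the first one and 15366160 - 15366070 = 90 ms for the second. A minimal sketch for pulling that gap out of a log, assuming each warning has been joined onto a single line; the helper name utimer_lag_ms is hypothetical and not part of OpenSIPS:

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the "scheduled for" and "now" values from a utimer_ticker
 * warning and store their difference (now - scheduled) in *lag_ms.
 * Returns 0 on success, -1 if the line does not match the expected text.
 */
int utimer_lag_ms(const char *line, long long *lag_ms)
{
    const char *p = strstr(line, "already scheduled");
    long long scheduled, now;

    if (p == NULL)
        return -1;

    /* Whitespace in the format matches any run of whitespace in the input. */
    if (sscanf(p, "already scheduled for %lld ms (now %lld ms)",
               &scheduled, &now) != 2)
        return -1;

    *lag_ms = now - scheduled;
    return 0;
}
```

Feeding every warning line through this and watching whether the gap grows before the crash would show if the timer process is falling steadily behind or only hiccuping.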
> On Mon, Mar 6, 2017 at 8:51 AM, Maxim Sobolev <sobomax at sippysoft.com> wrote:
>
>     Hi folks,
>
>     We have observed the following crash in OpenSIPS:
>
>     $ sudo gdb712 /usr/local/sbin/opensips ~/opensips.27455.core
>     GNU gdb (GDB) 7.12 [GDB v7.12 for FreeBSD]
>     Core was generated by `opensips'.
>     Program terminated with signal SIGABRT, Aborted.
>     #0  0x0000000800ccf39a in thr_kill () from /lib/libc.so.7
>     [Current thread is 1 (LWP 100588)]
>     (gdb) bt
>     #0  0x0000000800ccf39a in thr_kill () from /lib/libc.so.7
>     #1  0x0000000800ccf386 in raise () from /lib/libc.so.7
>     #2  0x0000000800ccf309 in abort () from /lib/libc.so.7
>     #3  0x00000008009fe95a in ?? () from /lib/libthr.so.3
>     #4  0x00000008009fa046 in ?? () from /lib/libthr.so.3
>     #5  0x0000000801a148e1 in _lock (s=0x805003800) at lock.h:100
>     #6  0x0000000801a14e84 in final_response_handler (fr_tl=0x805002078) at timer.c:389
>     #7  0x0000000801a1664a in timer_routine (ticks=15362, set=0x0) at timer.c:1066
>     #8  0x00000000004544dd in handle_timer_job () at timer.c:567
>     #9  0x0000000000519920 in handle_io (fm=0x80142e670, idx=1, event_type=1) at net/net_udp.c:265
>     #10 0x00000000005187ca in io_wait_loop_kqueue (h=0x8b6300 <_worker_io>, t=1, repeat=0) at net/../io_wait_loop.h:281
>     #11 0x0000000000519bed in udp_rcv_loop (si=0x80141ff98) at net/net_udp.c:308
>     #12 0x000000000051a5c0 in udp_start_processes (chd_rank=0x7d56c8 <chd_rank>, startup_done=0x0) at net/net_udp.c:448
>     #13 0x00000000004318a5 in main_loop () at main.c:731
>     #14 0x0000000000433c79 in main (argc=9, argv=0x7fffffffe950) at main.c:1271
>
>     The opensips configuration is:
>
>             if (method == "INVITE") {
>                     if (!t_newtran()) {
>                             sl_reply_error();
>                             exit;
>                     };
>             };
>             # Do strict routing if pre-loaded route headers present
>             if (loose_route() && !(method == "INVITE")) {
>                     t_relay();
>                     exit;
>             };
>             if ((!lookup("location") && method == "INVITE" && uri == myself) || uri == myself) {
>                     sl_send_reply("404", "Not Found");
>                     exit;
>             };
>             if (method == "INVITE") {
>                     record_route();
>             };
>             if (!t_relay()) {
>                     sl_reply_error();
>             };
>
>     The SIP exchange leading to this is below. It's basically the case
>     of a call that has been cancelled on side A while the INVITE got no
>     provisional reply on side B.
>
>     -Max
>