[OpenSIPS-Devel] [ opensips-Bugs-3570495 ] crashes in the dialog module with opensips 1.8.1-tls

SourceForge.net noreply at sourceforge.net
Mon Oct 1 14:45:32 CEST 2012


Bugs item #3570495, was opened at 2012-09-21 05:26
Message generated for change (Comment added) made by vladut-paiu
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3570495&group_id=232389

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Dragos Oancea (dragosoancea)
Assigned to: Nobody/Anonymous (nobody)
Summary: crashes in the dialog module with opensips 1.8.1-tls

Initial Comment:
Hi ,


The setup where I am experiencing these crashes has the following particularities:

1. TLS 

My TLS config looks like this:

listen = tls:212.162.x.x:5061
listen = tcp:172.20.x.x:5060


# tls params
disable_tls = no
tls_verify_server = 1
tls_verify_client = 0
tls_require_client_certificate = 0
tls_method = TLSv1
tls_certificate = "/etc/pki/CA/certs/mydomain.crt"
tls_private_key = "/etc/pki/CA/private/mydomain.key"
tls_ca_list = "/etc/pki/CA/certs/ca.crt"

# tcp parameters
tcp_connection_lifetime=1860
open_files_limit=8192
tcp_max_connections=4096


2. SIP ping for NAT bypassing
I create the dialog with 
create_dialog("BPp") , sending OPTIONS to both sides .

I also call fix_nated_contact() on requests and replies.


3. I have calls where a protocol change is necessary , like  UDP->TLS , TLS->UDP , TLS->TCP .

I use: 
modparam("rr", "enable_double_rr", 1)


4. I use the same database for multiple SIP proxies.



Unfortunally opensips was not compiled with debugging symbols.


Here 1st crash ( I saw this one ALOT of times) : 


Core was generated by `/usr/sbin/opensips -P /var/run/opensips.pid -f /etc/opensips/opensips.cfg'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so
Missing separate debuginfos, use: debuginfo-install opensips-1.8.1-3.el6.x86_64
(gdb) bt full 
#0  0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so
No symbol table info available.
#1  0x000000000047d4ca in start_timer_processes ()
No symbol table info available.
#2  0x000000000042a814 in main ()
No symbol table info available.


Core was generated by `/usr/sbin/opensips -P /var/run/opensips.pid -f /etc/opensips/opensips.cfg'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so
Missing separate debuginfos, use: debuginfo-install opensips-1.8.1-3.el6.x86_64
(gdb) i r 
rax            0x7f15e430a388   139731999433608
rbx            0x1      1
rcx            0x7f15e43020a0   139731999400096
rdx            0x400    1024
rsi            0x1      1
rdi            0x200000009      8589934601
rbp            0x7f15e4792570   0x7f15e4792570
rsp            0x7fff372daaa0   0x7fff372daaa0
r8             0x7f15e4792558   139732004185432
r9             0x6      6
r10            0x4000   16384
r11            0x0      0
r12            0x7f15e47926e8   139732004185832
r13            0x200000009      8589934601
r14            0x769660 7771744
r15            0x734cc8 7556296
rip            0x7f160948c7e8   0x7f160948c7e8 <dlg_ping_routine+280>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) bt
#0  0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so
#1  0x000000000047d4ca in start_timer_processes ()
#2  0x000000000042a814 in main ()
(gdb) bt full
#0  0x00007f160948c7e8 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so
No symbol table info available.
#1  0x000000000047d4ca in start_timer_processes ()
No symbol table info available.
#2  0x000000000042a814 in main ()
No symbol table info available.
(gdb) 



this crash happends from time to time , unrelated to the load apparently.


machine configuration:
===================
Centos 6.3
Linux somedomain.com 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 18:58:52 BST 2012 x86_64 x86_64 x86_64 GNU/Linux  

opensips -V
version: opensips 1.8.1-tls (x86_64/linux)
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, DISABLE_NAGLE, USE_MCAST, SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
svnrevision: unknown
@(#) $Id: main.c 8772 2012-03-08 11:16:13Z bogdan_iancu $
main.c compiled on 17:04:54 Sep  8 2012 with gcc 4.4.6

openssl version:
openssl-1.0.0-20.el6_2.5.x86_64

======================

-------------------------------------------------------------------
-------------------------------------------------------------------

Here is the 2nd crash - it could be related to the first.  (this is on another box, with a
slightly different config file (it listens to UDP as well)

I did not see this showing like below in gdb on opensips 1.7.2 (it never used to crash in unref_dlg , only in dlg_ping_routing())



Core was generated by `/usr/sbin/opensips -P /var/run/opensips.pid -f /etc/opensips/opensips.cfg'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe93ab05491 in unref_dlg () from /usr/lib64/opensips/modules/dialog.so
Missing separate debuginfos, use: debuginfo-install opensips-1.8.1-3.el6.x86_64
(gdb) bt full 
#0  0x00007fe93ab05491 in unref_dlg () from /usr/lib64/opensips/modules/dialog.so
No symbol table info available.
#1  0x00007fe93ab0e7f5 in dlg_ping_routine () from /usr/lib64/opensips/modules/dialog.so
No symbol table info available.
#2  0x000000000047d4ca in start_timer_processes ()
No symbol table info available.
#3  0x000000000042a814 in main ()
No symbol table info available.





machine configuration (almost the same config as 2st crash):
===============
Centos 6.3
Linux somedomain.com 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

opensips -V
version: opensips 1.8.1-tls (x86_64/linux)
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, DISABLE_NAGLE, USE_MCAST, SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
svnrevision: unknown
@(#) $Id: main.c 8772 2012-03-08 11:16:13Z bogdan_iancu $
main.c compiled on 17:04:54 Sep  8 2012 with gcc 4.4.6

opessl version:
openssl-1.0.0-20.el6_2.5.x86_64

================


syslog of the crashed process:
http://pastebin.com/F3hu1WY5


please notice something that looks like a memory leak in opensips:

CRITICAL:dialog:unref_dlg: bogus ref -1788757169 with cnt 1 for dlg 0x7fe915eccd30 [0:0] with clid '' and tags 'EDH-RSA-DES-CBC3-SHA' ''


tags "EDH-RSA-DES-CBC3-SHA" ? these are ssl ciphers. how could they be related to the dialog module ?  


One particular thing related to the second bug is that just before it crashed, opensips was showing alot of messages like this,
coming everytime from the same remote unknown client , but with different  source port:

ERROR:core:tls_accept: New TLS connection from 82.113.x.x:43574 failed to accept: rejected by client
ERROR:core:tls_accept: New TLS connection from 82.113.x.x:61586 failed to accept: rejected by client


I have core files, let me know if I can upload them somewhere.

Cheers,
Dragos


----------------------------------------------------------------------

>Comment By: Vladut-Stefan Paiu (vladut-paiu)
Date: 2012-10-01 05:45

Message:
Hello,

Ok,cool. Is this with both patches applied ( dlg patch + TLS patch ) ?
Please keep us up to date if things remain stable in time.

Regards,
Vlad

----------------------------------------------------------------------

Comment By: Dragos Oancea (dragosoancea)
Date: 2012-10-01 05:00

Message:
Hi

Thank you for the feedback.
I applied the patch three days ago. Looks good so far.

Cheers,
Dragos

----------------------------------------------------------------------

Comment By: Vladut-Stefan Paiu (vladut-paiu)
Date: 2012-09-28 00:42

Message:
Hello,

Also, for the TLS memory corruption, can you please try the patch for this
trunk revision :
http://opensips.svn.sourceforge.net/viewvc/opensips?view=revision&revision=9268

It added support for proper locking within the openssl library. It's not
yet back-ported to the 1.8 branch, as we felt it needed a little bit more
testing.

Regards,
Vlad

----------------------------------------------------------------------

Comment By: Vladut-Stefan Paiu (vladut-paiu)
Date: 2012-09-28 00:32

Message:
Hello,

Can you apply the attached patch, and see if you still experience these
issues ?

Thanks and Regards,
Vlad

----------------------------------------------------------------------

Comment By: Dragos Oancea (dragosoancea)
Date: 2012-09-27 07:55

Message:
Hi

It happened again.

Program terminated with signal 11, Segmentation fault.
#0  0x00007fe93ab05491 in unlink_unsafe_dlg (dlg=0x7fe915eccd30, cnt=1) at
dlg_hash.c:706
706                     dlg->next->prev = dlg->prev;

(gdb) bt full 
#0  0x00007fe93ab05491 in unlink_unsafe_dlg (dlg=0x7fe915eccd30, cnt=1) at
dlg_h
ash.c:706
No locals.
#1  unref_dlg (dlg=0x7fe915eccd30, cnt=1) at dlg_hash.c:740
        d_entry = 0x7fe915c3f990
        __FUNCTION__ = "unref_dlg"
#2  0x00007fe93ab0e7f5 in dlg_ping_routine (ticks=<value optimized out>,
attr=<v
alue optimized out>) at dlg_timer.c:529
        expired = <value optimized out>
        it = 0x0
        curr = 0x0
        dlg = 0x7fe915eccd30
        __FUNCTION__ = "dlg_ping_routine"
#3  0x000000000047d4ca in timer_ticker () at timer.c:355
        t = 0x7fe93c19c810
#4  run_timer_process () at timer.c:425
        multiple = 10
       cnt = <value optimized out>
        tv = {tv_sec = 0, tv_usec = 0}
#5  start_timer_processes () at timer.c:522
        tpl = 0x7fe93c20aa60
        pid = <value optimized out>
        __FUNCTION__ = "start_timer_processes"
#6  0x000000000042a814 in main_loop (argc=<value optimized out>,
argv=<value optimized out>) at main.c:933
        i = <value optimized out>
        pid = <value optimized out>
        si = 0x0
        startup_done = 0x0
        chd_rank = 0
        rc = <value optimized out>
        load_p = 0x0
#7  main (argc=<value optimized out>, argv=<value optimized out>) at
main.c:1520
        cfg_log_stderr = <value optimized out>
        cfg_stream = <value optimized out>
        c = <value optimized out>
        r = <value optimized out>
        tmp = 0x1 <Address 0x1 out of bounds>
        tmp_len = <value optimized out>
        port = <value optimized out>
        proto = <value optimized out>
        options = 0x4eb5a8 "f:cCm:M:b:l:n:N:rRvdDETSVhw:t:u:g:P:G:W:o:"
        ret = -1
        seed = 1183817865
        rfd = <value optimized out>
        __FUNCTION__ = "main"


This function is called when Opensips would try to end the dialog sending
BYE to both sides, due to the fact it did not receive a reply to OPTIONS
request that it's sending.


These are the comments of a good samaritan on IRC (freenode / #opensips) ,
in regard to this bug :

.......
> how many process have the run_timer_process ?
> Only one or multiple ?
> the dialog may have been deleted after get_timeout_dlgs() which hold the
lock for lock_get(ping_timer->lock);
> OR
>  /* no longer reffed in list */
>   unref_dlg(dlg,1);
.......
> how many process have the run_timer_process ?
> Only one or multiple ?
> the dialog may have been deleted after get_timeout_dlgs() which hold the
lock for lock_get(ping_timer->lock);
> OR
>  /* no longer reffed in list */
>   unref_dlg(dlg,1);
>  /* dlg is still reffed in TM callback. deletion from memory
>    * will happen only on 408 timeout */
> so that if you ping a leg
> and there is no answer
> and while you called get_timeout_dlgs() you received a 408
> the dialog may be destroyed


> if it happens randomly when a pinged leg doesn't answer then it might be
related to a race condition with the 408 (or any other path that destroy
the dialog)
> (the dialog extracted from the ping list)

..........

PS: Thank you for the comments, these might really help!

Cheers,
Dragos


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=3570495&group_id=232389



More information about the Devel mailing list