[OpenSIPS-Devel] [ opensips-Bugs-2860109 ] segfault caused by usrloc:db_timer_udomain when db goes away

SourceForge.net noreply at sourceforge.net
Thu Sep 17 21:24:14 CEST 2009


Bugs item #2860109, was opened at 2009-09-16 12:28
Message generated for change (Comment added) made by ptdamore
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2860109&group_id=232389

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: modules
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Phil D'Amore (ptdamore)
Assigned to: Bogdan-Andrei Iancu (bogdan_iancu)
Summary: segfault caused by usrloc:db_timer_udomain when db goes away

Initial Comment:
db_timer_udomain only initializes the keys for the delete query if there isn't an existing prepared statement  (my_ps == NULL).  This works as long as the connection doesn't go away for a long period of time.  If the db goes away for 2*timer_interval, the stored statement context is no longer valid, and the underlying db_delete function needs to re-create it from the original query.  However, since the keys aren't populated, you get a segfault when there is an attempt to read them.

The patch attached removes the conditional setup of the keys in db_timer_udomain.

This was with the mysql driver, but it seems like good practice for any driver.

----------------------------------------------------------------------

Comment By: Phil D'Amore (ptdamore)
Date: 2009-09-17 15:24

Message:
As promised:

(gdb) frame
#0  0x00000000004d4578 in db_print_where (_c=0x808900, _b=0x76357b "", 
    _l=65509, _k=0x7fff06e92610, _o=0x7fff06e92600, _v=0x7fff06e925c0,
_n=2, 
    val2str=0x2b5aa3c2c270 <db_mysql_val2str>) at db/db_ut.c:284
284				ret = snprintf(_b + len, _l - len, "%.*s%s", 
(gdb) bt
#0  0x00000000004d4578 in db_print_where (_c=0x808900, _b=0x76357b "", 
    _l=65509, _k=0x7fff06e92610, _o=0x7fff06e92600, _v=0x7fff06e925c0,
_n=2, 
    val2str=0x2b5aa3c2c270 <db_mysql_val2str>) at db/db_ut.c:284
#1  0x00000000004d2135 in db_do_delete (_h=0x808900, _k=0x7fff06e92610, 
    _o=0x7fff06e92600, _v=0x7fff06e925c0, _n=2, 
    val2str=0x2b5aa3c2c270 <db_mysql_val2str>, 
    submit_query=0x2b5aa3c24830 <db_mysql_submit_dummy_query>)
    at db/db_query.c:213
#2  0x00002b5aa3c2a119 in db_mysql_delete (_h=0x808900, _k=0x7fff06e92610,

    _o=0x7fff06e92600, _v=0x7fff06e925c0, _n=2) at dbase.c:883
#3  0x00002b5aa46c300f in db_timer_udomain (_d=0x2b5aa82a77d8) at
udomain.c:620
#4  0x00002b5aa46b9ac8 in synchronize_all_udomains () at dlist.c:557
#5  0x00002b5aa46c9f0e in timer (ticks=8423680, param=0x76357b) at
ul_mod.c:384
#6  0x0000000000486f13 in start_timer_processes () at timer.c:266
#7  0x000000000042987a in main (argc=8, argv=0x7fff06e92868) at main.c:819

The full code for the relevant line is:

284				ret = snprintf(_b + len, _l - len, "%.*s%s", 
285					_k[i]->len, _k[i]->s, _o[i]);

And:

(gdb) print i
$3 = 0
(gdb) print _k[i]
$4 = (const db_key_t) 0x0

It's also worth noting that _o is really not initialized either, but since
it's not being dereferenced, it wouldn't be involved here.

Also for your viewing pleasure, here is the relevant stuff generated from
this in the syslog, starting from the point where I took my db server down.
 My usrloc sync timer interval is 30 seconds...

Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:db_mysql_do_p
repared_query: disconect event for 0x808a00 
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:reset_all_sta
tements: reseting all statements on connection: (0x808900) 0x808a00 
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:db_mysql_con
nect: driver error(2003): Can't connect to MySQL server on '10.2.0.68'
(111) 
Sep 17 15:13:18 sipprx-01 last message repeated 2 times
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:usrloc:db_timer_udoma
in: failed to delete from table location 
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]: ERROR:usrloc:timer:
synchro
nizing cache failed 
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:re_init_state
ment:  query  is <delete from location where expires<? AND expires!=?>,
ptr=(nil
) 
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:re_init_state
ment: disconect event for 0x808a00 
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:db_mysql_con
nect: driver error(2003): Can't connect to MySQL server on '10.2.0.68'
(111) 
Sep 17 15:13:48 sipprx-01 last message repeated 2 times
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:re_init_stat
ement: failed while mysql_stmt_prepare() 
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:db_mysql_do_
prepared_query: failed to re-init statement! 
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:usrloc:db_timer_udoma
in: failed to delete from table location 
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]: ERROR:usrloc:timer:
synchro
nizing cache failed 
Sep 17 15:14:18 sipprx-01 kernel: opensips[20149]: segfault at
0000000000000008 
rip 00000000004d4578 rsp 00007fff06e924a0 error 4
Sep 17 15:14:18 sipprx-01 media-dispatcher[6993]:
[OpenSIPSControlProtocol,1194,
] Connection to OpenSIPS lost: Connection was closed cleanly.
Sep 17 15:14:18 sipprx-01 /usr/sbin/opensips[20123]:
INFO:core:handle_sigs: chil
d process 20149 exited by a signal 11 
Sep 17 15:14:18 sipprx-01 /usr/sbin/opensips[20123]:
INFO:core:handle_sigs: core
 was generated 
Sep 17 15:14:18 sipprx-01 /usr/sbin/opensips[20123]:
INFO:core:handle_sigs: term
inating due to SIGCHLD 

Thanks,
Phil

----------------------------------------------------------------------

Comment By: Phil D'Amore (ptdamore)
Date: 2009-09-17 15:01

Message:
Hi Bogdan...

I was getting this error in 1.5.3.  Sorry, I should have been explicit
with this before, but I was in a bit too much of a hurry.  1.5.3 has
dbase.c from r5886 which seems to include all changes on the 1.5 branch to
date.

I see some of the changes you might have been referring to and I believe
this is a different problem.  I see that the prep_stmt struct from db_mysql
does in fact cache the statement, which also puzzled me as I looked at this
problem.  Now that I've thought about it, my fix is probably not optimal,
and db_mysql could be doing more to use that cached statement.  The problem
I saw came up in the db_mysql_delete function, although all the db_mysql_OP
functions seem to be structured the same way.

Here's how I'm seeing it:

When the connection goes away, the re_init_statement function removes the
offending statement from the prep_stmt->stmts linked list.  Since, at least
in the case of the usrloc timer task, there is only one statement, the
result is that prep_stmt->stmts is now NULL.  As a result, the next call to
db_mysql_delete calls has_stmt_ctx, which now returns a false value.  This
triggers a call to db_do_delete, which expects the keys to be populated.  I
believe that since the keys are not populated (after all, as far as the
timer function is concerned my_ps is not NULL, so it didn't bother to
populate the keys), db_do_delete winds up exploding.

All of my test servers are now running my patch, but I'll back one out and
reproduce this and get a stack trace you can look at.

Thanks,
Phil

----------------------------------------------------------------------

Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2009-09-16 13:52

Message:
That is rather strange as the prepare statements (in mysql module) do catch
the original string query that created the statement and re-use it each
time the statement needs to be re-init. So, the keys are not actually
needed any more (after the original init).

Indeed, there was a problem , but not related to keys, but to the table -
actually this bug was fixed on SVN 1.5 branch last week, so if you update
it should work.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2860109&group_id=232389



More information about the Devel mailing list