[OpenSIPS-Devel] [ opensips-Bugs-2860109 ] segfault caused by usrloc:db_timer_udomain when db goes away
SourceForge.net
noreply at sourceforge.net
Tue Sep 22 17:30:49 CEST 2009
Bugs item #2860109, was opened at 2009-09-16 19:28
Message generated for change (Settings changed) made by bogdan_iancu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2860109&group_id=232389
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: modules
Group: None
Status: Open
>Resolution: Fixed
>Priority: 7
Private: No
Submitted By: Phil D'Amore (ptdamore)
Assigned to: Bogdan-Andrei Iancu (bogdan_iancu)
Summary: segfault caused by usrloc:db_timer_udomain when db goes away
Initial Comment:
db_timer_udomain only initializes the keys for the delete query if there isn't an existing prepared statement (my_ps == NULL). This works as long as the connection doesn't go away for a long period of time. If the db goes away for 2*timer_interval, the stored statement context is no longer valid, and the underlying db_delete function needs to re-create it from the original query. However, since the keys aren't populated, you get a segfault when there is an attempt to read them.
The patch attached removes the conditional setup of the keys in db_timer_udomain.
This was with the mysql driver, but it seems like good practice for any driver.
----------------------------------------------------------------------
>Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2009-09-22 18:30
Message:
Thanks for the debugging - I see what is happening, and indeed there is a
bug. In re_init_statement(), if the prepared failed, the context must not
be destroyed - maybe next time you can connect to the server. The context
is to be destroyed only if the statement cannot be initialized.
So, the case when the looping is terminated (all 2 cycles) and no success,
the error must not be triggered, but simply failure - there is a return
missing after the for{}.
Please update from SVN and let me know if work ok (no crash and reconnect
also) - see revision 6177.
Regards,
bogdan
----------------------------------------------------------------------
Comment By: Phil D'Amore (ptdamore)
Date: 2009-09-17 22:24
Message:
As promised:
(gdb) frame
#0 0x00000000004d4578 in db_print_where (_c=0x808900, _b=0x76357b "",
_l=65509, _k=0x7fff06e92610, _o=0x7fff06e92600, _v=0x7fff06e925c0,
_n=2,
val2str=0x2b5aa3c2c270 <db_mysql_val2str>) at db/db_ut.c:284
284 ret = snprintf(_b + len, _l - len, "%.*s%s",
(gdb) bt
#0 0x00000000004d4578 in db_print_where (_c=0x808900, _b=0x76357b "",
_l=65509, _k=0x7fff06e92610, _o=0x7fff06e92600, _v=0x7fff06e925c0,
_n=2,
val2str=0x2b5aa3c2c270 <db_mysql_val2str>) at db/db_ut.c:284
#1 0x00000000004d2135 in db_do_delete (_h=0x808900, _k=0x7fff06e92610,
_o=0x7fff06e92600, _v=0x7fff06e925c0, _n=2,
val2str=0x2b5aa3c2c270 <db_mysql_val2str>,
submit_query=0x2b5aa3c24830 <db_mysql_submit_dummy_query>)
at db/db_query.c:213
#2 0x00002b5aa3c2a119 in db_mysql_delete (_h=0x808900, _k=0x7fff06e92610,
_o=0x7fff06e92600, _v=0x7fff06e925c0, _n=2) at dbase.c:883
#3 0x00002b5aa46c300f in db_timer_udomain (_d=0x2b5aa82a77d8) at
udomain.c:620
#4 0x00002b5aa46b9ac8 in synchronize_all_udomains () at dlist.c:557
#5 0x00002b5aa46c9f0e in timer (ticks=8423680, param=0x76357b) at
ul_mod.c:384
#6 0x0000000000486f13 in start_timer_processes () at timer.c:266
#7 0x000000000042987a in main (argc=8, argv=0x7fff06e92868) at main.c:819
The full code for the relevant line is:
284 ret = snprintf(_b + len, _l - len, "%.*s%s",
285 _k[i]->len, _k[i]->s, _o[i]);
And:
(gdb) print i
$3 = 0
(gdb) print _k[i]
$4 = (const db_key_t) 0x0
It's also worth noting that _o is really not initialized either, but since
it's not being dereferenced, it wouldn't be involved here.
Also for your viewing pleasure, here is the relevant stuff generated from
this in the syslog, starting from the point where I took my db server down.
My usrloc sync timer interval is 30 seconds...
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:db_mysql_do_p
repared_query: disconect event for 0x808a00
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:reset_all_sta
tements: reseting all statements on connection: (0x808900) 0x808a00
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:db_mysql_con
nect: driver error(2003): Can't connect to MySQL server on '10.2.0.68'
(111)
Sep 17 15:13:18 sipprx-01 last message repeated 2 times
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:usrloc:db_timer_udoma
in: failed to delete from table location
Sep 17 15:13:18 sipprx-01 /usr/sbin/opensips[20149]: ERROR:usrloc:timer:
synchro
nizing cache failed
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:re_init_state
ment: query is <delete from location where expires<? AND expires!=?>,
ptr=(nil
)
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
INFO:db_mysql:re_init_state
ment: disconect event for 0x808a00
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:db_mysql_con
nect: driver error(2003): Can't connect to MySQL server on '10.2.0.68'
(111)
Sep 17 15:13:48 sipprx-01 last message repeated 2 times
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:re_init_stat
ement: failed while mysql_stmt_prepare()
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:db_mysql:db_mysql_do_
prepared_query: failed to re-init statement!
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]:
ERROR:usrloc:db_timer_udoma
in: failed to delete from table location
Sep 17 15:13:48 sipprx-01 /usr/sbin/opensips[20149]: ERROR:usrloc:timer:
synchro
nizing cache failed
Sep 17 15:14:18 sipprx-01 kernel: opensips[20149]: segfault at
0000000000000008
rip 00000000004d4578 rsp 00007fff06e924a0 error 4
Sep 17 15:14:18 sipprx-01 media-dispatcher[6993]:
[OpenSIPSControlProtocol,1194,
] Connection to OpenSIPS lost: Connection was closed cleanly.
Sep 17 15:14:18 sipprx-01 /usr/sbin/opensips[20123]:
INFO:core:handle_sigs: chil
d process 20149 exited by a signal 11
Sep 17 15:14:18 sipprx-01 /usr/sbin/opensips[20123]:
INFO:core:handle_sigs: core
was generated
Sep 17 15:14:18 sipprx-01 /usr/sbin/opensips[20123]:
INFO:core:handle_sigs: term
inating due to SIGCHLD
Thanks,
Phil
----------------------------------------------------------------------
Comment By: Phil D'Amore (ptdamore)
Date: 2009-09-17 22:01
Message:
Hi Bogdan...
I was getting this error in 1.5.3. Sorry, I should have been explicit
with this before, but I was in a bit too much of a hurry. 1.5.3 has
dbase.c from r5886 which seems to include all changes on the 1.5 branch to
date.
I see some of the changes you might have been referring to and I believe
this is a different problem. I see that the prep_stmt struct from db_mysql
does in fact cache the statement, which also puzzled me as I looked at this
problem. Now that I've thought about it, my fix is probably not optimal,
and db_mysql could be doing more to use that cached statement. The problem
I saw came up in the db_mysql_delete function, although all the db_mysql_OP
functions seem to be structured the same way.
Here's how I'm seeing it:
When the connection goes away, the re_init_statement function removes the
offending statement from the prep_stmt->stmts linked list. Since, at least
in the case of the usrloc timer task, there is only one statement, the
result is that prep_stmt->stmts is now NULL. As a result, the next call to
db_mysql_delete calls has_stmt_ctx, which now returns a false value. This
triggers a call to db_do_delete, which expects the keys to be populated. I
believe that since the keys are not populated (after all, as far as the
timer function is concerned my_ps is not NULL, so it didn't bother to
populate the keys), db_do_delete winds up exploding.
All of my test servers are now running my patch, but I'll back one out and
reproduce this and get a stack trace you can look at.
Thanks,
Phil
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2009-09-16 20:52
Message:
That is rather strange as the prepare statements (in mysql module) do catch
the original string query that created the statement and re-use it each
time the statement needs to be re-init. So, the keys are not actually
needed any more (after the original init).
Indeed, there was a problem , but not related to keys, but to the table -
actually this bug was fixed on SVN 1.5 branch last week, so if you update
it should work.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2860109&group_id=232389
More information about the Devel
mailing list