[OpenSIPS-Devel] [ opensips-Bugs-2593088 ] Crash with db_postgres and presence

SourceForge.net noreply at sourceforge.net
Thu Feb 12 19:00:38 CET 2009


Bugs item #2593088, was opened at 2009-02-12 13:48
Message generated for change (Comment added) made by nobody
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2593088&group_id=232389

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Crash with db_postgres and presence

Initial Comment:
(submitted by vasil.kolev at attractel.com)

The problem is as follows - a query is executed, but for some reason the results from the previous one are read back from the DB driver, which in turn makes opensips to crash as the number of columns is lower and it tries to dereference stuff that doesn't exist.

Appended are the log and the backtrace:


Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_submit_query: 0x8197240 PQsendQuery(select status,reason from watchers where presentity_uri='sip:bbb%40ccc.com at 10.2.30.5' AND watcher_username='aaa%40ccc.com' AND watcher_domain='10.2.30.5' AND event='presence') 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:core:db_new_result: allocate 28 bytes for result set at 0x81a0948 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_store_result: 0x8197240 PQresultStatus(PGRES_TUPLES_OK) PQgetResult(0x94c0300) 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_get_columns: 1 columns returned from the query 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:core:db_allocate_columns: allocate 16 bytes for result columns at 0x819ecd0 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_get_columns: RES_NAMES(0x819ecd8)[0]=[username] 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_get_columns: use DB_STRING result type 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_convert_rows: allocate for 1 columns 4 bytes in row buffer at 0x819d5b8 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:core:db_allocate_rows: allocate 28 bytes for result rows and values at 0x819f870 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_convert_rows: PQgetvalue(0x8197240,0,0)=[aaa at ccc.com] 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_convert_rows: allocated 11 bytes for row_buf[0] at 0x819d5c8 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_convert_rows: [0][0] Column[username]=[aaa at ccc.com] 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_str2val: converting STRING [aaa at ccc.com] 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:db_postgres_convert_rows: freeing row buffer at 0x819d5b8 
Feb 12 13:42:59 f-ast2 opensips[29504]: DBG:db_postgres:free_query: PQclear(0x94c0300) result set 
Feb 12 13:42:59 f-ast2 opensips[29507]: DBG:db_postgres:db_postgres_store_result: 0x8197100 PQresultStatus(PGRES_TUPLES_OK) PQgetResult(0x94bb108) 
Feb 12 13:42:59 f-ast2 kernel: [842163.893756] opensips[29504]: segfault at 1 ip b77b98c6 sp bff1ab30 error 4 in presence.so[b7790000+31000]



Core was generated by `opensips'.
Program terminated with signal 11, Segmentation fault.
[New process 29504]
#0  0xb77b98c6 in get_db_subs_auth (subs=0xbff1acf4, found=0xbff1ad94) at subscribe.c:1980
1980			subs->reason.len= strlen(row_vals[1].val.string_val);
(gdb) bt
#0  0xb77b98c6 in get_db_subs_auth (subs=0xbff1acf4, found=0xbff1ad94) at subscribe.c:1980
#1  0xb77ba5cc in handle_subscribe (msg=0x819acb8, str1=0x0, str2=0x0) at subscribe.c:630
#2  0x08055591 in do_action (a=0x8195940, msg=0x819acb8) at action.c:961
#3  0x080541e2 in run_action_list (a=0x8195940, msg=0x819acb8) at action.c:139
#4  0x08056ecd in do_action (a=0x8195fe0, msg=0x819acb8) at action.c:705
#5  0x080541e2 in run_action_list (a=0x8195fe0, msg=0x819acb8) at action.c:139
#6  0x08057529 in do_action (a=0x8196048, msg=0x819acb8) at action.c:711
#7  0x080541e2 in run_action_list (a=0x81953d0, msg=0x819acb8) at action.c:139
#8  0x08056914 in do_action (a=0x8192808, msg=0x819acb8) at action.c:119
#9  0x080541e2 in run_action_list (a=0x8192808, msg=0x819acb8) at action.c:139
#10 0x08056ecd in do_action (a=0x8192870, msg=0x819acb8) at action.c:705
#11 0x080541e2 in run_action_list (a=0x818d6b8, msg=0x819acb8) at action.c:139
#12 0x08057ee1 in run_top_route (a=0x818d6b8, msg=0x819acb8) at action.c:119
#13 0x0809228c in receive_msg (
    buf=0x8164100 "SUBSCRIBE sip:bbb%40ccc.com at 10.2.30.5;transport=UDP SIP/2.0\r\nVia: SIP/2.0/UDP 10.2.2.179:5060;branch=z9hG4bK-d8754z-1c3f735ee1e02d36-1---d8754z-\r\nMax-Forwards: 69\r\nContact: <sip:aaa%40ccc.com at 10.2.2.1"..., len=793, rcv_info=0xbff1bad4) at receive.c:165
#14 0x080cdf4b in udp_rcv_loop () at udp_server.c:449
#15 0x0806c3af in main (argc=1, argv=0xbff1bc64) at main.c:778



----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-02-12 18:00

Message:

The issue is as follows:


When you do a query and after that you get context switched to another
thread that does a query, it will execute its own query and see the result
of the previous one. Simple scheme:

Thread A		Thread B
do_query(A)
			do_query(B)
			fetch_result()
...

The result is that thread B crashes, as the results returned are from the
wrong query. I've seen crashes in pretty much every module that uses the DB
connection.

My initial solution was to just add a lock in db_con_t structure and track
it, but that didn't solve the problem, which in turn led me to find that
new connections are initialized in db/db.c in db_do_init(), and there every
new connection is checked if it exists in the pool. So if you use a few
modules with the same database (as I do in my config, I have the same
database/user/pass for uri_db, usrloc, auth_db, presence, presence_xml and
xcap_client), only one entry will be created in the db_pool stuff. This in
turn leads to having multiple db_con_t structures for one DB connection,
which made the lock pretty much useless.

As a hack, I just made the db_do_init() not to check in the pool if a
connection exists, but to just open a new one. This will have a pretty
nasty effect on the number of open database connections (my setup opened
about 70), but at least it shouldn't crash this way. My ugly patch is
available at http://tiki.securax.net/mx/db_lock.diff .

A good solution is to have internal locking per DB connection which in
turn can be accessed with a member of db_func_t from the relevant
functions. I don't seem to have a better idea right now...

-- 
Regards,
Vasil Kolev
Attractel NV
dCAP #1324, LPIC2

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-02-12 15:44

Message:
I have been looking into the problem for some time and it seems like
there's a race condition in the postgresql driver somewhere. I have seen
how a query A is made, then query B hits, and then the part that sent B
sees the results of A and crashes opensips.

I'll try to find the place where the lock should happen, but I'll need to
get a bit more familiar with the types of locking primitives used.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2593088&group_id=232389



More information about the Devel mailing list