[OpenSIPS-Users] crash after failover

Bogdan-Andrei Iancu bogdan at opensips.org
Tue Apr 16 11:52:36 CEST 2013


Hi Chen-Che,

OpenSIPS is not crashing at all, but I see the dialog module is 
generating some queries with bogus values (for updating the dialog info) 
- I see the queries are setting state 4 which means the dialog just 
received the ACK for a 200 OK - so this is not related to BYE handling, 
but to ACK handling.

We will check the dlg_db_sync function in combination with postgres DB 
to see if we can replicate the issue.

Thanks and Regards,

Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com


On 04/16/2013 10:26 AM, microx wrote:
> Hi all,
>
> In my environment, I use two active SIP proxy servers A and B that share one
> backend database for high performance and failover. When active proxy server
> A goes down, proxy server B will fetch the dialog information (originally
> processed by the proxy server A) from the database into its local memory
> when receiving a BYE. Specifically, proxy server B executes
> avp_db_query("update dialog set caller_sock='udp:127.0.0.1:5060',
> callee_sock='udp:127.0.0.1:5060' where caller_sock='udp:10.100.13.12:5060'")
> and then calls exec_msg("/usr/local/sbin/opensipsctl fifo dlg_db_sync").
> Note that udp:10.100.13.12:5060 is the processing socket of proxy server A.
>
> With the above approach, proxy server B can handle BYE requests whose
> associated dialogs are established by proxy server A in the beginning.
> However, when a number of BYE requests reach proxy server B simultaneously
> (60 BYES within one second), proxy server B may crash, resulting from an
> ineligible query "Query: update dialog set
> state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
> where dlg_id=2972999775959". A part of the log is below.
>
> Although the query is not my used avp_db_query, I don't know whether the
> root cause is wrong setting of my config for failover. Any comment is
> greatly appreciated.
>
> Best regards,
> Chen-Che
>
> : DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010
> PQsendQuery(update dialog set
> state=4,timeout=1366102836,caller_cseq='0',callee_cseq='1',caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles='O1S1R1#<null>|',script_flags=0,flags=144
> where dlg_id=2822250379813)
> : DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8
> : DBG:db_postgres:db_postgres_store_result: 0x7f0dec096010
> PQresultStatus(PGRES_COMMAND_OK) PQgetResult(0x10f0720)
> : DBG:db_postgres:free_query: PQclear(0x10f0720) result set
> : DBG:core:db_free_rows: freeing 0 rows
> : DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8
> : DBG:dialog:dialog_update_db: updating existing dialog 0x7f0dc4271750
> : DBG:dialog:set_final_update_cols: DLG vals and profiles should not be
> saved[0:0]
> : DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010
> PQsendQuery(update dialog set
> state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
> where dlg_id=2972999775959)
> : DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010 PQsendQuery
> failed: ERROR:  null value in column "caller_cseq" violates not-null
> constraint#012DETAIL:  Failing row contains (2972999775959,
> 20-31706 at 192.168.8.222, sip:100000039 at thcloud.com, 31706SIPpTag00Caller20,
> sip:100000040 at thcloud.com, 22109SIPpTag01Callee20, null, null, null, null,
> 0, 0, null, null, sip:100000039 at 192.168.8.222:6060,
> sip:10005 at 192.168.8.225:6060;transport=UDP, udp:127.0.0.1:5060,
> udp:127.0.0.1:5060, 4, 1366091036, 1366090793, null, null, 0, 0).#012 Query:
> update dialog set
> state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
> where dlg_id=2972999775959
> : DBG:db_postgres:free_query: PQclear(0x10f1360) result set
> : ERROR:db_postgres:db_postgres_submit_query: 0x7f0dec096010 PQsendQuery
> Error: ERROR:  null value in column "caller_cseq" violates not-null
> constraint#012DETAIL:  Failing row contains (2972999775959,
> 20-31706 at 192.168.8.222, sip:100000039 at thcloud.com, 31706SIPpTag00Caller20,
> sip:100000040 at thcloud.com, 22109SIPpTag01Callee20, null, null, null, null,
> 0, 0, null, null, sip:100000039 at 192.168.8.222:6060,
> sip:10005 at 192.168.8.225:6060;transport=UDP, udp:127.0.0.1:5060,
> udp:127.0.0.1:5060, 4, 1366091036, 1366090793, null, null, 0, 0).#012 Query:
> update dialog set
> state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
> where dlg_id=2972999775959
> : ERROR:core:db_do_update: error while submitting query
> : DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8
> : DBG:db_postgres:db_postgres_store_result: 0x7f0dec096010
> PQresultStatus(PGRES_FATAL_ERROR) PQgetResult((nil))
> : ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010 - invalid
> query, execution aborted
> : ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010:
> PGRES_FATAL_ERROR
> : ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010:
> : DBG:core:db_free_rows: freeing 0 rows
> : DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8
> : WARNING:db_postgres:db_postgres_update: unexpected result returned
> : ERROR:dialog:dialog_update_db: could not update database info
> : DBG:core:pool_remove: connection still kept in the pool
> Apr 16 13:44:46  /usr/sbin/opensips[18220]: last message repeated 2 times
> : DBG:db_postgres:db_postgres_submit_query: 0x7f0dec095bc0
> PQsendQuery(delete from location where expires<'2013-04-16 00:44:47' AND
> expires!='1969-12-31 18:00:00')
> : DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8
> : DBG:db_postgres:db_postgres_store_result: 0x7f0dec095bc0
> PQresultStatus(PGRES_COMMAND_OK) PQgetResult(0x10f0720)
> : DBG:db_postgres:free_query: PQclear(0x10f0720) result set
> : DBG:core:db_free_rows: freeing 0 rows
> : DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8
> : DBG:core:pool_remove: removing connection from the pool
> : DBG:db_postgres:db_postgres_free_connection: PQfinish(0x10e7d50)
> : DBG:db_postgres:db_postgres_free_connection: pkg_free(0x7f0dec095cc0)
> : DBG:tm:tm_shutdown: tm_shutdown : start
> : DBG:tm:unlink_timer_lists: emptying DELETE list
> : DBG:tm:tm_shutdown: emptying hash table
> : DBG:tm:run_trans_callbacks: trans=0x7f0dc42f86f8, callback type 4096, id 1
> entered
> : DBG:tm:run_trans_callbacks: trans=0x7f0dc42fcfb0, callback type 4096, id 1
> entered
> : DBG:tm:run_trans_callbacks: trans=0x7f0dc42fe9f8, callback type 4096, id 1
> entered
> : DBG:tm:run_trans_callbacks: trans=0x7f0dc42fa140, callback type 4096, id 1
> entered
> : DBG:tm:run_trans_callbacks: trans=0x7f0dc4272290, callback type 4096, id 1
> entered
> : DBG:tm:tm_shutdown: releasing timers
> : DBG:tm:tm_shutdown: removing semaphores
> : DBG:tm:tm_shutdown: destroying callback lists
> : DBG:tm:tm_shutdown: tm_shutdown : done
> Apr 16 13:45:46 qri-cdrc-application-node1 /usr/sbin/opensips[18220]:
> CRITICAL:core:sig_alarm_abort: BUG - shutdown timeout triggered, dying...
>
>
>
> --
> View this message in context: http://opensips-open-sip-server.1449251.n2.nabble.com/crash-after-failover-tp7585845.html
> Sent from the OpenSIPS - Users mailing list archive at Nabble.com.
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>



More information about the Users mailing list