[OpenSIPS-Devel] [ opensips-Bugs-2940556 ] Timer bug in TM handler
SourceForge.net
noreply at sourceforge.net
Tue Jun 28 09:53:59 CEST 2011
Bugs item #2940556, was opened at 2010-01-27 00:34
Message generated for change (Settings changed) made by bogdan_iancu
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2940556&group_id=232389
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: modules
Group: 1.5.x
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Andrew Pogrebennyk (apogrebennyk)
Assigned to: Bogdan-Andrei Iancu (bogdan_iancu)
Summary: Timer bug in TM handler
Initial Comment:
I have got stuck trying to implement a sort of serial forking (forwarding) scenario with OpenSIPS. The thing is that when fr_inv_timer hits, OpenSIPS
(prematurely) sends INVITE on the next branch and only after that CANCELs the previous one. (And if the gateway receives different branch on transaction to which no final reply has been sent yet - it can merge the requests.) It can be clearly seen in the attached trace that contains messages from the OpenSIPS to the b2bua on the same machine port 5061. I've found this behavior to be consistent in the versions 1.3.2 - 1.5.3, but the logs that follow are from OpenSER 1.3.2. Similar bug has been reported for Kamailio: http://lists.kamailio.org/pipermail/devel/2009-May/018982.html
Another thing I've found is that OpenSIPS resets the fr_timer in retransmission_handler() if no provisional response to INVITE has been received. So if the fr_timer hits OpenSIPS silently fails over to the next forwarding destination without sending CANCEL to this branch (again 1.5.3 is affected). Here we see that it forwards the INVITE and sets FR_TIMER as per script:
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: Request leaving server, D-URI='<null>' - M=INVITE RURI=sip:40084956288253 at 195.128.80.211:5061 F=sip:000100 at sip2.d-tel.ru T=sip:000101 at sip2.d-tel.ru IP=77.122.227.73 ID=YmM1NGE4NWQ1ZWZhZjVjNDBkNzljODZjYTk5MDVlN2Q.
[...skipped...]
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:_set_fr_retr: FR_TIMER = 10
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: relative timeout is 10
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: calling insert_timer_unsafe: timeout=10, final value=46
...
Here it hits the retransmission_handler() - and this "relative timeout
is 1000000" messages appears:
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:retransmission_handler: retransmission_handler : request resending (t=0x28701ad0, INVITE si ... )
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: relative timeout is 1000000
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: calling insert_timer_unsafe: timeout=1000000, final value=37500000
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:insert_timer_unsafe: [5]: 0x28701d30 (37500000)
Jan 27 05:04:14 sip2 /usr/local/sbin/openser[48744]: DBG:tm:retransmission_handler: retransmission_handler : done
I think this timeout somehow interferes with fr_timer because when fr_timer hits in 20 seconds, OpenSIPS _silently_ fails over to the next
forwarding destination without sending CANCEL to _this_ branch. This occurs only when no provisional response has been got so I tend to think
this is somehow related to the timeout thing.
I have added extra debug print to insert_timer_unsafe() like:
LM_DBG("calling insert_timer_unsafe: timeout=%lld, final value=%lld\n",timeout,timeout + get_ticks());
so you could see the sequence of timer updates:
Jan 27 05:04:15 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: calling insert_timer_unsafe: timeout=2000000, final value=39500000
Jan 27 05:04:17 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: calling insert_timer_unsafe: timeout=4000000, final value=43500000
Jan 27 05:04:21 sip2 /usr/local/sbin/openser[48744]: DBG:tm:set_timer: calling insert_timer_unsafe: timeout=4000000, final value=47500000
Then OpenSIPS fails over to the next forwarding destination on locally generated 408 - but note there is no real "sending cancel..." message
from cancel_branch() between these messages:
Jan 27 05:04:24 sip2 /usr/local/sbin/openser[48744]: DBG:tm:insert_timer_unsafe: [0]: 0x28701e60 (56)
Jan 27 05:04:24 sip2 /usr/local/sbin/openser[48744]: DBG:tm:relay_reply: branch=1, save=1, relay=-1
Jan 27 05:04:24 sip2 /usr/local/sbin/openser[48744]: DBG:tm:final_response_handler: done
Please check.
----------------------------------------------------------------------
>Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2011-06-28 10:53
Message:
Fix uploaded on trunk (future 1.7).
----------------------------------------------------------------------
Comment By: Andrew Pogrebennyk (apogrebennyk)
Date: 2010-08-04 17:13
Message:
Bogdan,
Not yet, I was put back with this project. I'll try to recreate the setup
somewhere over the weekend or early next week.
Regards,
Andrew
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2010-08-03 02:00
Message:
Andrew,
have you tested the patch ? any feedback on it?
Regards,
Bogdan
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2010-02-17 17:10
Message:
Hi Andrew,
Please test the attached patch - it is a highly experimental fix - I just
did only some short tests with it, so it may have side effects or it may
crash opensips .
Again, please test it carefully and let me know if you find any issues
with it.
Regards,
Bogdan
----------------------------------------------------------------------
Comment By: Andrew Pogrebennyk (apogrebennyk)
Date: 2010-02-09 19:30
Message:
Bogdan,
Thank you. Sure I'm willing to test you the first issue.
Regarding the second issue - thanks for clarifying that for me.
Andrew
----------------------------------------------------------------------
Comment By: Bogdan-Andrei Iancu (bogdan_iancu)
Date: 2010-02-09 19:20
Message:
Hi Andrew,
Indeed, the CANCEL is fired after the new branches were sent out - I can
work out a patch to change the order - it is a bit delicate (as everything
related to TM, but I think is doable) - I hope you are willing to test it
for me ;)
Regarding the second issue, that is exactly what RFC says - if you get no
reply at all from a destination, do not attempt to send a CANCEL there.
Regards,
Bogdan
----------------------------------------------------------------------
Comment By: Andrew Pogrebennyk (apogrebennyk)
Date: 2010-01-27 00:39
Message:
Things to note in the ngrep trace: INVITE for branch 1 comes at
05:04:14.173313 - before the CANCEL for branch 0 which comes at
05:04:14.173572.
Then there's INVITE for branch 2 at 05:04:24.274084, but no CANCEL for the
previous branch (fr_timer in action - my b2bua kept retransmitting 183
Session Progress with branch 0, so there was no response for branch 1 at
all).
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1086410&aid=2940556&group_id=232389
More information about the Devel
mailing list