<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
<font size="-1"><font face="Courier New, Courier, monospace">Hi
again Bogdan, <br>
<br>
I'm sorry it takes me such a long time to reply, considering the
lightning-fast<br>
support service you are providing for all of us on this list :-)<br>
<br>
Anyway, of course your suggestion helped, so I now have serial
forking working!<br>
<br>
A few notes though. It seems like I need serialize_branches() to
return<br>
a useful return code as well. Otherwise my script cannot
differentiate between<br>
when serial forking really is being done and when a normal proxy or
parallel fork<br>
is in progress (in which case I wanted the normal Timer C).<br>
Returning 1 at the end of serialize.c, instead of the 0 that is
returned when nothing<br>
is performed by the call to serialize_branches(), took care of that.<br>
This caused action.c to do a LOG_ERR though, so I changed that to
only log an error if<br>
the return from serialize_branches() was < 0.<br>
<br>
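To make that concrete, here is roughly what my request route now does
with the return code. This is only a sketch assuming my patched
values (1 = branches were serialized, 0 = nothing to serialize,
< 0 = error); the flag and failure route number are the ones from my
failure_route quoted further down:<br>
<br>
serialize_branches(1);<br>
if ( $retcode > 0 )<br>
{<br>
    # patched case: branches were q-sorted, arm serial forking<br>
    setflag(1);<br>
    t_on_failure("1");<br>
}<br>
else<br>
{<br>
    # nothing serialized: plain proxy/parallel fork, normal Timer C<br>
    resetflag(1);<br>
}<br>
<br>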
When you have time, I am very interested in your views on my
other issues.<br>
<br>
Regards<br>
Taisto Qvist<br>
<br>
</font></font><br>
Bogdan-Andrei Iancu wrote on 2010-10-13 23:16:
<blockquote cite="mid:4CB621B0.3080109@voice-system.ro" type="cite">Hi
Taisto,
<br>
<br>
Your problem is not related to the timers, nor to how serial forking
is done in opensips (I will comment on those in a later reply).
<br>
<br>
Right now, the quick answer to fix your problem: the failure route
must be re-armed after each branch -> this is why your failure
route does not catch the end of the second branch. Adding a
t_on_failure("1") before t_relay() in the failure route will fix your
problem.
<br>
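In other words, the relay at the end of your failure route should look
roughly like this (just a sketch, using your route number):
<br>
<br>
failure_route[1]<br>
{<br>
    ...<br>
    next_branches();<br>
    t_on_failure("1");  # re-arm, so the next branch also ends up here<br>
    t_relay();<br>
}<br>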
<br>
Regards,
<br>
Bogdan
<br>
<br>
<br>
<br>
Taisto Qvist wrote:
<br>
<blockquote type="cite">Hi Bogdan,
<br>
<br>
I've now been trying with some tests, and I can't really get it to
work,
<br>
since the transaction layer on the server transaction returns a 408
<br>
back to the UAC before serial forking has ended.
<br>
This seems a little bit related to something I commented on a long
time
<br>
ago regarding the handling of Timer C, and the fact that Timer C
seems
<br>
to be quite "tied" to Timer B.
<br>
<br>
When the fr_timer pops (causing the CANCEL to be sent so that we can
move
<br>
on to the next serial fork target), the tm layer seems to store this
<br>
timer pop as a 408 response:
<br>
<br>
20:41:44 osips[4686]: DBG:tm:utimer_routine: timer
routine:4,tl=0xb5b6770c next=(nil), timeout=649300000
<br>
20:41:55 osips[4686]: DBG:tm:timer_routine: timer
routine:1,tl=0xb5b67728 next=(nil), timeout=660
<br>
20:41:55 osips[4686]: DBG:tm:final_response_handler: stop retr.
and send CANCEL (0xb5b675c0)
<br>
20:41:55 osips[4686]: DBG:tm:t_should_relay_response:
T_code=180, new_code=408
<br>
20:41:55 osips[4686]: DBG:tm:t_pick_branch: picked branch 0,
code 408 (prio=800)
<br>
<br>
As the capture and log I've attached indicate, I am not able to
perform a three-step
<br>
serial fork. I have three UASes registered with q-values of 1.0, 0.9,
and 0.8.
<br>
<br>
The first timer pop causes a CANCEL and a new INVITE towards the UAS
with q=0.9, but when
<br>
it pops the second time, TM still cancels the second target; instead
of continuing
<br>
with the third, it sends a 408 towards the UAC.
<br>
<br>
It might be something with my script handling in the
failure_route, so here it is:
<br>
<br>
failure_route[1]<br>
{<br>
    if ( t_was_cancelled() )<br>
    {<br>
        log(2, "transaction was cancelled by UAC\n");<br>
    }<br>
<br>
    xlog("(lab1) - In FailureRoute: branches=$(branch(uri)[*])\n");<br>
<br>
    if ( isflagset(1) )<br>
    {<br>
        log(2, "(lab1) - 3++ Received, attempting serial fork!\n");<br>
        next_branches();<br>
        switch ( $retcode )<br>
        {<br>
            case 1:<br>
                log(2, "(lab1) - More branches left, rollOver timer set.");<br>
                $avp(s:timerC) = 12;<br>
                setflag(1);  # Do I need this? Should I use branchflags instead?<br>
                break;<br>
            case 2:<br>
                log(2, "(lab1) - Last branch, timerC set to 60 sec");<br>
                $avp(s:timerC) = 60;<br>
                break;<br>
            default:<br>
                log(2, "(lab1) - No more serial fork targets.");<br>
                exit;<br>
        }<br>
        if ( !t_relay() )<br>
        {<br>
            log(2, "(lab1) - Error during relay for serial fork!\n");<br>
        }<br>
    }<br>
    else<br>
    {<br>
        log(2, "(lab1) - 3++ result. Serialforking not available.\n");<br>
    }<br>
}<br>
<br>
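(For completeness: the $avp(s:timerC) assignments above only take
effect because my config points the tm INVITE timeout at that AVP.
The wiring is roughly the following; check the tm docs for the exact
AVP-spec syntax in your version:)
<br>
<br>
modparam("tm", "fr_inv_timer_avp", "s:timerC")<br>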
<br>
When I say that it seems related to another issue I commented on a
long time
<br>
ago, I am referring to the general handling of Timer C, which doesn't
seem to
<br>
be a separate timer, but is reusing Timer B.
<br>
<br>
When the timer pops after the normal 180 seconds, the TM layer
will *instantly*
<br>
generate a 408 response on the server txn, while at the same
time generating
<br>
the CANCEL attempting to terminate the client txn.
<br>
To me, this is wrong, but maybe I am supposed to handle this in
the failure_route?
<br>
<br>
What I would expect is that the CANCEL will cause a 487 response
from the UAS,
<br>
and this will be the final response sent to the UAC.
<br>
Also, by behaving this way, we may cause a protocol violation
even though the risk
<br>
is small.
<br>
<br>
Once Timer C pops, we send the CANCEL hoping that it will cause a
487. BUT, it is
<br>
quite possible that before the CANCEL is received by the UAS, it
sends a 200 to
<br>
the INVITE! Even IF the CANCEL receives a 2xx response, we may
still get a 2xx
<br>
response to the INVITE.
<br>
But with the current behavior of opensips, this would cause
opensips to proxy
<br>
TWO final responses on the server txn, one being the initial
408 sent by the
<br>
txn on the Timer C timeout, and the other the real/actual 2xx sent
by the UAS.
<br>
<br>
I've also seen a similar problem with 6xx responses received on
a branch during
<br>
forking.
<br>
Opensips forwards the 6xx *before* the remaining client txns have
completed, and
<br>
there is no guarantee that these client txns will all terminate
with a 487, even
<br>
if opensips tries to CANCEL all of them asap.
<br>
They may still return a 2xx to the INVITE, which would cause a
forwarding of both
<br>
a 6xx and a 2xx on the server txn. This scenario is even
mentioned in RFC 3261.
<br>
<br>
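(As a side note: if I read the tm docs right, there is a module
parameter that makes a 6xx behave like an ordinary negative reply
instead of blocking further forking. I have not tested whether it
changes the race described above:)
<br>
<br>
modparam("tm", "disable_6xx_block", 1)<br>
<br>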
So all three of these problems have in common that the server txn
seems to be
<br>
terminating a bit early, before the client side has fully completed;
but as
<br>
I said, it might at least partially be something I should handle
in my
<br>
failure_routes...?
<br>
<br>
Thanks for all your help.
<br>
Regards
<br>
Taisto Qvist
<br>
<br>
<br>
Bogdan-Andrei Iancu wrote on 2010-10-06 17:04:
<br>
<blockquote type="cite">Hi Taisto,
<br>
<br>
Could you test rev 7248 on trunk for solution 2)? If OK, I will
backport it to 1.6.
<br>
<br>
Regards,
<br>
Bogdan
<br>
</blockquote>
</blockquote>
<br>
<br>
</blockquote>
</body>
</html>