[OpenSIPS-Users] Serialforking failure, with lcr:parse_phostport: too many colons in udp:: 0

Wed Oct 13 23:16:32 CEST 2010

Hi Taisto,

Your problem is not timer related or how serial forking is done in 
opensips (I will comment on these in a later reply).

Right now, the quick answer to fix your problem: failure route must be 
re-armed after each branch -> this is why your failure route does not 
catches the end of the second branch. Adding a t_on_failure("1") before 
t_relay() in failure route will fix your problem.

Regards,
Bogdan

Taisto Qvist wrote:
> Hi Bogdan,
>
> I've now been trying with some tests, and I cant really get it to work,
> since the transactionlayer on the server transaction returns a 408
> back to the UAC before serial forking has ended.
> This seems a little bit related to what I once commented on a long time
> ago regarding handling of timer C and the fact that the timer c seems to
> be quite "tied" to Timer B
>
> When the fr_timer pops, (causing the CANCEL to be sent so that we can move
> on to the next serial-fork-target), the tm-layer seems store this 
> timer-pop
> as a 408 response
>
> 20:41:44 osips[4686]: DBG:tm:utimer_routine: timer 
> routine:4,tl=0xb5b6770c next=(nil), timeout=649300000
> 20:41:55 osips[4686]: DBG:tm:timer_routine: timer 
> routine:1,tl=0xb5b67728 next=(nil), timeout=660
> 20:41:55 osips[4686]: DBG:tm:final_response_handler: stop retr. and 
> send CANCEL (0xb5b675c0)
> 20:41:55 osips[4686]: DBG:tm:t_should_relay_response: T_code=180, 
> new_code=408
> 20:41:55 osips[4686]: DBG:tm:t_pick_branch: picked branch 0, code 408 
> (prio=800)
>
> As the capture and log I've attached indicates, I am not able to 
> perform a three
> step serial fork. I have three Uas:es registered with 1.0, 0.9, and 
> 0.8 in q-values.
>
> First timer pop causes a CANCEL, and a new INVITE towards UAS with 
> q=0.9, but when
> it pops the second time, TM still cancels the second target, but 
> instead of continuing
> with the third, it sends a 408 towards the UAC.
>
> It might be something with my script-handling in the failure_route, so 
> here it is:
>
> failure_route[1]
> {
>     if ( t_was_cancelled() )
>     {
>         log(2, "transaction was cancelled by UAC\n");
>     }
>     xlog("(lab1) - In FailureRoute: branches=$(branch(uri)[*])\n");
>     if ( isflagset(1) )
>     {
>         log(2,"(lab1) - 3++ Received, attempting serial fork!\n");
>         next_branches();
>         switch ( $retcode )
>         {
>             case 1:
>                    log(2,"(lab1) - More branches left, rollOver timer 
> set.");
>                    $avp(s:timerC) = 12;
>                    setflag(1);  # Do I need this? Should I use 
> branchflags instead?
>             break;
>             case 2:
>                    log(2,"(lab1) - Last branch, timerC set to 60 sec");
>                    $avp(s:timerC) = 60;
>             break;
>
>             default:
>                    log(2,"(lab1) - No more serial fork targets.");
>                    exit;
>         }
>         if ( !t_relay() )
>         {
>             log(2,"(lab1) - Error during relay for serial fork!\n");
>         }
>     }
>     else
>     {
>         log(2,"(lab1) - 3++ result. Serialforking not available.\n");
>     }
>
> }
>
> When I say that it seems related to another issue I commented on a 
> long time
> ago, I am referring to the general handling of Timer C, which doesn't 
> seem to
> be a separate timer, but is reusing the timerB.
>
> When the timer pops after the normal 180 seconds, the TM layer will 
> *instantly*
> generate a 408 response on the server txn, while at the same time 
> generating
> the CANCEL attempting to terminate the client txn.
> To me, this is wrong, but maybe I am suppose to handle this in the 
> failure_route?
>
> What I would expect is that the CANCEL will cause a 487 response from 
> the UAS,
> and this will be the final response sent to the UAC.
> Also by behaving this way, we may cause a protocol violation even 
> though the risk
> is small.
>
> Once timer C pops we send the CANCEL hoping that it will cause a 487. 
> BUT, it is
> quite possible that before the cancel is received by the UAS, it sends 
> a 200 to
> the INVITE! Even IF the CANCEL receives a 2xx response, we may still 
> get a 2xx
> response to the INVITE.
> But with the current behavior of opensips, this would cause opensips 
> to proxy
> TWO final responses on the server txn, once being the initial 408 sent 
> by the
> txn on timer C timeout, and then the real/actual 2xx sent by the uas.
>
> I've also seen a similar problem with 6xx responses received on a 
> branch during
> forking.
> Opensips forwards the 6xx *before* the remaining client txns has 
> completed, and
> there is no guarantee that these client txns will all terminate with 
> 487 even
> if opensips tries to CANCEL all of them asap.
> They may still return 2xx to the invite, which would cause a 
> forwarding of both
> a 6xx and a 2xx on the server txn. This scenario is even mentioned in 
> rfc3261.
>
> So all these three problems have in common that the server txn seems 
> to be
> terminating a bit early, before the client side has fully completed, 
> but as
> I said, it might at least partially be something I should handle in my
> failure_routes...?
>
> Thanks for all your help.
> Regards
> Taisto Qvist
>
>
> Bogdan-Andrei Iancu skrev 2010-10-06 17:04:
>> Hi Taisto,
>>
>> could you test the rev 7248 on trunk for solution 2) ? if ok, I will 
>> backport to 1.6
>>
>> Regards,
>> Bogdan

-- 
Bogdan-Andrei Iancu
OpenSIPS Bootcamp
15 - 19 November 2010, Edison, New Jersey, USA
www.voice-system.ro