<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body text="#000000" bgcolor="#ffffff">

    <font size="-1"><font face="Courier New, Courier, monospace">Hi

        again Bogdan, <br>

        <br>

        I'm sorry it takes such a long time to reply, considering the

        lightning-fast<br>

        support service you are providing for all of us on this list

        :-)<br>

        <br>

        Anyway, of course your suggestion helped, so I now have

        serial forking <br>

        working!<br>

        <br>

        A few notes though. It seems like I need the

        serialize_branches() to return<br>

        a useful return code as well. Otherwise my script cannot

        differentiate between<br>

        when serial forking really is being done, or when normal proxy or

        parallel fork<br>

        is in progress. (In which case I wanted normal timer C)<br>

        Returning 1 at the end of serialize.c, instead of 0 which is

        returned when nothing<br>

        is performed by the call to serialize_branches() took care of

        that.<br>

        This causes action.c to do a LOG_ERR though, so I changed that to only

        log error if <br>

        the return from serialize_branches was &lt; 0.<br>

        <br>

        When you have time, I am very interested in your views on my

        other issues.<br>

        <br>

        Regards<br>

        Taisto Qvist<br>

        <br>

      </font></font><br>

    Bogdan-Andrei Iancu wrote 2010-10-13 23:16:

    <blockquote cite="mid:4CB621B0.3080109@voice-system.ro" type="cite">Hi

      Taisto,

      <br>

      <br>

      Your problem is not timer related or how serial forking is done in

      opensips (I will comment on these in a later reply).

      <br>

      <br>

      Right now, the quick answer to fix your problem: failure route

      must be re-armed after each branch -&gt; this is why your failure

      route does not catch the end of the second branch. Adding a

      t_on_failure("1") before t_relay() in failure route will fix your

      problem.

      <br>
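      <br>
      As a minimal sketch (assuming the failure_route[1] from the quoted
      script below, trimmed to the relevant part and not tested against
      your setup), the re-arm goes right before the relay:
      <br>
      <br>
      failure_route[1]
      <br>
      {
      <br>
      &nbsp;&nbsp;&nbsp; if ( isflagset(1) &amp;&amp; next_branches() )
      <br>
      &nbsp;&nbsp;&nbsp; {
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # re-arm the failure route, otherwise the failure of the next branch is not caught
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; t_on_failure("1");
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; t_relay();
      <br>
      &nbsp;&nbsp;&nbsp; }
      <br>
      }
      <br>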

      <br>

      Regards,

      <br>

      Bogdan

      <br>

      <br>

      <br>

      <br>

      Taisto Qvist wrote:

      <br>

      <blockquote type="cite">Hi Bogdan,

        <br>

        <br>

        I've now been trying with some tests, and I can't really get it

        to work,

        <br>

        since the transaction layer on the server transaction returns a

        408

        <br>

        back to the UAC before serial forking has ended.

        <br>

        This seems a little bit related to what I once commented on a

        long time

        <br>

        ago regarding handling of timer C and the fact that the timer c

        seems to

        <br>

        be quite "tied" to Timer B

        <br>

        <br>

        When the fr_timer pops, (causing the CANCEL to be sent so that

        we can move

        <br>

        on to the next serial-fork-target), the tm-layer seems to store

        this timer-pop

        <br>

        as a 408 response

        <br>

        <br>

        20:41:44 osips[4686]: DBG:tm:utimer_routine: timer

        routine:4,tl=0xb5b6770c next=(nil), timeout=649300000

        <br>

        20:41:55 osips[4686]: DBG:tm:timer_routine: timer

        routine:1,tl=0xb5b67728 next=(nil), timeout=660

        <br>

        20:41:55 osips[4686]: DBG:tm:final_response_handler: stop retr.

        and send CANCEL (0xb5b675c0)

        <br>

        20:41:55 osips[4686]: DBG:tm:t_should_relay_response:

        T_code=180, new_code=408

        <br>

        20:41:55 osips[4686]: DBG:tm:t_pick_branch: picked branch 0,

        code 408 (prio=800)

        <br>

        <br>

        As the capture and log I've attached indicates, I am not able to

        perform a three

        <br>

        step serial fork. I have three UASes registered with 1.0, 0.9,

        and 0.8 in q-values.

        <br>

        <br>

        First timer pop causes a CANCEL, and a new INVITE towards UAS

        with q=0.9, but when

        <br>

        it pops the second time, TM still cancels the second target, but

        instead of continuing

        <br>

        with the third, it sends a 408 towards the UAC.

        <br>

        <br>

        It might be something with my script-handling in the

        failure_route, so here it is:

        <br>

        <br>

        failure_route[1]

        <br>

        {

        <br>

        &nbsp;&nbsp;&nbsp; if ( t_was_cancelled() )

        <br>

        &nbsp;&nbsp;&nbsp; {

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2, "transaction was cancelled by UAC\n");

        <br>

        &nbsp;&nbsp;&nbsp; }

        <br>

        &nbsp;&nbsp;&nbsp; xlog("(lab1) - In FailureRoute:

        branches=$(branch(uri)[*])\n");

        <br>

        &nbsp;&nbsp;&nbsp; if ( isflagset(1) )

        <br>

        &nbsp;&nbsp;&nbsp; {

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2,"(lab1) - 3++ Received, attempting serial

        fork!\n");

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; next_branches();

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; switch ( $retcode )

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case 1:

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2,"(lab1) - More branches left, rollOver

        timer set.");

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $avp(s:timerC) = 12;

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; setflag(1);&nbsp; # Do I need this? Should I use

        branchflags instead?

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; break;

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case 2:

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2,"(lab1) - Last branch, timerC set to 60

        sec");

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $avp(s:timerC) = 60;

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; break;

        <br>

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; default:

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2,"(lab1) - No more serial fork

        targets.");

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; exit;

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if ( !t_relay() )

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2,"(lab1) - Error during relay for serial

        fork!\n");

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }

        <br>

        &nbsp;&nbsp;&nbsp; }

        <br>

        &nbsp;&nbsp;&nbsp; else

        <br>

        &nbsp;&nbsp;&nbsp; {

        <br>

        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log(2,"(lab1) - 3++ result. Serialforking not

        available.\n");

        <br>

        &nbsp;&nbsp;&nbsp; }

        <br>

        <br>

        }

        <br>

        <br>

        When I say that it seems related to another issue I commented on

        a long time

        <br>

        ago, I am referring to the general handling of Timer C, which

        doesn't seem to

        <br>

        be a separate timer, but is reusing the timerB.

        <br>

        <br>

        When the timer pops after the normal 180 seconds, the TM layer

        will *instantly*

        <br>

        generate a 408 response on the server txn, while at the same

        time generating

        <br>

        the CANCEL attempting to terminate the client txn.

        <br>

        To me, this is wrong, but maybe I am supposed to handle this in

        the failure_route?

        <br>

        <br>

        What I would expect is that the CANCEL will cause a 487 response

        from the UAS,

        <br>

        and this will be the final response sent to the UAC.

        <br>

        Also by behaving this way, we may cause a protocol violation

        even though the risk

        <br>

        is small.

        <br>

        <br>

        Once timer C pops we send the CANCEL hoping that it will cause a

        487. BUT, it is

        <br>

        quite possible that before the cancel is received by the UAS, it

        sends a 200 to

        <br>

        the INVITE! Even IF the CANCEL receives a 2xx response, we may

        still get a 2xx

        <br>

        response to the INVITE.

        <br>

        But with the current behavior of opensips, this would cause

        opensips to proxy

        <br>

        TWO final responses on the server txn, one being the initial

        408 sent by the

        <br>

        txn on timer C timeout, and then the real/actual 2xx sent by the

        uas.

        <br>

        <br>

        I've also seen a similar problem with 6xx responses received on

        a branch during

        <br>

        forking.

        <br>

        Opensips forwards the 6xx *before* the remaining client txns have

        completed, and

        <br>

        there is no guarantee that these client txns will all terminate

        with 487 even

        <br>

        if opensips tries to CANCEL all of them asap.

        <br>

        They may still return 2xx to the invite, which would cause a

        forwarding of both

        <br>

        a 6xx and a 2xx on the server txn. This scenario is even

        mentioned in rfc3261.

        <br>

        <br>

        So all these three problems have in common that the server txn

        seems to be

        <br>

        terminating a bit early, before the client side has fully

        completed, but as

        <br>

        I said, it might at least partially be something I should handle

        in my

        <br>

        failure_routes...?

        <br>

        <br>

        Thanks for all your help.

        <br>

        Regards

        <br>

        Taisto Qvist

        <br>

        <br>

        <br>

        Bogdan-Andrei Iancu wrote 2010-10-06 17:04:

        <br>

        <blockquote type="cite">Hi Taisto,

          <br>

          <br>

          could you test the rev 7248 on trunk for solution 2) ? if ok,

          I will backport to 1.6

          <br>

          <br>

          Regards,

          <br>

          Bogdan

          <br>

        </blockquote>

      </blockquote>

      <br>

      <br>

    </blockquote>

  </body>

</html>