[OpenSIPS-Users] crashing in 2.2.2

Mon Mar 6 07:14:50 EST 2017

Hi,

I've tested this on the latest 2.2.3 with the same results.

http://pastebin.com/Uixb3v8G

There were a few of these in the logs too, just before the crash:
Mar  5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079270 ms), it may overlap..
Mar  5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079360 ms), it may overlap..
Mar  5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079460 ms), it may overlap..
Mar  5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079560 ms), it may overlap..
Mar  5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079660 ms), it may overlap..
Mar  5 22:02:28 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079760 ms), it may overlap..
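
Each of those warnings refers to the same scheduled time (204079170 ms), so the task is falling further behind on every tick. A minimal Python sketch, not part of OpenSIPS, that pulls the two timestamps out of each warning and reports the lag. The regex and the `utimer_lag_ms` helper name are assumptions made up for illustration:

```python
import re

# Matches the "already scheduled for X ms (now Y ms)" part of the warning.
# \s+ tolerates the line wrap that the mail client put before "for".
PATTERN = re.compile(r"already scheduled\s+for (\d+) ms \(now (\d+) ms\)")


def utimer_lag_ms(log_text):
    """Return how far behind (in ms) the task was at each warning."""
    return [int(now) - int(scheduled)
            for scheduled, now in PATTERN.findall(log_text)]


# The six warnings from the log above, reduced to the relevant part.
LOG = """\
utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079270 ms), it may overlap..
utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079360 ms), it may overlap..
utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079460 ms), it may overlap..
utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079560 ms), it may overlap..
utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079660 ms), it may overlap..
utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079760 ms), it may overlap..
"""

if __name__ == "__main__":
    print(utimer_lag_ms(LOG))  # [100, 190, 290, 390, 490, 590]
```

So the tm-utimer task was about 100 ms late at the first warning and roughly 590 ms late by the last one.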

Regards,

Richard

On 03/03/2017 13:15, Richard Robson wrote:
> More cores
>
> http://pastebin.com/MXW2VBhi
> http://pastebin.com/T7JFAP2U
> http://pastebin.com/u44aaVpW
> http://pastebin.com/SFKKcGxE
> http://pastebin.com/dwSgMsJi
> http://pastebin.com/9HdGLm96
>
> I've put 2.2.3 on the dev box now and will try to replicate on that
> box, but it's difficult to replicate the traffic artificially. I'll try
> to replicate the fault on the dev box over the weekend. I can't do it
> on the live gateways because it will affect customer traffic.
>
> Regards,
>
> Richard
>
>
> On 03/03/2017 11:28, Richard Robson wrote:
>> I've revisited the gateway failover mechanism I had developed in
>> order to re-route calls to the next gateway on 500s caused by capacity
>> limits on the gateways we are using.
>>
>> We have 3 gateways from one carrier and one from another. The 3 are
>> limited to 4 cps and will return a 503 or 500 if we breach this. The
>> single gateway from the other carrier has plenty of capacity and should
>> not be a problem, so we want to catch these responses and route to the
>> next gateway.
>>
>> We are counting the CPS and channel limits and are routing to the
>> next gateway if we exceed the limit set, but there are still
>> occasions where a 5XX is generated, which results in a rejected call.
>>
>>
>> We want to stop these rejected calls and therefore want to implement
>> the failover mechanism for the 5XX responses. For 6 months we have
>> been failing over if we think the counts are too high on any one
>> gateway, without a problem. But when I implement a failover on a 5XX
>> response, opensips starts crashing.
>>
>> It's difficult to generate artificial traffic to mimic the real
>> traffic, but I've not had a problem with the script in testing. Last
>> night I rolled out the new script, but by 09:15 this morning opensips
>> had started crashing: 10 times in 5 minutes. This was as the traffic
>> ramped up. I rolled back the script and it restarted OK and has not
>> crashed since. Therefore the failover mechanism in the script is
>> where the crash is happening.
>>
>> Core dump: http://pastebin.com/CqnESCm4
>>
>> I'll add more dumps later
>>
>> Regards,
>>
>> Richard
>>
>>
>> This is the failure route catching the 5XX:
>>
>> failure_route[dr_fo] {
>>         xlog (" [dr]  Recieved reply to method $rm From: $fd, $fn, $ft, $fu, $fU, $si, $sp, To: $ru");
>>         if (t_was_cancelled()) {
>>                 xlog("[dr]call cancelled by internal caller");
>>                 rtpengine_manage();
>>                 do_accounting("db", "cdr|missed");
>>                 exit;
>>         }
>>
>>         if ( t_check_status("[54]03")) {
>>                 route(relay_failover);
>>         }
>>         if ( t_check_status("500")) {
>>                 route(relay_failover);
>>         }
>>
>>         do_accounting("db", "cdr|missed");
>>         rtpengine_manage();
>>         exit;
>> }
>>
>> This is the route taken on the failure:
>>
>>
>> route[relay_failover]{
>>
>>         if (use_next_gw()) {
>>                 xlog("[relay_failover-route] Selected Gateway is $rd");
>>                 $avp(trunkratelimit)=$(avp(attrs){s.select,0,:});
>>                 $avp(trunkchannellimit)=$(avp(attrs){s.select,1,:});
>>
>>                 ####### check channel limit ######
>>                 get_profile_size("outbound","$rd","$var(size)");
>>                 xlog("[relay_failover-route] Selected Gateway is $rd var(size) = $var(size)");
>>                 xlog("[relay_failover-route] Selected Gateway is $rd avp(trunkcalllimit) = $avp(trunkchannellimit)");
>>                 xlog("[relay_failover-route] Selected Gateway is $rd  result = ( $var(size) > $avp(trunkchannellimit))");
>>                 if ( $(var(size){s.int}) > $(avp(trunkchannellimit){s.int})) {
>>                         xlog("[relay_failover-route] Trunk $rd exceeded $avp(trunkchannellimit) concurrent calls $var(size)");
>>                         route(relay_failover);
>>                 }
>>         } else {
>>                send_reply("503", "Gateways Exhusted");
>>                exit;
>>         }
>>
>>         ##### We need to check Rate Limiting #######
>>         if (!rl_check("$rd", "$(avp(trunkratelimit){s.int})", "TAILDROP")) { # Check Rate limit $avp needs changing
>>                 rl_dec_count("$rd"); # decrement the counter since we've not "used" one
>>                 xlog("[ratelimiter-route] [Max CPS: $(avp(trunkratelimit){s.int}) Current CPS: $rl_count($rd)] Call to: $rU from: $fU CPS exceeded, delaying");
>>                 $avp(initial_time)=($Ts*1000)+($Tsm/1000);
>>                 async(usleep("200000"),relay_failover_delay);
>>                 xlog ("Should not get here!!!! after async requst");
>>         } else {
>>                 xlog ("[relay_outbound-route] [Max CPS: $avp(trunkratelimit) Current CPS: $rl_count($rd)] Call to: $rU from: $fU not ratelimited");
>>         }
>>
>>         t_on_failure("dr_fo");
>>         do_accounting("db", "cdr|missed");
>>         rtpengine_manage();
>>         if (!t_relay()) {
>>                         xlog("[relay-route] ERROR: Unable to relay");
>>                         send_reply("500","Internal Error");
>>                         exit;
>>         }
>> }
>>
>>
>>
>>
>
>
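
To make the intent of the quoted relay_failover route easier to follow, here is a rough Python model of just the channel-limit part of the gateway selection. It is only a sketch under assumptions: the gateway names, the example "rate:channels" attrs values and the `pick_gateway` helper are hypothetical, and it uses a loop where the script calls the route recursively. It does not model the rate limiter, the async delay or the relay itself.

```python
from typing import Dict, List, Optional, Tuple


def parse_attrs(attrs: str) -> Tuple[int, int]:
    """Split a 'rate:channels' attrs string, mirroring the two
    {s.select,N,:} lookups in the script (field 0 and field 1)."""
    fields = attrs.split(":")
    return int(fields[0]), int(fields[1])


def pick_gateway(gateways: List[Tuple[str, str]],
                 active_calls: Dict[str, int]) -> Optional[str]:
    """Return the first gateway whose active call count does not
    exceed its channel limit, or None when all are exhausted
    (the case where the script replies 503)."""
    for name, attrs in gateways:
        _rate_limit, channel_limit = parse_attrs(attrs)
        # Same comparison as the script: skip only when size > limit.
        if active_calls.get(name, 0) > channel_limit:
            continue
        return name
    return None


if __name__ == "__main__":
    # Hypothetical gateways and limits, for illustration only.
    gateways = [("gw1", "4:30"), ("gw2", "4:30"), ("gw3", "100:500")]
    print(pick_gateway(gateways, {"gw1": 31, "gw2": 30}))
```

With the hypothetical values above, gw1 is skipped (31 calls against a limit of 30) and gw2 is chosen, because the comparison is strictly greater-than.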

-- 
Richard Robson
Greenlight Support
01382 843843
support at greenlightcrm.com