[OpenSIPS-Users] crashing in 2.2.2
Richard Robson
rrobson at greenlightcrm.com
Tue Mar 7 05:28:51 EST 2017
Hi,
I've gone over the script and as far as I can see it's working as
expected until the traffic ramps up, and then opensips crashes.
cores:
http://pastebin.com/CgN0h40K
http://pastebin.com/ay5TS8zD
http://pastebin.com/PGn3AqmU
Regards,
Richard
On 06/03/2017 12:14, Richard Robson wrote:
> Hi,
>
> I've tested this on the latest 2.2.3 with the same results.
>
> http://pastebin.com/Uixb3v8G
>
> There were a few of these in the logs too, just before the crash:
> Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]:
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079270 ms), it may overlap..
> Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]:
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079360 ms), it may overlap..
> Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]:
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079460 ms), it may overlap..
> Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]:
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079560 ms), it may overlap..
> Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]:
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079660 ms), it may overlap..
> Mar 5 22:02:28 gl-sip-03 /usr/sbin/opensips[29875]:
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079760 ms), it may overlap..
>
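To make the warnings above easier to read, here is a minimal sketch that extracts how far behind the timer is in each one (the "now" value minus the scheduled value). The regular expression is based only on the wording of the log lines quoted above, and the function name `utimer_lag_ms` is hypothetical.

```python
import re

# Matches the "already scheduled" warning as worded in the quoted log lines.
UTIMER_RE = re.compile(
    r"utimer task <(?P<task>[^>]+)> already scheduled\s+"
    r"for (?P<due>\d+) ms \(now (?P<now>\d+) ms\)"
)


def utimer_lag_ms(log_text):
    """Return a list of (task, lag in ms) for each overlap warning."""
    # Undo mail wrapping and quote markers so each warning is one logical line.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return [
        (m["task"], int(m["now"]) - int(m["due"]))
        for m in UTIMER_RE.finditer(flat)
    ]


sample = """\
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079270 ms), it may overlap..
> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> for 204079170 ms (now 204079760 ms), it may overlap..
"""
print(utimer_lag_ms(sample))  # [('tm-utimer', 100), ('tm-utimer', 590)]
```

For the six warnings quoted above the difference grows from 100 ms to 590 ms while the scheduled time stays at 204079170.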
>
> Regards,
>
> Richard
>
>
>
> On 03/03/2017 13:15, Richard Robson wrote:
>> More cores
>>
>> http://pastebin.com/MXW2VBhi
>> http://pastebin.com/T7JFAP2U
>> http://pastebin.com/u44aaVpWquit
>> http://pastebin.com/SFKKcGxE
>> http://pastebin.com/dwSgMsJi
>> http://pastebin.com/9HdGLm96
>>
>> I've put 2.2.3 on the dev box now and will try to replicate on that
>> box, but it's difficult to replicate the traffic artificially. I'll
>> try to replicate the fault on the dev box over the weekend. I can't do
>> it on the live gateways because it will affect customer traffic.
>>
>> Regards,
>>
>> Richard
>>
>>
>> On 03/03/2017 11:28, Richard Robson wrote:
>>> I've revisited the gateway failover mechanism I had developed in
>>> order to re-route calls to the next gateway on 500s caused by capacity
>>> limits on the gateways we are using.
>>>
>>> We have 3 gateways from one carrier and one from another. The 3 have
>>> a 4 cps limit and will return a 503 or 500 if we breach this. The single
>>> gateway from the other carrier has plenty of capacity and should not
>>> be a problem, so we want to catch these responses and route to the
>>> next gateway.
>>>
>>> We are counting the CPS and channel limits and are routing to the
>>> next gateway if we exceed the limit set, but there are still
>>> occasions where a 5XX is generated, which results in a rejected call.
>>>
>>>
>>> We want to stop these rejected calls and therefore want to implement
>>> the failover mechanism for the 5XX responses. For 6 months we have
>>> been failing over if we think the counts are too high on any one
>>> gateway without a problem. But when I implement a failover on a 5XX
>>> response, opensips starts crashing.
>>>
>>> It's difficult to generate artificial traffic to mimic the real
>>> traffic, but I've not had a problem with the script in testing. Last
>>> night I rolled out the new script but by 09:15 this morning opensips
>>> started crashing 10 times in 5 minutes. This was as the traffic
>>> ramped up. I rolled back the script and it restarted OK and has not
>>> crashed since. Therefore the failover mechanism in the script is
>>> where the crash is happening.
>>>
>>> Core dump: http://pastebin.com/CqnESCm4
>>>
>>> I'll add more dumps later
>>>
>>> Regards,
>>>
>>> Richard
>>>
>>>
>>> This is the failure route catching the 5XX:
>>>
>>> failure_route[dr_fo] {
>>>     xlog(" [dr] Received reply to method $rm From: $fd, $fn, $ft, $fu, $fU, $si, $sp, To: $ru");
>>>     if (t_was_cancelled()) {
>>>         xlog("[dr]call cancelled by internal caller");
>>>         rtpengine_manage();
>>>         do_accounting("db", "cdr|missed");
>>>         exit;
>>>     }
>>>
>>>     if (t_check_status("[54]03")) {
>>>         route(relay_failover);
>>>     }
>>>     if (t_check_status("500")) {
>>>         route(relay_failover);
>>>     }
>>>
>>>     do_accounting("db", "cdr|missed");
>>>     rtpengine_manage();
>>>     exit;
>>> }
>>>
>>> This is the route taken on the failure
>>>
>>>
>>> route[relay_failover] {
>>>
>>>     if (use_next_gw()) {
>>>         xlog("[relay_failover-route] Selected Gateway is $rd");
>>>         $avp(trunkratelimit)=$(avp(attrs){s.select,0,:});
>>>         $avp(trunkchannellimit)=$(avp(attrs){s.select,1,:});
>>>
>>>         ####### check channel limit ######
>>>         get_profile_size("outbound","$rd","$var(size)");
>>>         xlog("[relay_failover-route] Selected Gateway is $rd var(size) = $var(size)");
>>>         xlog("[relay_failover-route] Selected Gateway is $rd avp(trunkcalllimit) = $avp(trunkchannellimit)");
>>>         xlog("[relay_failover-route] Selected Gateway is $rd result = ( $var(size) > $avp(trunkchannellimit))");
>>>         if ( $(var(size){s.int}) > $(avp(trunkchannellimit){s.int})) {
>>>             xlog("[relay_failover-route] Trunk $rd exceeded $avp(trunkchannellimit) concurrent calls $var(size)");
>>>             route(relay_failover);
>>>         }
>>>     } else {
>>>         send_reply("503", "Gateways Exhausted");
>>>         exit;
>>>     }
>>>
>>>     ##### We need to check Rate Limiting #######
>>>     if (!rl_check("$rd", "$(avp(trunkratelimit){s.int})", "TAILDROP")) { # Check Rate limit $avp needs changing
>>>         rl_dec_count("$rd"); # decrement the counter since we've not "used" one
>>>         xlog("[ratelimiter-route] [Max CPS: $(avp(trunkratelimit){s.int}) Current CPS: $rl_count($rd)] Call to: $rU from: $fU CPS exceeded, delaying");
>>>         $avp(initial_time)=($Ts*1000)+($Tsm/1000);
>>>         async(usleep("200000"),relay_failover_delay);
>>>         xlog("Should not get here!!!! after async request");
>>>     } else {
>>>         xlog("[relay_outbound-route] [Max CPS: $avp(trunkratelimit) Current CPS: $rl_count($rd)] Call to: $rU from: $fU not ratelimited");
>>>     }
>>>
>>>     t_on_failure("dr_fo");
>>>     do_accounting("db", "cdr|missed");
>>>     rtpengine_manage();
>>>     if (!t_relay()) {
>>>         xlog("[relay-route] ERROR: Unable to relay");
>>>         send_reply("500","Internal Error");
>>>         exit;
>>>     }
>>> }
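One thing that may be worth checking in the route above is what happens after the nested route(relay_failover) call returns. The following is a rough Python model of that control flow, not OpenSIPS code. It assumes that a called route returns to its caller unless it hits exit, and that $rd is shared between the nested and the outer invocation; the rate-limit branch is left out, and the gateway names and the `state` dictionary are hypothetical.

```python
def relay_failover(state, trace, depth=0):
    """Model of route[relay_failover]; returns False where the script would exit."""
    if not state["remaining"]:
        trace.append("send_reply 503")
        return False  # exit: all script processing stops
    # use_next_gw() rewrites the request URI, modelled here as state["rd"]
    state["rd"] = state["remaining"].pop(0)
    if state["rd"] in state["over_limit"]:
        if not relay_failover(state, trace, depth + 1):
            return False  # the nested call hit exit
        # The nested call returned normally, so execution carries on below.
    trace.append(f"depth {depth}: t_relay to {state['rd']}")
    return True


state = {
    "remaining": ["gw1.example", "gw2.example"],
    "over_limit": {"gw1.example"},
    "rd": None,
}
trace = []
relay_failover(state, trace)
print(trace)
# ['depth 1: t_relay to gw2.example', 'depth 0: t_relay to gw2.example']
```

Under those assumptions the relay step is reached twice for one call whenever the first candidate gateway is over its channel limit. Whether this is related to the crash is not established here.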
>>>
>>>
>>>
>>>
>>
>>
>
>
--
Richard Robson
Greenlight Support
01382 843843
support at greenlightcrm.com