[OpenSIPS-Devel] OpenSIPS Crash

Ben Newlin Ben.Newlin at genesys.com
Thu Jul 11 16:15:00 EDT 2019


Hi,

We experienced an issue today in our testing environment where OpenSIPS was crashing almost constantly. We are investigating some changes we made to our TLS configuration, but I also wanted to share some of the backtraces here. Due to the sheer number of core dumps (over 360), I cannot provide them all, so I tried to take a representative sample. I expected the backtraces to be similar, but they appear to be substantially different, although on closer inspection they may share the same cause. It is also not clear to me whether any of these crashes are the same as the ones I have already reported.

Server 1:
https://pastebin.com/5gmfd0KU

Server 2:
https://pastebin.com/r8vNdA8W

Server 3:
https://pastebin.com/cyg9F4Z5
https://pastebin.com/r4fvLA91

Server 4:
https://pastebin.com/V5MWrUWh
https://pastebin.com/FShUNc6z


Ben Newlin

From: Devel <devel-bounces at lists.opensips.org> on behalf of Ben Newlin <Ben.Newlin at genesys.com>
Reply-To: OpenSIPS devel mailling list <devel at lists.opensips.org>
Date: Tuesday, July 9, 2019 at 5:20 PM
To: OpenSIPS devel mailling list <devel at lists.opensips.org>, Bogdan-Andrei Iancu <bogdan at opensips.org>
Subject: Re: [OpenSIPS-Devel] OpenSIPS Crash

Any updates on this? Would you like me to open a ticket, or does one already exist for this issue?

Ben Newlin

From: Devel <devel-bounces at lists.opensips.org> on behalf of Ben Newlin <Ben.Newlin at genesys.com>
Reply-To: OpenSIPS devel mailling list <devel at lists.opensips.org>
Date: Wednesday, June 26, 2019 at 11:58 AM
To: Bogdan-Andrei Iancu <bogdan at opensips.org>, OpenSIPS devel mailling list <devel at lists.opensips.org>
Subject: Re: [OpenSIPS-Devel] OpenSIPS Crash

Bogdan,

I am glad the cause is known at least. :)

Is there a ticket I can follow for more information and to be notified of resolution?

Ben Newlin

From: Bogdan-Andrei Iancu <bogdan at opensips.org>
Date: Wednesday, June 26, 2019 at 7:37 AM
To: Ben Newlin <Ben.Newlin at genesys.com>, OpenSIPS devel mailling list <devel at lists.opensips.org>
Subject: Re: [OpenSIPS-Devel] OpenSIPS Crash

Thank you Ben,

This backtrace confirms our initial suspicion. I'm trying to find a more generic way to fix it; what you are experiencing is just one face of the problem (a problem with many faces :P), and there are other related reports.

Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com

OpenSIPS Summit 2019
  https://www.opensips.org/events/Summit-2019Amsterdam/

On 06/26/2019 02:34 AM, Ben Newlin wrote:
Bogdan,

I believe I can now reproduce this crash reliably. Please see the backtrace [1]. Is there any other information you would like me to collect when this occurs?

[1] https://pastebin.com/n0Ph8XH5

Ben Newlin

From: Bogdan-Andrei Iancu <bogdan at opensips.org>
Date: Friday, June 7, 2019 at 9:15 AM
To: Ben Newlin <Ben.Newlin at genesys.com>, OpenSIPS devel mailling list <devel at lists.opensips.org>
Subject: Re: [OpenSIPS-Devel] OpenSIPS Crash

Hi Ben,

How often, and how easily, can this crash be reproduced (if it can be at all)? Brainstorming with Razvan, we suspect a race (on the message saved in shared memory in the transaction) between the process doing the cleanup after the async resume and the process running the failure route (due to the 503).

But this is just a supposition; you could validate it (or not) by removing the async handling and checking whether the crash still occurs.
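
To make the suspected race concrete, here is a minimal, hypothetical C sketch of the general pattern. It uses two POSIX threads as stand-ins for the two OpenSIPS processes (the real ones are separate processes sharing shm memory, not threads), and `shared_tx`, `resume_cleanup`, `failure_route_reader` and `run_race_once` are illustrative names only, not OpenSIPS code. Without the lock, the reader could dereference the saved message after the cleanup path has already freed it; taking the same lock on both paths and clearing the pointer before unlocking closes that window. Compile with `-pthread`.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction holding a copy of the request
 * saved in shared memory. Not an OpenSIPS structure. */
struct shared_tx {
    pthread_mutex_t lock;
    int *saved_msg;            /* NULL once the cleanup path has released it */
};

/* Hypothetical "async resume cleanup": frees the saved message. */
static void *resume_cleanup(void *arg)
{
    struct shared_tx *tx = arg;
    pthread_mutex_lock(&tx->lock);
    free(tx->saved_msg);
    tx->saved_msg = NULL;      /* cleared under the lock: readers never see a dangling pointer */
    pthread_mutex_unlock(&tx->lock);
    return NULL;
}

/* Hypothetical "failure route": reads the saved message only if it still exists. */
static void *failure_route_reader(void *arg)
{
    struct shared_tx *tx = arg;
    int *seen = malloc(sizeof *seen);
    if (!seen)
        return NULL;
    pthread_mutex_lock(&tx->lock);
    *seen = tx->saved_msg ? *tx->saved_msg : -1;   /* -1: message already gone */
    pthread_mutex_unlock(&tx->lock);
    return seen;
}

/* Runs both paths concurrently once. Returns the value the reader saw,
 * -1 if the cleanup won the race, or -2 on resource failure. */
int run_race_once(int value)
{
    struct shared_tx tx;
    pthread_t a, b;
    void *ret = NULL;
    int observed = -2;

    pthread_mutex_init(&tx.lock, NULL);
    tx.saved_msg = malloc(sizeof *tx.saved_msg);
    if (!tx.saved_msg) {
        pthread_mutex_destroy(&tx.lock);
        return -2;
    }
    *tx.saved_msg = value;

    if (pthread_create(&a, NULL, resume_cleanup, &tx) != 0) {
        free(tx.saved_msg);
        pthread_mutex_destroy(&tx.lock);
        return -2;
    }
    if (pthread_create(&b, NULL, failure_route_reader, &tx) != 0) {
        pthread_join(a, NULL);             /* cleanup thread still frees the message */
        pthread_mutex_destroy(&tx.lock);
        return -2;
    }
    pthread_join(a, NULL);
    pthread_join(b, &ret);

    if (ret) {
        observed = *(int *)ret;
        free(ret);
    }
    pthread_mutex_destroy(&tx.lock);
    return observed;                       /* saved_msg was freed by the cleanup thread */
}
```

Whichever thread wins, the reader either sees the intact value or a clean "gone" marker, never freed memory; removing the async step, as suggested above, removes the second concurrent path entirely.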

And on the double ACK: I'm not 100% sure it is actually a duplicate, as the second one has a smaller Max-Forwards (MF) value (69, versus 70 on the first ACK).
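
The reasoning here is that a proxy decrements Max-Forwards by one each time it forwards a request (RFC 3261), so a byte-for-byte retransmission keeps the same value, while a copy with a lower value has been forwarded at least once more. A minimal sketch of that check in C, assuming the two ACKs are available as raw message text; `sip_max_forwards` and `second_copy_was_forwarded` are hypothetical helpers, not OpenSIPS functions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Returns the Max-Forwards value of a SIP message held as text
 * (CRLF or LF line endings), or -1 if the header is absent or malformed.
 * Header names are matched case-insensitively, as RFC 3261 requires. */
int sip_max_forwards(const char *msg)
{
    static const char name[] = "Max-Forwards";
    const size_t n = sizeof name - 1;
    const char *line = msg;

    while (line && *line) {
        if (*line == '\r' || *line == '\n')
            break;                      /* empty line: end of the header section */
        if (strncasecmp(line, name, n) == 0) {
            const char *p = line + n;
            while (*p == ' ' || *p == '\t')
                p++;                    /* whitespace allowed before the colon */
            if (*p == ':') {
                p++;
                while (*p == ' ' || *p == '\t')
                    p++;
                return isdigit((unsigned char)*p) ? (int)strtol(p, NULL, 10) : -1;
            }
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* Hypothetical helper: 1 if the second copy carries a lower Max-Forwards
 * than the first (so it is not a plain retransmission), 0 otherwise,
 * -1 if either value cannot be read. */
int second_copy_was_forwarded(const char *first, const char *second)
{
    int a = sip_max_forwards(first);
    int b = sip_max_forwards(second);
    if (a < 0 || b < 0)
        return -1;
    return b < a;
}
```

Applied to the two ACKs in the SIP trace, a result of 1 would support the reading that the second ACK is not simply the first one sent twice.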

Regards,

Bogdan-Andrei Iancu

On 06/07/2019 03:52 PM, Ben Newlin wrote:
Bogdan,

Sorry, I should have thought to actually look at the trace and examine this call.

1) Yes
2) The Called Party is 10.32.20.60, which is another OpenSIPS instance. The crashed instance received the "503 Service Unavailable" approximately 8-10 ms after sending the INVITE.

There is a SIP trace of the exchange here: https://pastebin.com/6bttsSVD.

One oddity I saw is that the crashed process appears to send (or at least siptrace) the ACK twice.

Ben Newlin

From: Bogdan-Andrei Iancu <bogdan at opensips.org>
Date: Thursday, June 6, 2019 at 11:42 AM
To: OpenSIPS devel mailling list <devel at lists.opensips.org>, Ben Newlin <Ben.Newlin at genesys.com>
Subject: Re: [OpenSIPS-Devel] OpenSIPS Crash

Hi Ben,

Thanks for "another" report :).

Questions:
1) Do you use any async operation for the INVITE in this crash?
2) If the answer to (1) is yes, is the called party generating the "503 Service Unavailable" (which triggers the crash), 10.32.20.60, very close (in terms of network delay) and quick to answer?

Regards,

Bogdan-Andrei Iancu

On 06/05/2019 10:02 PM, Ben Newlin wrote:
We have had another crash today.

Backtrace is here: https://pastebin.com/q4RQC7kS

I found this in the log at the time of the crash:

Jun  5 17:54:10 [4978] CRITICAL:core:sig_usr: segfault in process pid: 4978, id: 8


Please let me know if any further information would be useful.

Ben Newlin

From: Devel <devel-bounces at lists.opensips.org> on behalf of Ben Newlin <Ben.Newlin at genesys.com>
Reply-To: OpenSIPS devel mailling list <devel at lists.opensips.org>
Date: Friday, May 10, 2019 at 6:31 PM
To: OpenSIPS devel mailling list <devel at lists.opensips.org>
Subject: Re: [OpenSIPS-Devel] OpenSIPS Crash

I found this in the log at the time of the crash:

kernel: opensips[5003]: segfault at 30 ip 00007fbd4c8f59d0 sp 00007ffcaa850c80 error 6 in tm.so[7fbd4c887000+8e000]
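
This kernel line can be decoded by hand. On x86, the `error` value is the page-fault error code: 6 means a write (bit 1) from user mode (bit 2), with bit 0 clear meaning no page was mapped at the faulting address. Together with the fault address 0x30, this is consistent with a write through a NULL struct pointer to a member at offset 0x30, inside tm.so. The hypothetical helper below does that decoding and computes ip minus the start of the reported mapping (0x6e9d0 here), an offset that could be passed to `addr2line -e tm.so`, assuming the mapping shown begins at the library's load base (file offset 0); if the text segment is mapped at a non-zero file offset, that offset has to be added first.

```c
#include <stdint.h>

/* Bits of the x86 page-fault error code printed as "error N"
 * in the kernel's segfault message. */
enum {
    PF_PROT  = 1 << 0,   /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1,   /* 0: read access,      1: write access */
    PF_USER  = 1 << 2    /* 0: kernel mode,      1: user mode */
};

struct segv_info {
    int write;           /* faulting access was a write */
    int user;            /* fault happened in user mode */
    int not_present;     /* no page mapped at the faulting address */
    uint64_t offset;     /* ip relative to the start of the reported mapping */
};

/* Decodes the parts of "segfault at ADDR ip IP ... error CODE in LIB[BASE+SIZE]".
 * Returns 0 on success, -1 if IP lies outside [BASE, BASE+SIZE). */
int decode_segv(uint64_t ip, uint64_t base, uint64_t size,
                unsigned code, struct segv_info *out)
{
    if (ip < base || ip >= base + size)
        return -1;
    out->write = (code & PF_WRITE) != 0;
    out->user = (code & PF_USER) != 0;
    out->not_present = (code & PF_PROT) == 0;
    out->offset = ip - base;
    return 0;
}
```

For the line above, `decode_segv(0x7fbd4c8f59d0, 0x7fbd4c887000, 0x8e000, 6, &info)` reports a user-mode write to an unmapped page at offset 0x6e9d0 into the mapping.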

Ben Newlin

From: Devel <devel-bounces at lists.opensips.org> on behalf of Ben Newlin <Ben.Newlin at genesys.com>
Reply-To: OpenSIPS devel mailling list <devel at lists.opensips.org>
Date: Friday, May 10, 2019 at 5:44 PM
To: OpenSIPS devel mailling list <devel at lists.opensips.org>
Subject: [OpenSIPS-Devel] OpenSIPS Crash

Hello,

We had a crash today of our OpenSIPS instance.

Backtrace is here: https://pastebin.com/QbRJimwx

# opensips -V
version: opensips 2.4.5 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: d025b4f61
main.c compiled on 20:58:31 May  9 2019 with gcc 7

Ben Newlin

_______________________________________________
Devel mailing list
Devel at lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/devel