[OpenSIPS-Users] OpenSIPS fix_route_dialog crashes
Bogdan-Andrei Iancu
bogdan at opensips.org
Tue Aug 2 09:47:31 CEST 2016
Ben,
To make it easier, please send me the instructions on how to reproduce
the crash.
Thanks and Regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com
On 01.08.2016 20:17, Newlin, Ben wrote:
>
> Bogdan,
>
> I am not familiar with gdb, so I cannot double check what you’ve assessed. If
> there are some other steps with gdb you would like me to perform, just
> let me know what to do.
>
> Is there a way to compile the memory debugger without using the
> interactive `make menuconfig` command? Our build system is completely
> automated, so it is impossible for me to do it this way. Can I pass the
> options as build parameters or alter the makefile in some way?
>
> I can provide a SIPp scenario which should reproduce the issue on any
> basic script that uses Dialog with topology hiding, if that would be
> easier.
>
> Ben Newlin
>
> *From: *Bogdan-Andrei Iancu <bogdan at opensips.org>
> *Date: *Monday, August 1, 2016 at 10:57 AM
> *To: *OpenSIPS users mailling list <users at lists.opensips.org>,
> "Newlin, Ben" <Ben.Newlin at inin.com>
> *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes
>
> Hi Ben,
>
> According to the BT, the crash is in a pkg_malloc() call:
> route = pkg_malloc(size);
> Please double check this with gdb info.
>
> If so, this indicates memory corruption, and we have 2 options here:
> - you compile with the memory debugger (see my previous emails)
> - you provide step-by-step instructions on how to reproduce this crash.
>
> Thanks and Regards,
>
> Bogdan-Andrei Iancu
> OpenSIPS Founder and Developer
> http://www.opensips-solutions.com
>
> On 29.07.2016 15:54, Newlin, Ben wrote:
>
> This is 1.11.6, running on CentOS 7.
>
> Ben Newlin
>
> *From: *<users-bounces at lists.opensips.org> on behalf of
> Bogdan-Andrei Iancu <bogdan at opensips.org>
> *Reply-To: *OpenSIPS users mailling list <users at lists.opensips.org>
> *Date: *Friday, July 29, 2016 at 8:50 AM
> *To: *"Newlin, Ben" <Ben.Newlin at inin.com>, OpenSIPS users mailling list
> <users at lists.opensips.org>
> *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes
>
> Ben,
>
> Which OpenSIPS version is this (the crashing one)? 1.11 or 2.1?
>
> Regards,
>
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
>
> http://www.opensips-solutions.com
>
> On 27.07.2016 19:02, Newlin, Ben wrote:
>
> I have identified that these crashes are occurring when the
> far end system is not returning the Record-Route headers in
> the 200 OK response. The headers are present in the 180
> response, but not the 200 OK. I have reproduced the scenario
> using SIPp and captured a SIP trace:
> http://pastebin.com/ckKk3EhY
>
> The crash occurs on receipt of the ACK request and the attempt to
> match the dialog.
>
> I also captured a BT for this scenario, in case
> anything specific in the trace makes the issue easier to find:
> http://pastebin.com/cM3FhPiw
>
> I am working with the other system to try to fix their behavior.
>
> Ideally the Record-Route headers from previous replies could
> be used in this case to allow the call to succeed, but I don’t
> know if that is possible.
>
> Thanks,
>
> Ben Newlin
>
> *From: *"Newlin, Ben" <Ben.Newlin at inin.com>
> *Date: *Wednesday, July 27, 2016 at 9:44 AM
> *To: *Bogdan-Andrei Iancu <bogdan at opensips.org>, OpenSIPS users mailling list
> <users at lists.opensips.org>
> *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes
>
> Bogdan,
>
> This is a different scenario than the other you responded to.
> As I said, we have two types of servers that work together.
> One is a load-balancer and runs as a proxy. It uses double
> Record-Route because it sends messages between public and
> private networks. Then we have our other servers using TH
> which receive those requests. We are not using TH and RR on
> the same server (although I would like to).
>
> If validate_dialog() and fix_route_dialog() (and possibly
> loose_route()) should not be called when using TH, I believe
> the documentation should reference that. It states that
> match_dialog() must be used with TH, but does not indicate
> that the other functions should not be used or that the
> functionality won’t work. There is also no documentation of
> the incompatibility between RR and TH.
>
> Either way, I ran a test where I removed all calls to
> loose_route(), validate_dialog(), and fix_route_dialog() from
> my script. The crash still occurred and the BT still pointed
> to the fix_route_dialog() function. So it must be getting called
> from within the Dialog module somewhere. That BT is here:
> http://pastebin.com/wu2X2Hxh
>
> I collected this BT with loose_route() being called from my
> script, but not validate_dialog() or fix_route_dialog():
> http://pastebin.com/6V7yPaHF
>
> This BT was collected with all three functions being called
> from my script: http://pastebin.com/fZYYdndn
>
> Ben Newlin
>
> *From: *Bogdan-Andrei Iancu <bogdan at opensips.org>
> *Date: *Wednesday, July 27, 2016 at 3:57 AM
> *To: *OpenSIPS users mailling list <users at lists.opensips.org>,
> "Newlin, Ben" <Ben.Newlin at inin.com>
> *Subject: *Re: [OpenSIPS-Users] OpenSIPS fix_route_dialog crashes
>
> Hi Ben,
>
> First, if you use TH, it makes no sense to do Record-Routing:
> these are two SIP concepts that overlap. You act either as an
> end-point (by doing TH) or as a proxy (by doing RR).
>
> If you are doing TH, it makes no sense to use validate + fix, as
> these functions check and repair the routing information in the
> request (like the Route and Contact headers). If you do TH, this
> routing info is actually hidden and added by OpenSIPS, so
> there is nothing to fix or repair.
>
> Nevertheless, this should not crash or corrupt OpenSIPS. Have
> you managed to get a corefile?
>
> Also if you suspect memory corruption, you can compile-in the
> memory debugger - see
> http://www.opensips.org/Documentation/TroubleShooting-OutOfMem .
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
>
> http://www.opensips-solutions.com
>
> On 26.07.2016 23:20, Newlin, Ben wrote:
>
> I have had 3 OpenSIPS server crashes in the last week. All
> were due to segmentation faults. I was not able to capture
> core dumps; I am configuring that now to catch the next crash.
>
> My logs leading up to the crash are full of errors from
> fix_route_dialog() complaining about invalid URIs for
> sequential requests:
>
> Jul 26 19:34:02 [220] ERROR:dialog:fix_route_dialog: Failed to parse SIP uri
> Jul 26 19:34:02 [220] ERROR:core:parse_uri: bad uri, state 0 parsed: <ip:1> (4) / <ip:10.18.8.18:5060;ftag=gK0448f137;lr;r2=on>> (44)
>
> Jul 26 19:11:19 [218] ERROR:dialog:fix_route_dialog: Failed to parse SIP uri
> Jul 26 19:11:19 [218] ERROR:core:parse_uri: bad uri, state 0 parsed: <b0i2> (4) / <b0i2yjor;transport=udp<sip:10.18.8.17:5060;ftag=7207ce89;lr;r2=on> (65)
>
> Jul 26 17:43:19 [220] ERROR:dialog:fix_route_dialog: Failed to parse SIP uri
> Jul 26 17:43:19 [220] ERROR:core:parse_uri: bad uri, state 0 parsed: <ervi> (4) / <ervice_id6fdbc70f-2438-4726-807c-0d081df4d87> (44)
>
> Many times the “URI” displayed in the error message is
> actually internal OpenSIPS variables, as in the last error
> above. When they are from the SIP message, I have verified
> that the messages themselves are correctly formatted. This
> leads me to believe there is memory corruption occurring.
>
> This all started when I updated my load-balancer servers
> to use Record-Routing, specifically the “double_rr”
> mechanism for when multiple interfaces exist. The
> Record-Routing is occurring on different servers which
> have not crashed. Only the servers receiving the
> Record-Routed messages are experiencing the errors.
>
> Here is a piece of the code processing sequential
> requests. I am using the topology_hiding() functionality
> of the Dialog module. Are validate_dialog() and
> fix_route_dialog() still valid in a topology_hiding scenario?
>
> if (t_check_trans())
>     setflag(SEQ_REQUEST);
>
> if (has_totag())
> {
>     loose_route();
>
>     if (match_dialog())
>     {
>         if (!validate_dialog())
>             fix_route_dialog();
>
>         if (is_method("BYE"))
>             setflag(ACC_FLAG);
>
>         setflag(SEQ_REQUEST);
>     }
>     else if (!isflagset(SEQ_REQUEST))
>     {
>         if (!is_method("ACK")) {
>             route(rlog, LV_ERROR, "check_sequential", "Sequential request not matched");
>             route(reply_error, "481", "Call Does Not Exist");
>         }
>
>         return(EXIT);
>     }
> }
>
> I will attempt to get core dumps of future crashes.
>
> Thanks,
>
> Ben Newlin
>
> _______________________________________________
>
> Users mailing list
>
> Users at lists.opensips.org
>
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
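The condition Ben describes earlier in the thread (Record-Route headers
present in the 180 reply but missing from the 200 OK) can be looked for
offline in a captured trace. Below is a minimal Python sketch of such a
check. It is not part of OpenSIPS: the function names parse_sip and
missing_record_route are hypothetical, and it assumes each SIP message has
already been split out of the capture as a separate raw string. It ignores
folded (multi-line) headers and the compact header forms other than "i".

```python
from collections import defaultdict


def parse_sip(raw):
    """Split a raw SIP message into (start_line, {lowercased_header: [values]})."""
    # Keep only the header part; the body (SDP etc.) is not needed here.
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    headers = defaultdict(list)
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()].append(value.strip())
    return lines[0].strip(), headers


def missing_record_route(messages):
    """Return Call-IDs whose 200 OK to INVITE lacks Record-Route
    although an earlier provisional (1xx) reply in the same call had it."""
    seen_rr = set()  # Call-IDs where a 1xx reply carried Record-Route
    flagged = []
    for raw in messages:
        start, headers = parse_sip(raw)
        if not start.startswith("SIP/2.0 "):
            continue  # requests are skipped, only replies matter here
        code = start.split()[1]
        call_id = (headers.get("call-id") or headers.get("i") or [""])[0]
        cseq = (headers.get("cseq") or [""])[0].split()
        method = cseq[-1] if cseq else ""
        if method != "INVITE":
            continue  # e.g. a 200 OK to BYE legitimately has no Record-Route
        has_rr = bool(headers.get("record-route"))
        if code.startswith("1") and has_rr:
            seen_rr.add(call_id)
        elif code == "200" and not has_rr and call_id in seen_rr:
            flagged.append(call_id)
    return flagged
```

Calling missing_record_route() on the list of messages from one capture
returns the Call-IDs worth inspecting by hand; an empty list only means this
particular pattern was not seen, not that the dialog routing is correct.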