[OpenSIPS-Devel] SF.net SVN: opensips:[5847] trunk/modules/nat_traversal/nat_traversal.c

Dan Pascu dan at ag-projects.com
Wed Jul 15 01:07:24 CEST 2009


On 15 Jul 2009, at 01:32, Thomas Gelf wrote:

> Dan Pascu wrote:
>> Just to get an idea why the case Thomas gets is so unexpected,
>> contact->uri is built using this code:
>>
>> static char*
>> get_source_uri(struct sip_msg *msg)
>> {
>>     static char uri[64];
>>     snprintf(uri, 64, "sip:%s:%d", ip_addr2a(&msg->rcv.src_ip),
>>              msg->rcv.src_port);
>>     return uri;
>> }
>>
>> and then duplicated in shared memory. There is no way for contact->uri
>> to end up NULL or not to contain the IP and port, no matter what
>> actions the user does in the script.
>
> Is this also true if I'm doing AVP_RECEIVED = $source_uri ?
> (that's what my config looks like)
>

Yes. As I said, it doesn't matter what you do in script. The values are
read from some internal opensips structures that reflect some kernel  
structures that contain the source IP/port and destination IP/port.  
There is no way in which those are affected by script actions, nor can  
they not be present. This is what makes me believe that it's not a  
problem in the nat_traversal code, but some form of memory corruption.
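
As a rough illustration outside of opensips, assuming a plain IPv4 UDP
socket: the sender's address is filled in by the kernel when the packet
is received (for example by recvfrom()), so a URI built from it always
carries an IP and a port. The helper name format_source_uri below is
made up for this sketch and is not part of opensips.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper, not opensips code: format a peer address as
 * filled in by the kernel (e.g. by recvfrom()) as "sip:IP:port".
 * Returns 0 on success, -1 if the address cannot be formatted or the
 * buffer is too small. */
static int
format_source_uri(const struct sockaddr_in *src, char *buf, size_t len)
{
    char ip[INET_ADDRSTRLEN];
    int n;

    assert(src != NULL && buf != NULL);

    /* inet_ntop() converts the binary IPv4 address to dotted-quad text */
    if (inet_ntop(AF_INET, &src->sin_addr, ip, sizeof(ip)) == NULL)
        return -1;

    /* sin_port is in network byte order; ntohs() gives the host order */
    n = snprintf(buf, len, "sip:%s:%d", ip, ntohs(src->sin_port));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Nothing in this path takes input from the routing script, which is the
point made above: the address comes from the receive call, not from
anything the script sets.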

>> Right now I suspect that Thomas suffers from some sort of memory
>> corruption that happens to affect the nat_traversal module internal
>> data somehow.
>>
>> Thomas, can you please compile opensips to use the system malloc
>> instead of pkg_malloc and see if the problem persists? I had suffered
>> similar weird memory corruption issues in the past, that could not be
>> identified but were cured by using the system malloc. In my case the
>> segfaults happened in t_relay or sl_send_reply, but the memory was
>> similarly corrupted in unexpected places.
>
> I did so - or better, I tried my best to do so. Changes in revision
> 5653 didn't allow me to compile with system malloc. At least that's
> my assumption. As I never wrote a C program that's just a wild guess.
> After reverting some changes (r5653-5655) and disabling STATISTICS
> I have finally been able to compile without PKG_MALLOC (see other
> thread).

I do not know about that as I've never tried opensips-1.5 or newer.  
However here is a patch that I use (debian dpatch, but it can be used  
as a standard patch), for disabling pkg_malloc and using system malloc  
instead. (see attachment)

>
> I do not really like the idea to reproduce the nat_helper crash, as
> I have (very) few customers already using this proxy. And I really
> have no idea how to do so. It was a really strange effect - after
> a restart it kept crashing and crashing. After a while it appeared
> to be stable again - but after the next call (not sure if it was
> really the next one) - it crashed and crashed once again. Usually
> it did so shortly after a new dialog started (at least it seemed
> to be so). If you'd like to have a look at the core files I could
> try to find the corresponding binary.
>

For the moment it's enough if you post the output of bt full in gdb.

> However, I discovered other ways to crash OpenSIPS - and they still
> work even with my somehow-fiddled-system-malloc-version. I'll send
> another post with related information.

Are you saying that with the system malloc you do not see the
nat_traversal related crashes anymore?
That you now only see crashes related to other parts of the code?

>
> There is one thing I got aware of: shortly after being started it's
> easy to produce crashes - if running without being disturbed for a
> while chances are good that it will keep running. I know that this
> is not a good diagnose - but that's how it seemed to behave. OpenSIPS
> probably needs a lot of love and care ;-)


You are the first one to report such an issue in nat_traversal. If it
were a bug in the nat_traversal code, it would become obvious on
inspection (especially after a backtrace) and many more people would
be affected in a systematic manner. I have it running on dozens of
systems without any problem. The scarce and random nature of it makes
me believe it's something else related to memory corruption. I've seen  
similarly strange issues in the past with unexpected crashes in parts  
of code that didn't have any obvious problems where the backtrace  
reported them, which were magically cured by using the system malloc.  
As in your case, nobody else experienced the crashes I did experience,  
so I believe it's a memory corruption issue that is triggered by a  
combination of factors that is unique to different installations,  
resulting in crashes in different areas of the code, without having  
programming problems in those particular areas of the code, only being  
affected as a side effect of the memory being corrupted.
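
A toy sketch of that side effect, assuming a pool-style allocator where
chunks sit right next to each other. The names toy_pool, toy_alloc and
overflow_hits_neighbour are made up for the example and this is not the
real opensips allocator. A write that runs past the end of one chunk
silently damages the data of whoever owns the next chunk:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy bump allocator over one fixed pool. This is NOT the opensips
 * allocator, only an illustration of chunks living side by side. */
#define POOL_SIZE  64
#define CHUNK_SIZE 24

struct toy_pool {
    unsigned char mem[POOL_SIZE];
    size_t used;
};

static void *
toy_alloc(struct toy_pool *p, size_t size)
{
    void *chunk;

    assert(p != NULL);
    if (size > POOL_SIZE - p->used)
        return NULL;                    /* pool exhausted */
    chunk = p->mem + p->used;
    p->used += size;
    return chunk;
}

/* Returns 1 if a write of CHUNK_SIZE + 4 bytes into the first chunk
 * damaged the unrelated chunk allocated right after it, 0 otherwise. */
static int
overflow_hits_neighbour(void)
{
    struct toy_pool pool = { {0}, 0 };
    char *buggy = toy_alloc(&pool, CHUNK_SIZE);  /* chunk of the faulty code */
    char *uri = toy_alloc(&pool, CHUNK_SIZE);    /* someone else's data */

    if (buggy == NULL || uri == NULL)
        return 0;

    strcpy(uri, "sip:192.0.2.10:5060");

    /* The bug: 4 bytes too many. The write stays inside the pool, so
     * nothing crashes here, but it overwrites the start of uri. */
    memset(buggy, 'X', CHUNK_SIZE + 4);

    return strcmp(uri, "sip:192.0.2.10:5060") != 0;
}
```

The code that later reads uri is the one that misbehaves, although the
mistake is in the code that wrote to buggy. A different allocator lays
out and pads chunks differently, so the same stray write may land
somewhere else or hit nothing, which could explain symptoms that move
or vanish without the underlying bug being fixed.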

Can you tell me if you see the issue anymore after switching to the  
system malloc?


--
Dan


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 12_use_system_malloc.dpatch
Type: application/octet-stream
Size: 1917 bytes
Desc: not available
Url : http://lists.opensips.org/pipermail/devel/attachments/20090715/65bec244/attachment.obj 

