[OpenSIPS-Devel] SF.net SVN: opensips:[5847] trunk/modules/nat_traversal/nat_traversal.c

Dan Pascu dan at ag-projects.com
Thu Jul 9 11:08:48 CEST 2009


On 9 Jul 2009, at 10:45, Bogdan-Andrei Iancu wrote:

> Hi Dan,
>
> I agree that the fix I made was fixing an effect and not the real
> cause, but it was preventing the crashes until the real cause is
> found and fixed. The crashes were reported by Thomas Gelf (he got
> 4 GB of core files during one night), so I'm not sure it is safe to
> remove this fix without having the real fix in place; otherwise we
> will simply expose the users to more crashes.


I disagree. Nobody except Thomas has reported this, and in the more than
a year that nat_traversal has been available nobody has reported crashes
with it. Thus I suspect that there is something else in his case that
needs to be investigated more closely. If we keep such a workaround in
place, the module will simply not send keepalive messages to the
affected endpoints. This both hides the cause of the segfault and
creates a new problem: instead of crashes, Thomas will start reporting
that his endpoints are no longer kept alive. Besides, if the workaround
is in place, people will become complacent and will stop trying to find
the real cause.
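To illustrate the concern, here is a minimal hypothetical sketch in C of a NULL guard in a keepalive loop. It is not the actual r5847 change (that diff is not shown in this thread); struct ka_contact, send_keepalive() and send_keepalives_with_guard() are invented names. It only shows how skipping bad entries avoids the crash while leaving those endpoints without keepalives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contact list entry, for illustration only. */
struct ka_contact {
    const char *uri;
    struct ka_contact *next;
};

/* Hypothetical stand-in for sending one keepalive request; it only counts. */
static int sent_count = 0;

static void send_keepalive(const char *uri)
{
    (void)uri;
    sent_count++;
}

/* Walks the list the way a workaround-style guard would: entries with a
 * NULL uri are skipped silently, so those endpoints get no keepalive and,
 * apart from the returned count, nothing records that the data was bad. */
static int send_keepalives_with_guard(struct ka_contact *list)
{
    int skipped = 0;
    struct ka_contact *c;

    for (c = list; c != NULL; c = c->next) {
        if (c->uri == NULL) {
            skipped++;   /* no crash, but also no keepalive */
            continue;
        }
        send_keepalive(c->uri);
    }
    return skipped;
}
```

On a two-entry list whose second uri is NULL, one keepalive is sent and one endpoint is quietly dropped from the keepalive cycle.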

To give an idea of why the case Thomas is seeing is so unexpected,
contact->uri is built using this code:

static char*
get_source_uri(struct sip_msg *msg)
{
     static char uri[64];
     snprintf(uri, 64, "sip:%s:%d", ip_addr2a(&msg->rcv.src_ip),
              msg->rcv.src_port);
     return uri;
}
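A self-contained sketch of the same construction follows, assuming IPv4 only. inet_ntop() stands in for OpenSIPS' ip_addr2a(), and struct fake_rcv and format_source_uri() are hypothetical names. Every path returns the static buffer, so the result cannot be NULL, and 64 bytes is ample: an IPv4 URI is at most 25 characters (4 for "sip:", 15 for the address, 1 for ":", 5 for the port).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the receive info in struct sip_msg:
 * only the IPv4 source address and port are modelled. */
struct fake_rcv {
    struct in_addr src_ip;
    unsigned short src_port;
};

/* Same shape as get_source_uri(): formats "sip:IP:port" into a static
 * buffer and always returns a pointer to that buffer, never NULL. */
static char *format_source_uri(const struct fake_rcv *rcv)
{
    static char uri[64];
    char ip[INET_ADDRSTRLEN];

    /* inet_ntop() replaces ip_addr2a() in this sketch */
    if (inet_ntop(AF_INET, &rcv->src_ip, ip, sizeof(ip)) == NULL)
        strcpy(ip, "0.0.0.0");
    snprintf(uri, sizeof(uri), "sip:%s:%d", ip, rcv->src_port);
    return uri;
}
```

For source address 192.168.0.1 and port 5060, the buffer holds "sip:192.168.0.1:5060".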

The result is then duplicated in shared memory. There is no way for
contact->uri to end up NULL or to lack the IP and port, no matter what
actions the user takes in the script.
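The duplication step can be sketched as follows, with malloc() standing in for shm_malloc(). contact_t and contact_new() are hypothetical, and copying the URI into the same allocation as the record is an assumption of this sketch, not necessarily what nat_traversal does. What it illustrates is that a contact is either created with a valid uri or not created at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified contact record; the real module's
 * structure has more fields. */
typedef struct contact {
    char *uri;
} contact_t;

/* malloc() stands in for shm_malloc(). The URI is copied into the
 * same block as the record, so uri always points at a NUL-terminated
 * copy whenever the contact itself exists. */
static contact_t *contact_new(const char *uri)
{
    size_t len;
    contact_t *c;

    assert(uri != NULL);
    len = strlen(uri);
    c = malloc(sizeof(*c) + len + 1);
    if (c == NULL)
        return NULL;  /* allocation failure: no half-built contact */
    c->uri = (char *)(c + 1);
    memcpy(c->uri, uri, len + 1);
    return c;
}
```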

Right now I suspect that Thomas is suffering from some sort of memory
corruption that happens to hit the nat_traversal module's internal data.

Thomas, can you please compile OpenSIPS to use the system malloc
instead of pkg_malloc and see if the problem persists? I have run into
similar weird memory corruption issues in the past that could not be
traced to a cause but were cured by using the system malloc. In my case
the segfaults happened in t_relay or sl_send_reply, but the memory was
similarly corrupted in unexpected places.
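A minimal sketch of how a pkg_malloc()-style macro can be switched between a private allocator and the system malloc at compile time. The macro USE_CUSTOM_PKG_ALLOC, the functions custom_pkg_alloc()/custom_pkg_free() and the helper pkg_strdup() are hypothetical; the actual OpenSIPS build option for this may be named differently. Built without the define, as here, every pkg_malloc() call goes straight to the C library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compile-time switch; real OpenSIPS names may differ.
 * Define USE_CUSTOM_PKG_ALLOC to route allocations through a private
 * pool, or leave it undefined to use the C library allocator. */
#ifdef USE_CUSTOM_PKG_ALLOC
void *custom_pkg_alloc(size_t size);   /* private pool allocator (not shown) */
void  custom_pkg_free(void *p);
#define pkg_malloc(s) custom_pkg_alloc(s)
#define pkg_free(p)   custom_pkg_free(p)
#else
/* System allocator: plain malloc()/free() from the C library. */
#define pkg_malloc(s) malloc(s)
#define pkg_free(p)   free(p)
#endif

/* Module code only ever calls pkg_malloc()/pkg_free(), so switching
 * allocators requires a rebuild but no source changes. */
static char *pkg_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = pkg_malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Because the choice is made entirely in the preprocessor, trying the system malloc is a cheap experiment: rebuild without the define and rerun the same configuration.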

--
Dan



