[OpenSIPS-Devel] Server trying outgoing TCP (TLS) connection, gets blocked

Tue Sep 22 11:02:58 CEST 2009

Hi Ivan,

Ivan Genov wrote:
> Hi,
>  
> We have noticed that the server tries to connect to the user's contact IP:port when there is no existing TCP connection with the client. This can happen when the client-server TLS connection breaks for some reason (for example, a flaky Internet connection). After that, when subsequent requests need to be proxied to that user (NOTIFYs, SUBSCRIBEs), we can see in the logs that the server tries to connect to the user's IP and port because there is no existing TCP (TLS) connection with it. In our setup, though, the clients are behind NAT and the server is on the public Internet.
>   
If a client is behind a NAT and the TCP connection is down (for whatever 
reason), the server has no way to open the TCP connection again. Only 
the client is able to do it.
>  
> In most such cases the server fails "normally" after 10 seconds. During that time, if the same client tries to REGISTER, the REGISTER packets are handled by the same process that is blocked, so they are processed only after the 10 seconds have elapsed. In effect, this makes it harder for the client to re-REGISTER when the connection has been broken.
>   
You mean all the TCP worker processes were blocked trying to open a 
TCP connection to the client behind the NAT, so there were no processes 
left to handle the incoming TCP traffic?

BTW, have you tried to:
    1) reduce the tcp_connect_timeout (see 
http://www.opensips.org/Resources/DocsCoreFcn#toc70)
    2) increase the number of TCP worker processes via tcp_children 
(see http://www.opensips.org/Resources/DocsCoreFcn#toc67)
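
As a rough sketch, both settings are global parameters in opensips.cfg. 
The values below are only illustrative assumptions, not tested 
recommendations, and the timeout is assumed to be in seconds (matching 
the 10 second default mentioned in this thread), so please check the 
docs for your version:

```
# Illustrative values only, tune them for your own load.

# Number of TCP worker processes; more workers means a few blocked
# connect attempts are less likely to starve incoming TCP traffic.
tcp_children=8

# Give up on an outgoing TCP connect sooner than the default
# (assumed unit: seconds).
tcp_connect_timeout=2
```

A shorter timeout only bounds how long each worker stays blocked; it 
does not stop the server from attempting the connect in the first place.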
>  
> There are cases, though, when the server logs nothing for more than 3 minutes. We can see that two server processes spend that whole time trying an outgoing connection to the client's old IP:port, and when the logs resume, both processes report an unsuccessful blocking TCP connect to the client. During such periods the server simply doesn't respond to the client's attempts to REGISTER and is in effect blocked.
>  
> We have no clue as to why the server is blocked for more than 3 minutes since the tcp connect timeout seems to be just 10 seconds.
>   
Have you tried using gdb to attach to the TCP processes to see 
where they are blocked?
>  
> What we think is best is if we can configure the server to not try outgoing TCP connections to clients (when a TCP connection doesn't exist). Is there a configuration setting for that? If not, what is the best place in the code to make this change? If this is not easy or not recommended, can we set the server's timeout for outgoing TCP connections to something smaller, for example 1-2 seconds, or even 0 seconds? We feel that in our setup it would be better if the server did not try to connect at all, because the connection attempt will fail anyway.
>   
That would be an idea, but the problem is how to pass to the TCP stack 
(in opensips) the information on whether or not it is allowed to open a 
new TCP connection, as this information can be determined only from the 
script, based on the routing logic.

Also, ideally such behaviour should be automatic and transparent for 
the script writer - the script should not need different handling for 
TCP and UDP... IMHO.

Regards,
Bogdan