[OpenSIPS-Users] DNS issues

Sun Jul 25 10:39:46 CEST 2010

On 25 Jul 2010, at 08:53, Adrian Georgescu wrote:

> These are all valid points. DNS is not the only thing that causes this behavior: any blocking operation, such as a RADIUS or MySQL query, has the same result.
> 
> The only feasible solution is the new design, which will handle such events asynchronously: the proxy will not wait for a DNS answer before proceeding with a new transaction.
That's why reSIProcate uses the ARES library - for asynchronous DNS queries.

/O
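
To illustrate the asynchronous approach, here is a minimal Python sketch. It is not OpenSIPS or reSIProcate code: `fake_resolve` is a hypothetical stand-in that only sleeps, where a real implementation would hand the query to an asynchronous resolver such as ARES. It only shows that several transactions waiting on slow lookups can finish in roughly one lookup time instead of queuing behind each other.

```python
import asyncio
import time


async def fake_resolve(name: str, delay: float) -> str:
    """Hypothetical stand-in for an asynchronous DNS query (no real lookup)."""
    await asyncio.sleep(delay)   # yields to the event loop while "waiting"
    return "192.0.2.1"           # documentation-only address (RFC 5737)


async def handle_transaction(name: str, delay: float) -> str:
    # The transaction is suspended here; other transactions keep running.
    addr = await fake_resolve(name, delay)
    return f"{name} -> {addr}"


async def run_all(names, delay):
    # Start every transaction at once and collect results in input order.
    return await asyncio.gather(*(handle_transaction(n, delay) for n in names))


def timed_run(names, delay=0.2):
    """Run all transactions and return (results, elapsed_seconds)."""
    start = time.monotonic()
    results = asyncio.run(run_all(names, delay))
    return results, time.monotonic() - start
```

With a blocking resolver, three lookups of 0.2s each in one process would take about 0.6s; here `timed_run` should return after roughly 0.2s.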
> 
> Adrian
> 
> On Jul 25, 2010, at 2:16 AM, Stanisław Pitucha wrote:
> 
>> Hi all,
>> 
>> I wanted to collect some ideas on how you solve DNS connectivity
>> problems. I've run into these issues a couple of times already and don't
>> see a perfect solution so far. Maybe I can trigger some discussion:
>> 
>> Some background:
>> - OpenSIPS blocks the child process while resolving a domain or querying ENUM
>> - the standard resolver has a minimum timeout of 1s
>> - the standard resolver does only one query at a time and can cycle
>> nameservers, but does not save state
>> I believe these are not real problems - just ugly legacy :) that we can
>> work around.
>> 
>> The implication is that if you don't use a caching nameserver on your
>> side and you allow users to use routing based on a domain name (not very
>> hard - do you handle "302"s, record-routes, registration?), you're
>> basically screwed:
>> 
>> 1. If you don't cache, any domain which times out will block a child for
>> at least 1s. If you use retries, you block for at least Ns where N =
>> number of nameservers. You can be DoS-ed with ~8 packets per second, in
>> standard configuration.
>> 
>> 2. If you cycle N nameservers and one of them is down, you process
>> N-1 packets correctly, then block until the timeout on the last one, then
>> process another N-1, and so on - not good for a high-traffic proxy.
>> 
>> 3. If you cache results, you're safe from random failures, but only if
>> you cache timeouts as negative results and keep the state of servers
>> being down, so you don't try to query them again. (nothing apart from
>> `dnsmasq` does that, AFAIK)
>> 
>> 4. What solves half of the problem for me is `dnsmasq`. As far as I
>> know, it's the only caching DNS server that can query all
>> nameservers in parallel. I get 4 times the needed DNS traffic, but I
>> never time out if one of the servers is down. Also, some
>> results come from the cache, so it's only 2 times the traffic in reality.
>> The problem with `dnsmasq` is that it doesn't cache SRV and NAPTR
>> records (nor the timeouts / NX responses for them),
>> only A/AAAA/PTR/....
>> 
>> 5. So even if you have a local caching and backup resolver in
>> `resolv.conf`, a minimal timeout, parallel querying from the local cache,
>> saved state for upstream resolvers that are down, and all internal
>> traffic routed via IPs... it takes only one person with a custom NAPTR
>> record pointing you to a custom SRV target that times out to kill all the traffic.
>> 
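
The parallel querying described in point 4 can be sketched roughly as below. This is an illustration in Python, not how `dnsmasq` is implemented: `query_server` is a hypothetical simulated upstream that sleeps and then either answers or times out, and the server labels, latencies and returned address are made up for the example.

```python
import concurrent.futures as cf
import time


def query_server(server, name, latency, down):
    """Simulated upstream query (hypothetical): sleep, then answer or time out."""
    time.sleep(latency)
    if down:
        raise TimeoutError(f"{server} timed out resolving {name}")
    return server, "192.0.2.10"   # documentation-only address (RFC 5737)


def resolve_parallel(name, servers):
    """Ask every nameserver at once and return the first successful answer.

    servers: list of (label, latency_seconds, is_down) tuples.
    Raises TimeoutError only if every server fails.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(servers))
    futures = [pool.submit(query_server, label, name, latency, down)
               for label, latency, down in servers]
    try:
        for fut in cf.as_completed(futures):
            try:
                return fut.result()       # first server to answer wins
            except TimeoutError:
                continue                  # this one failed, wait for the others
        raise TimeoutError(f"all nameservers failed for {name}")
    finally:
        pool.shutdown(wait=False)         # do not wait for the slow servers
```

The cost is the one already mentioned: every lookup goes to all servers, so upstream traffic is multiplied by the number of nameservers, but a single dead server no longer delays the answer.
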
>> So... what's your experience with this? Do you have some better
>> protection in place?
>> I'm considering adding negative caching of DNS timeouts and general
>> caching of SRV and NAPTR records to `dnsmasq` to complete my protection.
>> Do you know of any software which would solve those problems out of the box?
>> 
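
As a rough picture of what negative caching of timeouts means, here is a small Python sketch. It is an assumption-laden illustration, not a `dnsmasq` patch: the class name `DnsCache`, the TTL values and the `resolver` callback are all hypothetical. A lookup that times out is remembered for `negative_ttl` seconds, so repeated requests for the same bad name fail immediately instead of blocking again.

```python
import time


class DnsCache:
    """Tiny cache keyed by (name, record type), with negative entries."""

    def __init__(self, positive_ttl=300.0, negative_ttl=30.0,
                 clock=time.monotonic):
        self.positive_ttl = positive_ttl   # hypothetical value, in seconds
        self.negative_ttl = negative_ttl   # hypothetical value, in seconds
        self._clock = clock
        self._entries = {}                 # key -> (expires_at, answer or None)

    def lookup(self, name, rtype, resolver):
        """Return a cached or fresh answer; re-raise cached timeouts at once."""
        key = (name.lower(), rtype)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            if entry[1] is None:
                # Negative hit: fail fast without touching the upstream.
                raise TimeoutError(f"{rtype} {name}: cached failure")
            return entry[1]
        try:
            answer = resolver(name, rtype)   # may block once, up to its timeout
        except TimeoutError:
            self._entries[key] = (now + self.negative_ttl, None)
            raise
        self._entries[key] = (now + self.positive_ttl, answer)
        return answer
```

With this in front of the resolver, a malicious NAPTR/SRV chain that times out costs one blocked lookup per `negative_ttl` window rather than one per request.
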
>> Thanks,
>> Stan
>> 
>> _______________________________________________
>> Users mailing list
>> Users at lists.opensips.org
>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>> 
> 
> 

---
* Olle E Johansson - oej at edvina.net
* Cell phone +46 70 593 68 51, Office +46 8 96 40 20, Sweden