[OpenSIPS-Users] DNS issues

Stanisław Pitucha viraptor at gmail.com
Sun Jul 25 02:16:55 CEST 2010


Hi all,

I wanted to collect some ideas on how you solve DNS connectivity
problems. I've run into these issues a couple of times already and haven't
found a perfect solution so far. Maybe I can trigger some discussion:

Some background:
- OpenSIPS blocks the child process while resolving a domain or querying ENUM
- the standard resolver has a minimum timeout of 1s
- the standard resolver sends only one query at a time and can cycle through
nameservers, but does not remember their state
I don't think these are real problems, just ugly legacy :) that we can
work around.

The implication is that if you don't run a caching nameserver on your
side and you allow users to trigger routing based on a domain name (not
hard: do you handle 302s, Record-Route headers, registrations?), you're
basically screwed:

1. If you don't cache, any domain that times out will block a child for
at least 1s. If you use retries, you block for at least N seconds, where N
is the number of nameservers. In the standard configuration, you can be
DoS-ed with ~8 packets per second.
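The arithmetic behind point 1 can be sketched as a simplified model (not OpenSIPS code). It assumes a fixed pool of worker children, where each timed-out lookup keeps one child stuck for the full timeout on every nameserver it tries. The child count of 8 used in the example is an assumption standing in for the "standard configuration", and the function name is hypothetical.

```python
def failing_lookups_to_saturate(children: int, timeout_s: float, nameservers: int = 1) -> float:
    """Rate of timing-out lookups (per second) that keeps every child blocked.

    Simplified model: each failing lookup waits the full timeout on every
    nameserver, so one child can absorb at most 1 / block_time such
    lookups per second before it is permanently busy.
    """
    block_time = timeout_s * nameservers  # seconds one child is stuck per failing lookup
    return children / block_time


if __name__ == "__main__":
    # Hypothetical setup: 8 children, 1s timeout, single nameserver.
    print(failing_lookups_to_saturate(children=8, timeout_s=1.0))  # 8.0
    # Retrying across two nameservers halves the rate an attacker needs.
    print(failing_lookups_to_saturate(children=8, timeout_s=1.0, nameservers=2))  # 4.0
```

In this model, adding nameservers without caching makes things worse, because every extra server lengthens the time each bad packet holds a child.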

2. If you cycle through N nameservers and one of them is down, you process
N-1 packets correctly, then block until the timeout on the dead one, then
process another N-1, and so on. Not good for a high-traffic proxy.
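The stall pattern in point 2 can be illustrated with a toy model of a stateless resolver that starts each query on the next server in turn (similar to what glibc's `rotate` option in `resolv.conf` enables). The function is hypothetical, and it assumes that a query starting on a dead server waits out the full timeout and then gets its answer from the next, live server.

```python
def round_robin_delays(servers_up: list[bool], queries: int,
                       timeout_ms: int, answer_ms: int) -> list[int]:
    """Per-query blocking time when nameservers are used in strict rotation.

    The resolver keeps no memory of failures, so every query whose turn
    lands on a dead server pays timeout_ms before the next server answers.
    """
    delays = []
    for i in range(queries):
        first_server_up = servers_up[i % len(servers_up)]
        delays.append(answer_ms if first_server_up else timeout_ms + answer_ms)
    return delays


# Three nameservers, the third one dead, 1s timeout, 10ms normal answers.
print(round_robin_delays([True, True, False], 6, timeout_ms=1000, answer_ms=10))
# [10, 10, 1010, 10, 10, 1010]
```

Every third query stalls for a full second, which is exactly the "N-1 fine, then block" rhythm described above.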

3. If you cache results, you're safe from random failures, but only if
you cache timeouts as negative results and remember which servers are
down, so you don't query them again. (AFAIK, nothing apart from
`dnsmasq` does that.)
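A minimal sketch of the two mechanisms from point 3, assuming a pluggable upstream `lookup(server, name, rtype)` that returns a list of records (empty for NXDOMAIN) or raises `TimeoutError`. The class, its TTL parameters and the lookup callback are all hypothetical. One simplification to be aware of: it marks a server down on *any* timeout, so a single slow domain can make a healthy upstream look dead. A real implementation would need health probes to tell the two cases apart.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical upstream lookup: returns records (empty list for NXDOMAIN)
# or raises TimeoutError when the server does not answer in time.
Lookup = Callable[[str, str, str], List[str]]


class NegativeCachingResolver:
    def __init__(self, servers: List[str], lookup: Lookup, pos_ttl: float = 300.0,
                 neg_ttl: float = 30.0, down_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.servers = list(servers)
        self.lookup = lookup
        self.pos_ttl = pos_ttl
        self.neg_ttl = neg_ttl
        self.down_ttl = down_ttl
        self.clock = clock
        # (name, rtype) -> (expiry time, records); an empty list is a cached negative
        self.cache: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}
        # server -> time before which it is not queried again
        self.down_until: Dict[str, float] = {}

    def resolve(self, name: str, rtype: str) -> List[str]:
        now = self.clock()
        key = (name, rtype)
        hit = self.cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # positive or negative answer, no upstream query
        for server in self.servers:
            if self.down_until.get(server, 0.0) > now:
                continue  # remembered as down: don't wait on it again
            try:
                records = self.lookup(server, name, rtype)
            except TimeoutError:
                self.down_until[server] = self.clock() + self.down_ttl
                continue
            ttl = self.pos_ttl if records else self.neg_ttl  # empty answers cached negatively
            self.cache[key] = (self.clock() + ttl, records)
            return records
        # Every server timed out or is marked down: cache the failure as a negative result.
        self.cache[key] = (self.clock() + self.neg_ttl, [])
        return []
```

The point of caching the timeout itself is that a repeated query for the same broken name returns immediately instead of blocking another child.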

4. What solves half of the problem for me is `dnsmasq`. As far as I
know, it's the only caching DNS server that can query all nameservers in
parallel. I get 4 times the necessary DNS traffic, but connections never
time out when one of the servers is down. Some results also come from the
cache, so in practice it's only about 2 times the traffic.
The problem with `dnsmasq` is that it doesn't cache SRV and NAPTR
responses (nor the timeouts / NXDOMAIN responses for them), only
A/AAAA/PTR/etc.
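The parallel strategy from point 4 (exposed in `dnsmasq` as its `--all-servers` option) can be sketched with a thread pool: send the same query to every upstream at once and keep the first successful answer. The function and the `lookup(server, name, rtype)` callback are hypothetical stand-ins, not a real DNS client. The trade-off is the one described above: one request per server, so N times the upstream traffic.

```python
import concurrent.futures as cf
from typing import Callable, List, Optional


def query_all_servers(servers: List[str], lookup: Callable[[str, str, str], List[str]],
                      name: str, rtype: str, timeout_s: float = 1.0) -> Optional[List[str]]:
    """Send the same query to every server at once and return the first answer.

    A dead server only costs one wasted request instead of a full timeout,
    at the price of len(servers) times the upstream traffic.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(servers))
    futures = [pool.submit(lookup, s, name, rtype) for s in servers]
    try:
        for fut in cf.as_completed(futures, timeout=timeout_s):
            try:
                return fut.result()  # first server to answer successfully wins
            except Exception:
                continue  # this server failed; keep waiting for the others
        return None  # every server failed before the timeout
    except cf.TimeoutError:
        return None  # nobody answered within timeout_s
    finally:
        # Don't wait for slow servers; their late answers are simply ignored.
        pool.shutdown(wait=False, cancel_futures=True)
```

Here a dead server no longer delays the answer at all, as long as at least one other server responds.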

5. So even if you have a local caching resolver plus a backup resolver in
`resolv.conf`, a minimal timeout, parallel querying from the local cache,
state tracking for upstream resolvers that are down, and all internal
traffic routed via IPs... it takes only one person with a custom NAPTR
record pointing you to a custom SRV target that times out to kill all
the traffic.
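Why caching only A/AAAA/PTR isn't enough can be shown with a toy model of the NAPTR -> SRV -> A chain a SIP proxy walks when locating a server (RFC 3263). The function is hypothetical and assumes every request walks the same chain, the SRV step always times out, and a cached type also caches its timeout (negative caching). Other types go upstream every time.

```python
def blocked_seconds(requests: int, chain: tuple[str, ...], cached_types: set[str],
                    timeout_type: str, timeout_s: float) -> float:
    """Total time children spend blocked over `requests` identical lookups.

    Toy model: each request walks `chain` in order; the lookup for
    `timeout_type` always fails with a timeout, which aborts the rest of
    the chain. Types in `cached_types` are cached after their first lookup,
    including failures; other types go upstream on every request.
    """
    cache: set[str] = set()
    blocked = 0.0
    for _ in range(requests):
        for rtype in chain:
            already_cached = rtype in cache
            if not already_cached and rtype in cached_types:
                cache.add(rtype)
            if rtype == timeout_type:
                if not already_cached:
                    blocked += timeout_s  # waited on upstream until the timeout
                break  # the lookup failed either way; later steps never run
    return blocked


chain = ("NAPTR", "SRV", "A")
# Only A/AAAA/PTR cached: every request pays the full SRV timeout.
print(blocked_seconds(10, chain, {"A", "AAAA", "PTR"}, "SRV", 1.0))  # 10.0
# SRV and NAPTR cached too (including timeouts): only the first request blocks.
print(blocked_seconds(10, chain, {"A", "AAAA", "PTR", "SRV", "NAPTR"}, "SRV", 1.0))  # 1.0
```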

So... what's your experience with this? Do you have better
protection in place?
I'm considering adding negative caching of DNS timeouts and general
caching of SRV and NAPTR records to `dnsmasq` to complete my protection.
Do you know of any software that would solve these problems out of the box?

Thanks,
Stan
