tcp reset or RST problem

Discussion:

Jeffrey Iskandar Ahmad

2002-04-18 09:41:40 UTC

I have installed dnscache behind a Cisco LocalDirector (load balancer). The
LocalDirector keeps putting the server into failed status every few minutes
and reopens the connection after a one-minute retry on port 53. This happens
because there are many TCP reset connections from the dnscache server. Can
anybody help me get rid of this TCP reset error? The weird thing is, why TCP? I
thought DNS only accepted UDP.

Jeffrey Iskandar Ahmad
System Engineer
Technology Division
TIME dotNet

Karsten W. Rohrbach

2002-04-18 11:56:50 UTC

Post by Jeffrey Iskandar Ahmad
I have installed dnscache behind a Cisco LocalDirector (load balancer). The
LocalDirector keeps putting the server into failed status every few minutes
and reopens the connection after a one-minute retry on port 53. This happens
because there are many TCP reset connections from the dnscache server. Can
anybody help me get rid of this TCP reset error? The weird thing is, why TCP? I
thought DNS only accepted UDP.

in the first place, it would be very interesting to hear, why one would
use hardware load balancing equipment in front of a dns server. i cannot
understand your motivation in putting a dns server behind a cisco
localdirector. the more cost-effective solution would be to put a single
small pc server box with no services running except the dns server
directly on the wire and handle outage issues via deployment of multiple
of these comparably cheap machines in different locations, and properly
set up the nameserver entries.

if you appear to run tinydns only (_no_ axfrdns, no dnscache), the RST
responses originate from the ip stack of your server, not a daemon
running on it.

regards,
/k

"If you make people think they're thinking, they'll love you; but if you
really make them think they'll hate you."

KR433/KR11-RIPE -- WebMonster Community Founder -- nGENn GmbH Senior Techie
http://www.webmonster.de/ -- ftp://ftp.webmonster.de/ -- http://www.ngenn.net/
GnuPG 0x2964BF46 2001-03-15 42F9 9FFF 50D4 2F38 DBEE DF22 3340 4F4E 2964 BF46
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
Please do not remove my address from To: and Cc: fields in mailing lists. 10x

Jonathan de Boyne Pollard

2002-04-19 09:16:38 UTC

JIA> I have install dnscache behind Local directory(load balancer),
JIA> [...]

KWR> if you appear to run tinydns only (_no_ axfrdns, no dnscache),

He said that he was running "dnscache".

What you say about there not being much in the way of good reasons to run a
content DNS server behind a hardware load balancer is true. However, I can
think of one plausible reason for running a _resolving proxy_ DNS server
behind a load balancer (which is what he says he is doing). This is because
whereas for content DNS service one can simply list more content DNS server IP
addresses in the DNS database to spread the load, one doesn't have such a
luxury when it comes to proxy DNS service. Many DNS client libraries
support a maximum of only three proxy DNS server IP addresses. Placing
multiple resolving proxy DNS servers behind a hardware load balancer would
allow one to deploy a whole bank of resolving proxy DNS servers whilst only
having to tell one's customers/clients/close friends about a single IP
address.

But, of course, as with what you mentioned, there is an alternative solution
built entirely of PCs here, too. A single forwarding proxy DNS server
listening on that IP address, configured to forward to one's bank of resolving
proxy DNS servers, would also achieve pretty much the same result (albeit that
it would direct the query traffic to randomly selected members of one's bank
of resolving proxy DNS servers, and not by selecting the one that was
incurring the least system load, as one could do with a far more expensive
system).

Rob Mayoff

2002-04-18 15:27:27 UTC

This thing happen because there are many TCP reset connections from
the dnscache server.

What query does the LD send to dnscache by TCP?

Richard Letts

2002-04-19 04:00:04 UTC

Post by Rob Mayoff
What query does the LD send to dnscache by TCP?

it will either be a SOA or a [IA]XFR as someone tries to transfer a zone
from the machine. since dnscache doesn't listen on TCP the IP stack sends
a RST -- basically telling the other end of the connection to go away.

I'd run tcpserver on that port just to see where the connections were
coming in from..
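
The idea of listening on the port just to see who connects can be sketched as below. This is a minimal illustration in Python, not a replacement for tcpserver: the helper names `open_listener` and `log_next_peer` are made up for this example, and it binds to an OS-chosen port on loopback rather than port 53, which would need root and would clash with any DNS server already bound there.

```python
import socket
from datetime import datetime, timezone


def open_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(16)
    return listener


def log_next_peer(listener: socket.socket) -> tuple[str, int]:
    """Accept one connection, print who connected, then close it normally."""
    conn, (peer_ip, peer_port) = listener.accept()
    with conn:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{stamp} tcp connection from {peer_ip}:{peer_port}")
    return peer_ip, peer_port
```

Calling `log_next_peer` in a loop would give a running list of source addresses, which is the piece of information the thread is missing at this point.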

Richard

Jonathan de Boyne Pollard

2002-04-19 09:01:42 UTC

RL> [...] since dnscache doesn't listen on TCP [...]

"dnscache" _does_ accept TCP connections. Please read the manual page.

RM> What query does the LD send to dnscache by TCP?

RL> it will either be a SOA or a [IA]XFR as someone tries to
RL> transfer a zone [...]

If this is true in Jeffrey's case (He hasn't provided nearly enough
information about what is happening to limit the possibilities to this
alone.), then someone is incorrectly thinking that he is running a content DNS
server on the IP address on which he is running a proxy DNS server. There are
at least three possibilities:

1. This is the IP address for proxy DNS service that Jeffrey tells his
customers about. A dimwitted attacker amongst his customers knows that there
is "a DNS server" at that IP address, and, either not comprehending the
difference between a proxy server and a content server or knowing the
difference and attacking anyway (just on the off-chance that Jeffrey is
running BIND or some other software that vainly tries to wear all of the hats
at once), is attempting to perform a "zone transfer" to obtain the content of
Jeffrey's DNS database in order to refine further attacks.

2. Jeffrey once ran a publicly listed content DNS server on this IP
address. He is being surveyed by one of the several content DNS server
surveys that run from time to time. (This seems unlikely, given the reported
frequency of the connection attempts. Most well intentioned surveys
deliberately do not retry frequently.)

3. Jeffrey once ran a publicly listed content DNS server on this IP
address. He had an arrangement with someone else to perform DNS database
replication - with his erstwhile content DNS server as the "master" - that he
forgot to cancel, and the other person's proxy DNS servers are still trying to
replicate the database of the content DNS server that they still believe is
running on that IP address.

As I said, Jeffrey hasn't yet come anywhere near to providing enough
information for us to make the deduction that what is happening is a "zone
transfer" replication attempt, let alone limit the possibilities further. It
could well be, for example, that one of his customers is using his proxy DNS
server actually for proxy DNS service, has simply happened to hit one of the
few resource record sets in the world that is so large that "dnscache" returns
a truncated response via UDP, and is thus re-trying the query over TCP in the
usual manner, which Jeffrey has erroneously blocked by a firewall rule.
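
The UDP-then-TCP retry described above can be sketched as follows. This is a simplified illustration of what a stub resolver does, assuming a plain RFC 1035 query with no EDNS; the function names (`build_query`, `is_truncated`, `resolve`) and the fixed query ID are choices made for this example, not taken from any particular resolver library.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word
RD_FLAG = 0x0100  # "recursion desired" bit


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Encode a single-question DNS query (class IN) with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, 0)
    labels = [part.encode("ascii") for part in name.rstrip(".").split(".") if part]
    qname = b"".join(bytes([len(part)]) + part for part in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_truncated(message: bytes) -> bool:
    """True if the TC bit is set in the header of a DNS message."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def resolve(server: str, name: str, qtype: int = 1, timeout: float = 3.0) -> bytes:
    """Ask over UDP first; repeat the same query over TCP if the reply is truncated."""
    query = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(512)
    if not is_truncated(response):
        return response
    # Over TCP every DNS message is preceded by a two-byte length field.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

If TCP port 53 is filtered between client and server, the second half of `resolve` is the part that fails, and only for the rare answers that do not fit in a UDP reply.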

Rob Mayoff

2002-04-19 12:23:52 UTC

Post by Richard Letts

Post by Rob Mayoff
What query does the LD send to dnscache by TCP?

dnscache DOES listen on TCP port 53.

Ahmad said he's running dnscache behind a Cisco LocalDirector. He said
the LD puts the dnscache in "failed status" because there are many TCP
reset connections from the dnscache.

Now, I'm not sure whether he really meant that the LD is receiving TCP
RST packets, or whether he just means that the connections close without
dnscache sending answers.

I do know that a Cisco LocalDirector can be configured
to probe the DNS servers that it is in front of. See
<http://www.cisco.com/warp/public/117/local_director/dns_probe_ld.html>.
I don't know if the LD can be configured to use TCP instead of UDP to
probe the servers. I assumed that it's possible and that Ahmad has an LD
configured that way. I suspected that Ahmad's LD is asking dnscache for
some unresolvable domain, so dnscache takes a long time, eventually
having 20 in-process TCP connections. In that state, if dnscache gets a
21st TCP connection, it will close the oldest connection without sending
a response. I hypothesized that this is what Ahmad was reporting as a
TCP reset connection.
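
The behaviour hypothesized above (a fixed number of TCP slots, with the oldest connection dropped when one more arrives) can be modelled in a few lines. This is a toy model of the bookkeeping only, assuming the limit of 20 quoted in the post; the class name `TcpSlots` and its methods are invented for illustration and are not dnscache's actual code.

```python
from collections import OrderedDict
from typing import Hashable, Optional

MAX_TCP = 20  # concurrent TCP connection limit quoted in the post above


class TcpSlots:
    """Tracks in-progress TCP connections, oldest first."""

    def __init__(self, limit: int = MAX_TCP) -> None:
        self.limit = limit
        self.active: "OrderedDict[Hashable, str]" = OrderedDict()

    def accept(self, conn_id: Hashable, peer: str) -> Optional[Hashable]:
        """Register a new connection; return the id dropped to make room, if any."""
        dropped = None
        if len(self.active) >= self.limit:
            # Table is full: the oldest connection is closed without an answer.
            dropped, _ = self.active.popitem(last=False)
        self.active[conn_id] = peer
        return dropped

    def finish(self, conn_id: Hashable) -> None:
        """Remove a connection that was answered and closed normally."""
        self.active.pop(conn_id, None)
```

With a prober that opens connections faster than slow lookups complete, `accept` starts returning dropped ids once the table is full, which the prober would see as connections closing with no response.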

He hasn't posted any followup information. If he still needs help, he
needs to post the commands he used to configure his LD for DNS probing,
and the output of the "syslog console" and "syslog output 23.7 or 20.7"
commands as described on the page I reference above. It would also help
for him to run tcpdump so we can see exactly what's going on between the
LD and dnscache.

Jonathan de Boyne Pollard

2002-04-19 08:39:26 UTC

JIA> I thought dns only accept UDP.

Please read the third paragraph of the "Configuration" section of the
"dnscache" manual page. Then provide us with far more detail than you have
done so far.

For instance: What are the source IP addresses of these TCP connections ?
What are their destination IP addresses ? What IP address is "dnscache"
listening on ? What firewall and routing rules are in place on the machine on
which "dnscache" is running ? What were the results when you ran a packet
sniffer on that machine to see whether the TCP connection requests were even
reaching it in the first place ? What queries are reported in the "dnscache"
log ?

Jeffrey Iskandar Ahmad

2002-04-20 02:40:07 UTC

I have 2 BIND servers running behind the LocalDirector, sharing the same
external IP. The reason for using the LocalDirector is load balancing and
failover. Both servers have the same named.conf entries. I am adding one
dnscache to the group. When I put the server in that group, the "TCP reset
reassigns" counter keeps increasing in the LocalDirector status. At a
certain limit or threshold it considers the server failed and stops
routing packets to that server. After a 1-minute retry it puts the server
back in service, because the server is always up; it is only that the TCP
resets keep increasing after a few thousand connections. I have tried
blocking TCP connections to dnscache at the firewall, but it is still the
same.

Does this problem happen because I grouped the master and dnscache
together?

Here is what I get from the log file when I grep for tcp; I don't know
what this is.

2002-04-19 17:33:14.482326500 query 3280 202.174.129.5:4421:1401 6 _ldap._tcp.default-first-site-name._sites.dc._msdcs.wtp.com.my.
2002-04-19 17:33:15.978344500 query 3335 202.174.129.5:4422:1402 6 _tcp.default-first-site-name._sites.dc._msdcs.wtp.com.my.
2002-04-19 17:33:17.403269500 query 3352 192.168.18.3:2854:1116 6 _kerberos._tcp.irora.com.
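
A grep for "tcp" matches these lines because the queried names contain a `_tcp` label, which says nothing about the transport the query arrived on. The sketch below separates the two cases. It rests on two assumptions: that the fields after `query` are a serial number, client `ip:port:id`, numeric query type and name, as in the lines above, and that TCP transport events appear in the log as `tcpopen` and `tcpclose` entries. The `classify` function and its return labels are hypothetical.

```python
import re

# Assumed layout: date time "query" serial ip:port:id qtype name
QUERY_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) query (?P<serial>\d+) "
    r"(?P<ip>[^:\s]+):(?P<port>[^:\s]+):(?P<id>\S+) (?P<qtype>\d+) (?P<name>\S+)$"
)


def classify(line: str) -> str:
    """Tell real TCP transport events apart from queries that merely mention _tcp."""
    fields = line.split()
    # Assumption: the event keyword is the third whitespace-separated field.
    if len(fields) >= 3 and fields[2] in ("tcpopen", "tcpclose"):
        return "tcp-transport"
    match = QUERY_RE.match(line.strip())
    if match:
        labels = match.group("name").lower().split(".")
        return "query-with-_tcp-label" if "_tcp" in labels else "query"
    return "other"


sample = (
    "2002-04-19 17:33:17.403269500 query 3352 "
    "192.168.18.3:2854:1116 6 _kerberos._tcp.irora.com."
)
print(classify(sample))
```

Only lines classified as `tcp-transport` would indicate that a client actually opened a TCP connection to the cache.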

I'm thinking of blocking TCP, but RFC 1123 says that resolvers ``MUST
support UDP, and should support TCP.''

What do you guys suggest?

Karsten W. Rohrbach

2002-04-20 12:10:08 UTC

Post by Jeffrey Iskandar Ahmad
I have 2 BIND servers running behind the LocalDirector, sharing the same
external IP. The reason for using the LocalDirector is load balancing and
failover. Both servers have the same named.conf entries. I am adding one
dnscache to the group. When I put the server in that group, the "TCP reset
reassigns" counter keeps increasing in the LocalDirector status. At a
certain limit or threshold it considers the server failed and stops
routing packets to that server. After a 1-minute retry it puts the server
back in service, because the server is always up; it is only that the TCP
resets keep increasing after a few thousand connections. I have tried
blocking TCP connections to dnscache at the firewall, but it is still the
same.

does the localdirector check for service availability? to me this looks
like the dnscache gets hit by requests and the localdirector throws it
out of service after probing service availability.
_this is just a wild guess_
what does the log of the localdirector say?

Post by Jeffrey Iskandar Ahmad
Does this problem happen because I grouped the master and dnscache
together?

separating _content servers_ and _recursive resolvers_ is always a Good
Thing[tm]. give it a try and put dnscache into a separate group. also,
try to turn off "DNS service availability" checks; i suspect the
localdirector requests 'version bind' or the SOA of 'localhost.', which
might introduce the breakage.

Post by Jeffrey Iskandar Ahmad
Here is what I get from the log file when I grep for tcp; I don't know
what this is.
2002-04-19 17:33:14.482326500 query 3280 202.174.129.5:4421:1401 6 _ldap._tcp.default-first-site-name._sites.dc._msdcs.wtp.com.my.
2002-04-19 17:33:15.978344500 query 3335 202.174.129.5:4422:1402 6 _tcp.default-first-site-name._sites.dc._msdcs.wtp.com.my.
2002-04-19 17:33:17.403269500 query 3352 192.168.18.3:2854:1116 6 _kerberos._tcp.irora.com.

these definitely are microsoft boxes, trying to resolve their magic
service names over dns. this doesn't really have to do with RFCs, it's a
"vendor specific feature" (e.g. a bug, because M$ couldn't come up for
years with a working cross-subnet browsing method for their "CIFS"/"UNC"
"standard" based networking implementation. netbios over ip is merely
an attempt to run their proprietary protocol over TCP/IP).

Post by Jeffrey Iskandar Ahmad
I'm thinking of blocking TCP, but RFC 1123 says that resolvers ``MUST
support UDP, and should support TCP.''

yeah, but when it comes to microsoft, you wouldn't expect them to stick to
RFCs, would you? RFCs also state that "_" SHOULD not be used in DNS names,
but you see what they do :-/

Post by Jeffrey Iskandar Ahmad
What do you guys suggest?

- put the recursive resolver (dnscache) into a separate group on the LD
- turn off any checks for service availability on the LD
- look into the logs of the localdirector
- do not grep for tcp in dnscache, but for non-answered queries
- look into dnscache's root/ip directory and look if you actually allow
  your clients _and_ the local director to query the cache
- supply us the _complete_ non-modified output of
  - grep -r ^ /service/dnscache (or wherever it lives)
  - localdirector log
  - output of "uname -a" of your host(s)
  - version of djbdns that you use and if you installed it from source
    or as a binary
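
The root/ip check in the list above can be tried offline. dnscache accepts a client at 1.2.3.4 if a file named 1.2.3.4, 1.2.3, 1.2 or 1 exists in its root/ip directory; the sketch below applies that same prefix rule to a directory you point it at. The function name `is_client_allowed` is made up for this example, and the addresses in the usage note are only the ones that appear in the log excerpt earlier in the thread.

```python
from pathlib import Path


def is_client_allowed(ip_dir: Path, client_ip: str) -> bool:
    """Apply the root/ip rule: the full address or any leading dotted prefix may match."""
    octets = client_ip.split(".")
    for length in range(len(octets), 0, -1):
        candidate = ip_dir / ".".join(octets[:length])
        if candidate.exists():
            return True
    return False
```

For example, `is_client_allowed(Path("/service/dnscache/root/ip"), "192.168.18.3")` would report whether that client is covered; the same call with the address the LocalDirector probes from shows whether its health checks can ever be answered.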

these are just some ideas based on guesswork. feel free to flame me for
that. i didn't have my hands on a localobfuscator^Wlocaldirector for
years ;-)

regards,
/k

Experience is a teacher that gives the examination first and the
lesson afterwards.

Karsten W. Rohrbach

2002-04-20 17:15:15 UTC

Post by Karsten W. Rohrbach
- grep -r ^ /service/dnscache (or wherever it lives)

oops, this should read
- grep ^ /service/dnscache/*

sorry, it's been late ;-)
/k

Things can only really be scientifically true if they could also be false
with different data. --Karl Popper

Jeffrey Iskandar Ahmad

2002-04-23 14:01:39 UTC

I am still having this problem. Most users successfully query dnscache, but
the TCP reset count keeps increasing, at roughly 1 reset per few hundred
connections.

The LocalDirector manual says the number is for

"TCP Reset Reassigns - Number of connections that are reassigned
because a real server (the dnscache) responded with an RST packet on a
new connection."

Why does dnscache respond with an RST packet on a new connection after a
few hundred? BIND does not produce any of these packets.

I have disabled TCP at the firewall. No incoming or outgoing TCP, but
dnscache still produces TCP RST packets. Does anyone know how to disable
this? I read someone saying that some clients will query over UDP for a
TCP connection. Hmm, maybe there is a relation here, but I don't know how
it works.

The problem exists both before and after I configured the DNS probe on
the LocalDirector.

Since there is no TCP open or close when I grep for tcp in the log file,
I believe no client is trying to make a TCP connection to dnscache.

Karsten W. Rohrbach

2002-04-24 10:18:18 UTC

Post by Jeffrey Iskandar Ahmad
I am still having this problem. Most users successfully query dnscache, but
the TCP reset count keeps increasing, at roughly 1 reset per few hundred
connections.
The LocalDirector manual says the number is for
"TCP Reset Reassigns - Number of connections that are reassigned
because a real server (the dnscache) responded with an RST packet on a
new connection."
Why does dnscache respond with an RST packet on a new connection after a
few hundred? BIND does not produce any of these packets.

because bind answers to queries from the world by default, and
dnscache's answers to clients are controlled by the configuration in
root/ip. please think about it and read the dnscache docs. also re-read
the localdirector docs and make sure that you _know_ the ip the
localdirector uses for any service probe queries.

Post by Jeffrey Iskandar Ahmad
I have disabled TCP at the firewall. No incoming or outgoing TCP, but
dnscache still produces TCP RST packets. Does anyone know how to disable
this? I read someone saying that some clients will query over UDP for a
TCP connection. Hmm, maybe there is a relation here, but I don't know how
it works.

i slowly start to doubt that the localdirector tells the truth. where is
the firewall installed you are talking about? something like this?

[dnscache box]---net---[firewall]---net---[localdirector]---inet

or this?

[dnscache box]---net---[localdirector]---net---[firewall]---inet

depending on your scenario you should use tcpdump on a box between the
dnscache and the LD to see what's _really_ going on. perhaps your
firewall is misbehaving, too.

perhaps you are omitting information that would lead us to the real
problem here?

Post by Jeffrey Iskandar Ahmad
The problem exists both before and after I configured the DNS probe on
the LocalDirector.

the idea was to _unconfigure_ the probe and see what happens. either you
misunderstood my words or i misunderstood yours....

Post by Jeffrey Iskandar Ahmad
Since there is no TCP open or close when I grep for tcp in the log file

you said that you disallow tcp by configuration of some sort of firewall
somewhere in your setup. in "most" cases, subsystems do what they are told
to, so it is a perfectly reasonable thing if you don't see tcp packets
if you filter them on a firewall in front of the dnscache.

Post by Jeffrey Iskandar Ahmad
therefore I believe no client is trying to make a TCP connection to
dnscache.

belief is not proof.

Post by Jeffrey Iskandar Ahmad

Post by Karsten W. Rohrbach
- grep ^ /service/dnscache/*

this info is still missing. how are we supposed to help you without
guessing? think about what is needed to describe your complex problem
and what configuration/log files have to be seen to actually understand
your setup.

regards,
/k

When everything in the kitchen is right, the music is fine too.


Rob Mayoff

2002-04-24 15:32:20 UTC

Post by Karsten W. Rohrbach

Post by Jeffrey Iskandar Ahmad
Why does dnscache respond with an RST packet on a new connection after a
few hundred? BIND does not produce any of these packets.

because bind answers to queries from the world by default, and
dnscache's answers to clients are controlled by the configuration in
root/ip. please think about it and read the dnscache docs.

That doesn't explain why he's getting TCP RSTs (assuming that he really
is getting TCP RSTs). dnscache doesn't set SO_LINGER, so when dnscache
closes the socket, the peer should get a FIN packet, not a RST packet.
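
The FIN-versus-RST distinction can be observed on loopback with a short experiment. This is a sketch assuming a POSIX-style socket API where `struct linger` is two C ints (Windows packs it differently); the function `observe_close` is invented for this illustration. With no linger option set, closing the server side looks like an orderly end of stream to the client; with SO_LINGER enabled and a zero timeout, the close is abortive and the client sees a reset.

```python
import socket
import struct


def observe_close(abortive: bool) -> str:
    """Close the server side of a loopback connection and report what the client sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    if abortive:
        # l_onoff=1, l_linger=0: close() discards the connection and sends RST.
        server_side.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
        )
    server_side.close()
    client.settimeout(5)
    try:
        data = client.recv(1)
        # An empty read means the peer sent FIN (orderly shutdown).
        return "FIN" if data == b"" else "DATA"
    except ConnectionResetError:
        return "RST"
    finally:
        client.close()
        listener.close()
```

A server that never sets the option, as described above for dnscache, would correspond to the `abortive=False` case, so resets seen on the wire would have to come from somewhere else, such as a packet filter or a port with no listener.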

Perhaps Ahmad has an IP filtering rule on his system that blocks TCP
port 53 and responds to all packets with a RST packet. Otherwise, I
don't know why the LD would be seeing RSTs from dnscache. It would be
nice to see a tcpdump on at least the dnscache box showing some complete
transactions that result in a RST. It would also be nice to see one
from a client box.