Discussion:
Failure of secondaries
Hal Burgiss
2010-09-11 19:44:16 UTC
Hello,

I am trying to understand a predicament I found myself in today. As background,
my environment is that I work for a small web hosting company. We handle the
authoritative DNS for most of our clients, using djbdns/tinydns. So we have an
ns1, ns2, and ns3 type setup. data.cdb is shared among the 3 when the
Makefile is executed so that everything stays in sync. This is a
non-clustered setup, with one IP address per server.

This has seemed to work flawlessly for years now. Last night, though, someone
inadvertently disconnected the wrong server, and unplugged the ns1 system.
The eventual impact of that one mistake was that the DNS for the hosted
domains all went down totally. The ns2 and ns3 systems were never queried.
Direct querying during testing showed they were responding normally (e.g. dig
blah.com @ns2). Yet, for all practical purposes they might as well have been
unplugged too, since they were totally quiet. I had been under the false
assumption that should ns1 go down, the others would automatically come
into play. What am I missing?

Secondly, when I realized what happened and that the two secondary systems
were totally useless, I moved the IP address from ns1 to ns3,
changed the tinydns configs, restarted the service, verified that tinydns
was listening on the correct IP and port, and confirmed that direct test
queries worked fine. I was doing all this remotely, and did not have the ability to
reconnect the original system. I was assuming the IP move would be a
reasonable hotfix. But this did not work. Some 2 hours later the original
system was reconnected, and within minutes all started working normally
again. Help me understand this so I can avoid this kind of headache in the
future!

Thank you.
--
Hal
Andy Bradford
2010-09-11 20:07:56 UTC
Post by Hal Burgiss
What am I missing?
For starters, you could provide one of the domains that was impacted so
that list members can diagnose it for you. Is burgiss.net one such
domain?

Second, how about a sample of pertinent records for one of the domains
that dropped off the Internet when your ns1 went down? Please, do not
massage the data to hide anything.

Andy
Maciej Żenczykowski
2010-09-11 20:10:34 UTC
Stupid question perhaps, but were the domains in question actually
delegated to ns2/ns3? Or were they only delegated to ns1...

Moving the IP address from ns1 to ns3 would only work if they were in
the same layer 2 subnet, and there was no fancy setup involved (for
example no ARP queries being sent by the router).
Hal Burgiss
2010-09-11 21:00:26 UTC
Again, sorry, should have gone to the list.

2010/9/11 Maciej Żenczykowski <***@gmail.com>

Post by Maciej Żenczykowski
Stupid question perhaps, but were the domains in question actually
delegated to ns2/ns3? Or were they only delegated to ns1...
Good question. I would assume so, but not sure how/where that gets set, so
possibly not. In the normal course of things, the other 2 seem to get a
reasonable amount of query traffic.
Post by Maciej Żenczykowski
Moving the IP address from ns1 to ns3 would only work if they were in
the same layer 2 subnet, and there was no fancy setup involved (for
example no ARP queries being sent by the router).
Same subnet, adjacent IPs in fact. Based on previous experience moving IPs
around, there is no significant ARP caching or delay. Maybe a few seconds
to a minute. Would this delay be due to caching somewhere else, say the root
nameservers? Do they cache at all?

Thanks!
--
Hal
Hal Burgiss
2010-09-11 20:58:52 UTC
Sorry this should have gone to the list ...

On Sat, Sep 11, 2010 at 4:07 PM, Andy Bradford <
Post by Andy Bradford
Post by Hal Burgiss
What am I missing?
For starters, you could provide one of the domains that was impacted so
that list members can diagnose it for you. Is burgiss.net one such
domain?
No, sorry: try greenlightsport.com
Post by Andy Bradford
Second, how about a sample of pertinent records for one of the domains
that dropped off the Internet when your ns1 went down? Please, do not
massage the data to hide anything.
Zgreenlightsport.com:ns1.resultsbydesign.com:technical.dbswebsite.com:14400

.greenlightsport.com::ns1.resultsbydesign.com:14400
.greenlightsport.com::ns2.resultsbydesign.com:14400
.greenlightsport.com::ns3.resultsbydesign.com:14400
+greenlightsport.com:216.253.111.179:14400
+www.greenlightsport.com:216.253.111.179:14400
@greenlightsport.com::mailstore1.secureserver.net:10:14400
@greenlightsport.com::smtp.secureserver.net:100:14400


Thanks.
--
Hal
Maciej Żenczykowski
2010-09-12 00:05:36 UTC
First off, ns1 and ns2 seem to have more than 1 A record, for no good
reason. ns3 is ok.

[***@nike ~]$ host ns1.resultsbydesign.com
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178

[***@nike ~]$ host ns2.resultsbydesign.com
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180
ns2.resultsbydesign.com has address 216.253.111.180

[***@nike ~]$ host ns3.resultsbydesign.com
ns3.resultsbydesign.com has address 216.253.111.163
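
For anyone who wants to check this kind of output mechanically, here is a minimal Python sketch that counts repeated answers in `host` output. The helper name `count_addresses` and the shortened sample are made up for illustration; it only parses text of the "has address" form shown above and says nothing about why the duplicates exist.

```python
from collections import Counter

def count_addresses(host_output):
    """Count how often each (name, address) pair appears in `host` output."""
    counts = Counter()
    for line in host_output.splitlines():
        parts = line.split()
        # Expected shape: "<name> has address <ip>"
        if len(parts) == 4 and parts[1:3] == ["has", "address"]:
            counts[(parts[0], parts[3])] += 1
    return counts

# Shortened sample in the same shape as the output above (illustrative only).
sample = """\
ns1.resultsbydesign.com has address 216.253.111.178
ns1.resultsbydesign.com has address 216.253.111.178
ns3.resultsbydesign.com has address 216.253.111.163
"""

for (name, ip), n in count_addresses(sample).items():
    note = "duplicate answers" if n > 1 else "ok"
    print(f"{name} {ip} seen {n} time(s): {note}")
```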

Here's a query to ns1:

[***@nike ~]$ host -v -t ns greenlightsport.com 216.253.111.178
Trying "greenlightsport.com"
Using domain server:
Name: 216.253.111.178
Address: 216.253.111.178#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6894
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;greenlightsport.com. IN NS

;; ANSWER SECTION:
greenlightsport.com. 14400 IN NS ns1.resultsbydesign.com.
greenlightsport.com. 14400 IN NS ns2.resultsbydesign.com.
greenlightsport.com. 14400 IN NS ns3.resultsbydesign.com.

Received 107 bytes from 216.253.111.178#53 in 136 ms

Here's a query to ns2:

[***@nike ~]$ host -v -t ns greenlightsport.com 216.253.111.180
Trying "greenlightsport.com"
;; connection timed out; no servers could be reached

and ns3:

[***@nike ~]$ host -v -t ns greenlightsport.com 216.253.111.163
Trying "greenlightsport.com"
;; connection timed out; no servers could be reached

Either those servers are down/broken/misconfigured, or you've got a
firewall somewhere in there...
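
The per-server test above can also be scripted. What follows is a rough Python sketch using only the standard library: it builds a bare NS query by hand and sends it over UDP to one address at a time, so that a silent server shows up as "timed out" instead of being hidden behind a resolver. The function names `build_query` and `probe`, the fixed query ID, and the 3 second timeout are assumptions for illustration, not part of djbdns or of the `host` tool.

```python
import socket
import struct

def build_query(name, qtype=2, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 2 = NS, 1 = A)."""
    # Header: id, flags (0 = standard query, recursion not requested),
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN

def probe(server_ip, name, timeout=3.0):
    """Send one UDP query to server_ip port 53 and summarize the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timed out"
        except OSError as exc:
            return f"error: {exc}"
    if len(reply) < 12:
        return "short reply"
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    rcode = flags & 0x000F        # 0 = NOERROR
    aa = bool(flags & 0x0400)     # authoritative-answer bit
    return f"rcode={rcode} aa={aa} answers={ancount}"

# Example use (needs network access, so it is left commented out):
# for ip in ("216.253.111.178", "216.253.111.180", "216.253.111.163"):
#     print(ip, probe(ip, "greenlightsport.com"))
```

A timeout here only says that no reply reached the machine running the script; it cannot tell a stopped server from a firewall dropping the packets.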
Andy Bradford
2010-09-12 00:13:39 UTC
Post by Hal Burgiss
Zgreenlightsport.com:ns1.resultsbydesign.com:technical.dbswebsite.com:14400
.greenlightsport.com::ns1.resultsbydesign.com:14400
.greenlightsport.com::ns2.resultsbydesign.com:14400
.greenlightsport.com::ns3.resultsbydesign.com:14400
+greenlightsport.com:216.253.111.179:14400
+www.greenlightsport.com:216.253.111.179:14400
@greenlightsport.com::mailstore1.secureserver.net:10:14400
@greenlightsport.com::smtp.secureserver.net:100:14400
You have left out the IP addresses on your NS ("." ) lines. If a "." line
has an empty IP field, no A record is created for that name server, so it
has no glue. You either need to put an IP in there, or use a separate A
record with + for each of them:

+ns1.resultsbydesign.com:216.253.111.178:14400
+ns2.resultsbydesign.com:216.253.111.180:14400
+ns3.resultsbydesign.com:216.253.111.163:14400
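
A check along these lines can be run against a data file before it is compiled. The Python sketch below is only an illustration: `ns_name` and `missing_addresses` are hypothetical helper names, and it relies on the tinydns-data conventions that "." and "&" lines are `fqdn:ip:x:ttl`, that x is used as the server name when it contains a dot (otherwise x.ns.fqdn), and that "+" and "=" lines carry an address. It only sees the text it is given, so a name server whose A record lives elsewhere in the real file would be reported here even though it is fine.

```python
def ns_name(fqdn, x):
    """Name of the server on a '.' or '&' line: x if it has a dot, else x.ns.fqdn."""
    return x if "." in x else f"{x}.ns.{fqdn}"

def missing_addresses(data):
    """List name servers named on '.'/'&' lines that get no A record in `data`."""
    wanted, have = [], set()
    for raw in data.splitlines():
        line = raw.strip()
        if not line or line[0] == "#":
            continue
        kind, fields = line[0], line[1:].split(":")
        fields += [""] * (3 - len(fields))   # pad so fields[0..2] always exist
        if kind in ".&":
            server = ns_name(fields[0], fields[2]).lower()
            if fields[1]:
                have.add(server)             # IP given on the line itself
            elif server not in wanted:
                wanted.append(server)        # empty IP field: needs an A record elsewhere
        elif kind in "+=" and fields[1]:
            have.add(fields[0].lower())
    return [s for s in wanted if s not in have]

# The lines quoted above, without any '+' records for the name servers.
sample = """\
.greenlightsport.com::ns1.resultsbydesign.com:14400
.greenlightsport.com::ns2.resultsbydesign.com:14400
.greenlightsport.com::ns3.resultsbydesign.com:14400
+greenlightsport.com:216.253.111.179:14400
"""

print(missing_addresses(sample))
```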

It's likely that tinydns doesn't know that it needs to answer the
queries (you can show this by looking at your tinydns logs on ns2 and
ns3). Neither of them responds to queries:

$ dnsq a www.greenlightsport.com ns2.resultsbydesign.com
1 www.greenlightsport.com:
timed out

Andy
