Bastien Devos
2004-02-25 15:34:41 UTC
Hi,
I'm using djbdns on two servers, a master (Solaris 8) and a backup (Fedora Core 1
Linux).
On each server, I have an instance of dnscache running on the public Ethernet
interface. It queries the tinydns server (running on 127.0.0.1) for the
authoritative entries and forwards everything else to an external server.
DNS queries work fine from client machines (Solaris, Linux, Windows), but I
have a question for which I couldn't find a satisfying answer on Google or in
the mailing list archive.
I launched my DNS servers many weeks ago, and yesterday I ran svstat
to check that my processes are OK and to see their uptime.
On the master server (the Solaris one), I ran the following command:
master# svstat /service/dnscache
/service/dnscache: up (pid 9400) 523 seconds
This is of course not normal, so I ran the following loop to figure out what's wrong,
and got this first result:
master# while true; do svstat /service/dnscache; sleep 2; done
(...)
/service/dnscache: up (pid 9400) 630 seconds
/service/dnscache: up (pid 9400) 632 seconds
/service/dnscache: up (pid 9400) 634 seconds
/service/dnscache: up (pid 9400) 636 seconds
/service/dnscache: supervise not running
/service/dnscache: supervise not running
/service/dnscache: up (pid 9634) 2 seconds
/service/dnscache: up (pid 9634) 4 seconds
/service/dnscache: up (pid 9634) 6 seconds
/service/dnscache: up (pid 9634) 8 seconds
(...)
After another few hundred seconds, I got a different error:
(...)
/service/dnscache: up (pid 9634) 622 seconds
/service/dnscache: up (pid 9634) 624 seconds
/service/dnscache: up (pid 9634) 626 seconds
/service/dnscache: up (pid 9634) 629 seconds, want down
/service/dnscache: up (pid 10363) 1 seconds
/service/dnscache: up (pid 10363) 3 seconds
/service/dnscache: up (pid 10363) 5 seconds
(...)
And later, something else again:
(...)
/service/dnscache: up (pid 11197) 592 seconds
/service/dnscache: up (pid 11197) 594 seconds
/service/dnscache: supervise not running
/service/dnscache: supervise not running
/service/dnscache: down 0 seconds, normally up, want up
/service/dnscache: up (pid 11778) 2 seconds
/service/dnscache: up (pid 11778) 4 seconds
(...)
When I run the same command on the backup server (the Linux box), I get this
output:
backup# svstat /service/dnscache/
/service/dnscache/: up (pid 19225) 3626400 seconds
backup# svstat /service/tinydns/
/service/tinydns/: up (pid 19228) 3626342 seconds
... no comment.
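For scale, the backup's dnscache uptime can be converted into days. This is only a small sketch: `fmt_uptime` is a hypothetical helper name, and it assumes a POSIX shell with `$(( ))` arithmetic (on Solaris 8 that would mean ksh or /usr/xpg4/bin/sh rather than the old /bin/sh).

```shell
# Hypothetical helper: format a number of seconds as days/hours/minutes/seconds.
# Assumes a POSIX shell that supports $(( )) arithmetic.
fmt_uptime() {
    s=$1
    echo "$((s / 86400))d $((s % 86400 / 3600))h $((s % 3600 / 60))m $((s % 60))s"
}

fmt_uptime 3626400   # prints: 41d 23h 20m 0s
```

That is about six weeks, which matches when I started the servers.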
This problem is mostly transparent to the clients, but it's not very clean.
The error on the Solaris box seems to appear roughly every 600 to 700 seconds.
I guess this is related to Solaris and daemontools, but I don't have any idea of
what this could be ...
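To pin down the interval more precisely, the polling loop could be changed to print a timestamped line only when the pid changes. This is a rough sketch, not something I have run on the master: `extract_pid` and `watch_service` are hypothetical helper names, and it assumes a POSIX shell with `$( )` substitution (ksh or /usr/xpg4/bin/sh on Solaris 8).

```shell
# Hypothetical helper: pull the pid out of one line of svstat output.
# Prints nothing when the line has no "(pid N)" part,
# e.g. "supervise not running" or "down 0 seconds, ...".
extract_pid() {
    printf '%s\n' "$1" | sed -n 's/.*(pid \([0-9][0-9]*\)).*/\1/p'
}

# Hypothetical helper: poll svstat every 2 seconds and print a timestamped
# line only when the pid (or its absence) differs from the previous poll.
watch_service() {
    last="none-yet"          # sentinel so the first poll is always printed
    while true; do
        line=$(svstat "$1")
        pid=$(extract_pid "$line")
        if [ "$pid" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
            last=$pid
        fi
        sleep 2
    done
}

# Usage (runs until interrupted):
#   watch_service /service/dnscache
```

The timestamps of consecutive lines would then give the exact time between restarts, which could be compared against cron jobs or anything else that runs on a similar schedule.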
Any help would be much appreciated,
Thanks
Bastien.