Discussion:
dnscache memory requirements for large number of server files
Lloyd Standish
2010-09-07 06:38:56 UTC
Please help me estimate memory requirements to run dnscache with about 769,000 files in the "servers" directory (/etc/service/dnscache/root/servers).

Each file is 9 bytes. The filenames are the domain names to forward to an "override" nameserver (tinydns running on 127.0.0.2). Each file contains the same content: the IP 127.0.0.2. (Actually, the files are mostly hardlinks; otherwise I would run out of inodes.)
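
A sketch of how such a tree can be built (the domain-list filename, blocked.txt, is an assumption; dnscache ignores dotfiles in servers/, so the master file is safe there):

#!/usr/bin/perl
# Sketch: build the servers/ tree from a one-domain-per-line list,
# hardlinking every entry to a single master file to conserve inodes.
use strict;
use warnings;

my $dir    = '/etc/service/dnscache/root/servers';
my $master = "$dir/.master";

open my $fh, '>', $master or die "$master: $!";
print $fh '127.0.0.2';    # 9 bytes, no trailing newline
close $fh;

open my $list, '<', 'blocked.txt' or die "blocked.txt: $!";
while (my $domain = <$list>) {
    chomp $domain;
    next if -e "$dir/$domain";
    link $master, "$dir/$domain" or warn "link $domain: $!\n";
}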

This is part of a project to set up porn-blocking using a list of 769,000 porn domain names. dnscache should forward DNS queries for the porn domains to tinydns, running on 127.0.0.2 on the same machine. tinydns should return a bogus IP (pointing to a page saying access to the pornography has been blocked). Of course, I got this working on a few test domains before attempting to load the 769,000 servers entries.

I already loaded the 769,000 (minimal) zones into the tinydns data file and ran "make". tinydns seems to be fine (though it has seen zero queries so far).
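
For reference, zones of that minimal sort can be generated along these lines (a sketch: 192.0.2.1 stands in for the landing-page IP, blocked.txt for the domain list):

#!/usr/bin/perl
# Sketch: emit one minimal tinydns-data zone per blocked domain.
use strict;
use warnings;

my $bogus = '192.0.2.1';    # placeholder "blocked" landing-page IP

open my $list, '<', 'blocked.txt' or die "blocked.txt: $!";
while (my $domain = <$list>) {
    chomp $domain;
    print ".$domain::a\n";              # SOA + NS, so tinydns is authoritative
    print "+$domain:$bogus:3600\n";     # A record for the domain itself
    print "+*.$domain:$bogus:3600\n";   # wildcard for anything beneath it
}

The output gets appended to tinydns's data file before running "make".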

However, dnscache cannot load the 769,000 servers files with only 256 megs of physical memory. I have raised CACHESIZE and DATALIMIT to 20M and 100M, respectively.

How much memory should be necessary to do this (assuming it is possible)? This is running on a VPS and I could increase the available memory.
--
Lloyd
Jeff King
2010-09-07 15:20:20 UTC
Post by Lloyd Standish
Please help me estimate memory requirements to run dnscache with about
769,000 files in the "servers" directory
(/etc/service/dnscache/root/servers).
Memory requirements aside, I think this is probably a bad idea. Just
glancing at the code in roots.c, it looks like dnscache will do a linear
search through the 769,000 entries for every query.

As for the memory requirements, I would expect (and again, I just
glanced at the code) it to take only 769,000 * (average_domain_length +
64) bytes, where the "64" comes from the fact that each entry gets a
fixed-size slot for server IPs (room for 16 IPv4 addresses). That is
probably only on the order of 50-60M or so; an average wire-format name
of ~14 bytes, for example, gives 769,000 * 78, or about 60M.
Post by Lloyd Standish
However, dnscache cannot load the 769,000 servers files with only 256
megs of physical memory. I have raised CACHESIZE and DATALIMIT to
20M and 100M, respectively.
I'm not sure, but there may be a leak in roots.c:init2. Your best bet is
probably to try your experiment with 10,000, 20,000, etc. entries and see
how the memory scales.
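
Something like the following would automate that (a sketch; the paths, the
blocked.txt list, and the daemontools svc/svstat calls are assumptions
about your setup):

#!/usr/bin/perl
# Sketch: load N entries into servers/, restart dnscache, and record
# its resident set size so memory use can be plotted against N.
use strict;
use warnings;

my $svc = '/etc/service/dnscache';
my $dir = "$svc/root/servers";

open my $list, '<', 'blocked.txt' or die "blocked.txt: $!";
chomp(my @domains = <$list>);

for my $n (10_000, 20_000, 40_000, 80_000) {
    system('svc', '-d', $svc);          # stop dnscache
    unlink glob "$dir/*";
    for my $domain (@domains[0 .. $n - 1]) {
        open my $fh, '>', "$dir/$domain" or die "$domain: $!";
        print $fh '127.0.0.2';
        close $fh;
    }
    system('svc', '-u', $svc);          # start it again
    sleep 5;                            # let it finish reading the directory
    my ($pid) = `svstat $svc` =~ /pid (\d+)/ or die "dnscache not up";
    my ($rss) = `cat /proc/$pid/status` =~ /VmRSS:\s+(\d+)/;
    print "$n entries -> $rss kB resident\n";
}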

-Peff
Daryl Tester
2010-09-07 15:17:07 UTC
Post by Lloyd Standish
Please help me estimate memory requirements to run dnscache
with about 769,000 files in the "servers" directory
(/etc/service/dnscache/root/servers).
Wow. Even if you could get away with loading this much data, I don't
think you'd want to, as the code doesn't appear to be optimised for
such an extreme case. From a quick look at the relevant code (roots.c),
the memory cost would appear to be roughly the length of the domain name
(in wire format) + 64 bytes per domain name. And the resultant "array"
(it's actually a string) is searched linearly, which could be a killer.

If your C is up to it, I'd look at modifying dnscache to perform a CDB
lookup on the domain + query type; if you get a hit, return your
fictitious answer, otherwise proceed with the query normally.
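
The blacklist side of that is easy enough to prepare; here's a sketch
using the CPAN CDB_File module (blocked.txt is an assumed domain list;
the lookup inside dnscache itself would still be a C change):

#!/usr/bin/perl
# Sketch: pack the blocked-domain list into a cdb for constant-time
# lookups, and demonstrate the lookup side.
use strict;
use warnings;
use CDB_File;

my %blocked;
open my $list, '<', 'blocked.txt' or die "blocked.txt: $!";
while (my $domain = <$list>) {
    chomp $domain;
    $blocked{lc $domain} = '127.0.0.2';    # value: the override server IP
}
CDB_File::create(%blocked, 'blocked.cdb', "blocked.cdb.$$")
    or die "cdb create failed: $!";

# Lookup side (the same O(1) probe the modified dnscache would do in C):
tie my %cdb, 'CDB_File', 'blocked.cdb' or die "tie: $!";
print "blocked\n" if exists $cdb{'example.com'};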
--
Regards,
Daryl Tester

"It's bad enough to have two heads, but it's worse when one's unoccupied."
-- Scatterbrain, "I'm with Stupid."
Lloyd Standish
2010-09-07 16:44:59 UTC
Hi Daryl and Jeff,
Thanks for the information. I agree that a linear search would not be a good idea; adding some sort of hashed lookup sounds like the way to go here. I'm not very experienced in C (my background is primarily in Perl and bash) and I have little time. If anyone here is interested in making this modification for pay, please contact me off-list.
--
Lloyd
Scott Gifford
2010-09-07 18:18:53 UTC
Post by Lloyd Standish
Hi Daryl and Jeff,
Thanks for the information. I agree that a linear search would not be a
good idea. Adding some sort of hashed lookup sounds like the way to go
here. I'm not very experienced at C (my experience is primarily in perl and
bash) and I have little time.
Going a bit off-topic for the list I know, but based on your comments Lloyd,
you may want to look at the Perl module Net::DNS::Nameserver:

http://search.cpan.org/search?query=Net::DNS::Nameserver&mode=all


I have had fantastic luck using it to solve small, strange DNS problems like
the one you are describing. You could put an instance of dnscache in front
of it to handle the caching and get answers without going to the Perl code,
but really for these sorts of things Perl is pretty fast.
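
The module's interface is small; a minimal skeleton following its
documented synopsis (the port here is an arbitrary test choice) looks like:

#!/usr/bin/perl
# Minimal Net::DNS::Nameserver skeleton; this handler just logs and
# refuses every query.
use strict;
use warnings;
use Net::DNS::Nameserver;

sub reply_handler {
    my ($qname, $qclass, $qtype, $peerhost, $query, $conn) = @_;
    print "query from $peerhost: $qtype $qname\n";
    return ('REFUSED', [], [], []);
}

my $ns = Net::DNS::Nameserver->new(
    LocalAddr    => '127.0.0.1',
    LocalPort    => 5353,    # unprivileged port for testing
    ReplyHandler => \&reply_handler,
    Verbose      => 0,
) or die "couldn't create nameserver object\n";

$ns->main_loop;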

Just a random idea, I know, but maybe it will be helpful.

------Scott.
Daryl Tester
2010-09-08 00:33:03 UTC
(* Reply to /dev/null'd - damn autoresponders *)
Post by Scott Gifford
http://search.cpan.org/search?query=Net::DNS::Nameserver&mode=all
I have had fantastic luck using it to solve small, strange DNS problems like
the one you are describing. You could put an instance of dnscache in front
of it to handle the caching and get answers without going to the Perl code,
but really for these sorts of things Perl is pretty fast.
Just in case it's a "thinko" - he'd want the Perl lookup to occur before hitting
dnscache, otherwise it's a variant of his original problem (i.e. redirecting to
Perl instead of tinydns).
--
Regards,
Daryl Tester

"It's bad enough to have two heads, but it's worse when one's unoccupied."
-- Scatterbrain, "I'm with Stupid."
Ask Bjørn Hansen
2010-10-22 04:19:47 UTC
Post by Scott Gifford
Post by Lloyd Standish
Hi Daryl and Jeff,
Thanks for the information. I agree that a linear search would not be a
good idea. Adding some sort of hashed lookup sounds like the way to go
here. I'm not very experienced at C (my experience is primarily in perl and
bash) and I have little time.
Going a bit off-topic for the list I know, but based on your comments
http://search.cpan.org/search?query=Net::DNS::Nameserver&mode=all
I have had fantastic luck using it to solve small, strange DNS problems
like the one you are describing. You could put an instance of dnscache in
front of it to handle the caching and get answers without going to the Perl
code, but really for these sorts of things Perl is pretty fast.
I know it's been a while, but for what it's worth, the pool.ntp.org DNS
servers are running a server based on Net::DNS::Nameserver. It's not super
fast, but they - happily - do something like 30 million queries a day.
Making a proxy of sorts that just sends back a fixed response for anything
on the blacklist and otherwise forwards the query to a real resolving
server / cache should be pretty easy.
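
A sketch of such a proxy (assumptions: the real cache listens on
127.0.0.1, 192.0.2.1 is a placeholder landing-page address, and
blocked.txt holds the blacklist):

#!/usr/bin/perl
# Sketch: answer blacklisted names with a fixed A record, forward
# everything else to a real resolving cache.
use strict;
use warnings;
use Net::DNS;
use Net::DNS::Nameserver;

my %blocked;
open my $list, '<', 'blocked.txt' or die "blocked.txt: $!";
while (<$list>) { chomp; $blocked{lc $_} = 1 }

my $cache = Net::DNS::Resolver->new(nameservers => ['127.0.0.1']);

sub reply_handler {
    my ($qname, $qclass, $qtype, $peerhost) = @_;
    if ($blocked{lc $qname}) {
        my @ans;
        push @ans, Net::DNS::RR->new("$qname 3600 IN A 192.0.2.1")
            if $qtype eq 'A';
        return ('NOERROR', \@ans, [], [], { aa => 1 });
    }
    my $reply = $cache->send($qname, $qtype, $qclass)
        or return ('SERVFAIL', [], [], []);
    return ($reply->header->rcode,
            [$reply->answer], [$reply->authority], [$reply->additional]);
}

Net::DNS::Nameserver->new(
    LocalAddr    => '0.0.0.0',
    LocalPort    => 53,
    ReplyHandler => \&reply_handler,
)->main_loop;

Note this only matches exact names from the list; catching subdomains
would mean also probing each parent suffix of $qname against the hash.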
