diff --git a/doc/requirements.txt b/doc/requirements.txt
new file mode 100644
index 000000000000..a66962d4a401
--- /dev/null
+++ b/doc/requirements.txt
@@ -0,0 +1,294 @@
+Requirements for Recursive Caching Resolver
+ (a.k.a. Treeshrew, Unbound-C)
+By W.C.A. Wijngaards, NLnet Labs, October 2006.
+
+Contents
+1. Introduction
+2. History
+3. Goals
+4. Non-Goals
+5. Choices
+
+
+1. Introduction
+---------------
+This is the requirements document for a DNS name server and aims to
+document the goals and non-goals of the project. The DNS (the Domain
+Name System) is a global, replicated database that uses a hierarchical
+structure for queries.
+
+Data in the DNS is stored in Resource Record sets (RR sets), and has a
+time to live (TTL). During this time the data can be cached. It is
+thus useful to cache data to speed up future lookups. A server that
+looks up data in the DNS for clients and caches previous answers to
+speed up processing is called a caching, recursive nameserver.
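+
+As a minimal illustration of TTL-bounded caching (a Python sketch, not
+the project's C cache; the TTLCache class and its methods are
+hypothetical), an entry is served only while its TTL has not run out.
+The clock is passed in explicitly so the behaviour is easy to follow.
```python
import time
from typing import Dict, Hashable, Optional, Tuple


class TTLCache:
    """Hypothetical cache of RR set data keyed by (name, type, class)."""

    def __init__(self) -> None:
        # key -> (rdata, absolute expiry time in seconds)
        self._store: Dict[Hashable, Tuple[object, float]] = {}

    def put(self, key: Hashable, rdata: object, ttl: int,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[key] = (rdata, now + ttl)

    def get(self, key: Hashable, now: Optional[float] = None) -> Optional[object]:
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        rdata, expiry = entry
        if now >= expiry:
            # The TTL has run out: the data may no longer be served.
            del self._store[key]
            return None
        return rdata


cache = TTLCache()
cache.put(("example.com.", "A", "IN"), ["192.0.2.1"], ttl=300, now=1000.0)
print(cache.get(("example.com.", "A", "IN"), now=1200.0))  # ['192.0.2.1']
print(cache.get(("example.com.", "A", "IN"), now=1300.0))  # None
```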
+
+This project aims to develop such a nameserver in modular components, so
+that DNSSEC (secure DNS) validation and stub resolvers (which do not run
+as a server, but are linked into an application) are also easily possible.
+
+The main components are the Validator, which validates the security
+signatures on data sets; the Iterator, which sends queries to the
+hierarchical DNS servers that own the data; and the Cache, which stores
+data from previous queries. The networking and query management code
+then interfaces with the modules to perform the necessary processing.
+
+In Section 2 the origins of the Unbound project are documented. Section
+3 lists the goals, while Section 4 lists the explicit non-goals of the
+project. Section 5 discusses choices made during development.
+
+
+2. History
+----------
+The Unbound resolver project was started by Bill Manning, David Blacka,
+and Matt Larson (from the University of California and from Verisign),
+who created a Java-based prototype resolver called Unbound. In it, the
+basic design decision of clean modules was put into practice.
+
+The Java prototype worked very well, with contributions from Geoff
+Sisson and Roy Arends from Nominet. Around 2006 the idea arose to create
+a full-fledged C implementation ready for production use. NLnet Labs
+volunteered to write this implementation.
+
+
+3. Goals
+--------
+o A validating recursive DNS resolver.
+o Code diversity in the DNS resolver monoculture.
+o Drop-in replacement for BIND apart from config.
+o DNSSEC support.
+o Fully RFC compliant.
+o High performance
+ * even with validation.
+o Used as
+ * stub resolver.
+ * full caching name server.
+ * resolver library.
+o Elegant design of validator, resolver, cache modules.
+ * provide the ability to pick and choose modules.
+o Robust.
+o In C, open source: The BSD license.
+o Highly portable; targets include modern Unix systems, such as *BSD,
+ Solaris, Linux, and possibly also the Windows platform.
+o As small a component as possible that does the job.
+o Stub-zones can be configured (local data or AS112 zones).
+
+
+4. Non-Goals
+------------
+o An authoritative name server.
+o Too many features.
+
+
+5. Choices
+----------
+o rfc2181 discourages duplicate RRs in RRsets. unbound does not create
+ duplicates, but when presented with duplicates on the wire from the
+ authoritative servers, does not perform duplicate removal.
+ It does perform some RRSIG duplicate removal in the msgparser, for the
+ DNSSEC qtypes RRSIG and ANY, because the msgparser processes RRSIGs
+ specially.
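+
+ The rule above can be sketched in Python as follows. This is only an
+ illustration, not the msgparser: records are hypothetical (owner, type,
+ rdata) tuples, only RRSIG records are de-duplicated, and only when the
+ query type is RRSIG or ANY; other duplicates pass through as received.
```python
from typing import List, Tuple

Record = Tuple[str, str, bytes]  # (owner name, RR type, rdata)


def filter_duplicates(qtype: str, records: List[Record]) -> List[Record]:
    """Drop duplicate RRSIGs for qtype RRSIG/ANY; keep everything else as-is."""
    if qtype not in ("RRSIG", "ANY"):
        return list(records)
    seen = set()
    out = []
    for owner, rtype, rdata in records:
        if rtype == "RRSIG":
            # Owner names compare case-insensitively in the DNS.
            key = (owner.lower(), rdata)
            if key in seen:
                continue  # duplicate signature: removed
            seen.add(key)
        out.append((owner, rtype, rdata))
    return out
```
+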
+o The harden-glue feature is more complicated than "when yes, all
+ out-of-zone glue is deleted; when no, out-of-zone glue is used for
+ further resolving"; see below.
+ Main points:
+ * rfc2181 trust handling is used.
+ * data is let through only in very specific cases.
+ * spoofability remains possible.
+ Not all glue is let through (despite the name of the option). Only glue
+ that is present in a delegation, is of type A or AAAA, and whose name
+ appears in an NS record in the authority section is let through.
+ The glue that is let through is stored in the cache (marked as 'from the
+ additional section'), and is then used as a destination for queries. It
+ will not be present in the reply to the client (if RD is off).
+ A direct query for that name will attempt to get a message into the
+ message cache. Since A and AAAA queries are not synthesized by the unbound
+ cache, this query will (eventually) be sent to the authoritative server,
+ and its answer will be put in the cache, marked as 'from the answer
+ section', thus replacing the 'from the additional section' data; this
+ record is returned to the client.
+ The message has a TTL smaller than or equal to the TTL of the answer RR.
+ If cache memory is low, the answer RR may be dropped, and a glue RR may
+ be inserted within the message TTL, so that the spoofed glue is returned
+ to a client. When the message expires, it is refetched and the cached RR
+ is updated with the correct content.
+ The server can be spoofed by getting it to visit a specially prepared
+ domain. This domain then inserts an address for another authoritative
+ server into the cache; when visiting that other domain, this address may
+ then be used as a destination for queries, and fake answers may be
+ returned. If the other domain is signed by DNSSEC, the fakes will be
+ detected.
+
+ In summary, the harden-glue feature presents a security risk if
+ disabled. Disabling the feature may lead to better performance, as more
+ glue is available for the recursive service to use. The feature is
+ implemented so as to minimise the security risk, while trying to keep
+ this performance gain.
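+
+ A Python sketch of the glue filter described above, under stated
+ assumptions: records are hypothetical (owner, type, rdata) tuples, and
+ "out of zone" is taken to mean not at or below a hypothetical
+ server_zone parameter (the zone of the server that sent the referral).
+ It is not unbound's scrubber code.
```python
from typing import List, Tuple

Rec = Tuple[str, str, str]  # (owner, type, rdata), fully qualified names


def is_subdomain(name: str, zone: str) -> bool:
    """True if name equals zone or lies below it (label-wise)."""
    name, zone = name.lower(), zone.lower()
    return zone == "." or name == zone or name.endswith("." + zone)


def filter_glue(authority: List[Rec], additional: List[Rec],
                server_zone: str, harden_glue: bool) -> List[Rec]:
    """Return the additional-section records that may be cached as glue."""
    ns_targets = {rdata.lower() for _, rtype, rdata in authority if rtype == "NS"}
    if not ns_targets:
        return []  # not a delegation: no glue is let through
    kept = []
    for owner, rtype, rdata in additional:
        if rtype not in ("A", "AAAA"):
            continue  # only address records can be glue
        if owner.lower() not in ns_targets:
            continue  # address for a name not listed in the NS RRset
        if harden_glue and not is_subdomain(owner, server_zone):
            continue  # out-of-zone glue is dropped when hardened
        kept.append((owner, rtype, rdata))
    return kept
```
+ The kept records would then be cached with the lower 'from the
+ additional section' trust, so that a later authoritative answer
+ replaces them.
+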
+o The method by which dnssec-lameness is detected is not secure. A server
+ is DNSSEC-lame when it has the zone in question but lacks DNSSEC data,
+ such as signatures. The method to detect dnssec-lameness looks at
+ nonvalidated data from the parent of a zone. By spoofing the parent, this
+ can be used to create a false sense of dnssec-lameness in the child, or
+ a false sense of dnssec-non-lameness in the child. The first results in
+ the server being marked lame and not used for 900 seconds; the second
+ results in a validator failure (again SERVFAIL) when the query is
+ validated later on.
+
+ In conclusion, a spoof of the parent delegation can be used for many
+ kinds of denial of service; for example, a completely different NS set
+ could be returned, or the information withheld. All of these alterations
+ can be caught by the validator if the parent is signed, and result in the
+ data being bogus for 900 seconds. The dnssec-lameness detection is used
+ to detect operator failures before the validator properly verifies the
+ messages.
+
+ dnssec-lameness detection is also enabled for zones for which no chain
+ of trust exists but the parent provides a DS. This delivers DNSSEC data
+ to our clients (for client-side validators) when possible.
+
+ The following issue needs to be resolved:
+ a server that serves both a parent and a child zone, where the
+ parent is signed but the child is not. The server must not be marked
+ lame for the parent zone because the child answer is not signed.
+ We prefer false negatives to false positives: failure to detect
+ dnssec-lameness is less of a problem than marking honest servers lame.
+ dnssec-lameness is a config error and deserves the trouble.
+ So, only messages that identify the zone are used to mark the zone
+ lame. The zone is identified by SOA or NS RRsets in the answer/auth
+ sections. That includes almost all negative responses and also the A and
+ AAAA qtypes, which covers most responses from servers.
+ For referrals, delegations that add a single label can be checked to
+ come from their zone; this covers most delegation-centric zones.
+
+ So, for complicated setups with multiple (parent-child) zones on one
+ server, dnssec-lameness detection may not work: no dnssec-lameness is
+ detected, and instead the zone that is dnssec-lame becomes bogus.
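+
+ The "identifies the zone" test can be sketched as a Python predicate.
+ This is a hypothetical helper, not unbound's code: records are reduced
+ to (owner, type) pairs, and what happens after the test (marking the
+ server lame) is left out.
```python
from typing import List, Tuple

Rec = Tuple[str, str]  # (owner name, RR type)


def label_count(name: str) -> int:
    return len([label for label in name.lower().split(".") if label])


def identifies_zone(zone: str, answer: List[Rec], authority: List[Rec],
                    is_referral: bool) -> bool:
    """May this reply be used as evidence about the server for `zone`?"""
    zone = zone.lower()
    if is_referral:
        # A delegation that adds exactly one label below the zone can be
        # attributed to that zone.
        for owner, rtype in authority:
            owner = owner.lower()
            if rtype == "NS" and (zone == "." or owner.endswith("." + zone)):
                if label_count(owner) == label_count(zone) + 1:
                    return True
        return False
    # Otherwise the zone must appear as the owner of an SOA or NS RRset.
    return any(rtype in ("SOA", "NS") and owner.lower() == zone
               for owner, rtype in answer + authority)
```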
+
+o authority features.
+ This is a recursive server, and authority features are out of scope.
+ However, some authority features are expected in a recursor, such as
+ localhost, reverse lookup for 127.0.0.1, or blocking AS112 traffic.
+ Redirection of domain names with fixed data is also needed by service
+ providers. Limited support is added specifically to address this.
+
+ Adding full authority support requires much more code and more complex
+ maintenance.
+
+ The limited support allows adding some static data (for localhost and
+ the like), and responding with a fixed rcode (NXDOMAIN) for domains (such
+ as AS112).
+
+ You can put authority data on a separate server and configure that
+ server in unbound.conf as a stub for those zones; this allows clients to
+ access data from the server without making unbound authoritative for the
+ zones.
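+
+ The limited support might look like the Python sketch below: a table of
+ static data plus a set of zones answered with NXDOMAIN, consulted before
+ recursion. The LocalZones class and its methods are hypothetical and
+ are not unbound's configuration or internals.
```python
from typing import Dict, List, Optional, Set, Tuple


class LocalZones:
    """Hypothetical local-data table consulted before recursion."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[str, str], List[str]] = {}  # (name, type) -> rdata
        self.nxdomain_zones: Set[str] = set()

    def add_data(self, name: str, rtype: str, rdata: str) -> None:
        self.data.setdefault((name.lower(), rtype), []).append(rdata)

    def add_nxdomain_zone(self, zone: str) -> None:
        self.nxdomain_zones.add(zone.lower())

    def lookup(self, qname: str, qtype: str) -> Optional[Tuple[str, List[str]]]:
        """Return (rcode, answers) if answered locally, or None to recurse."""
        qname = qname.lower()
        if (qname, qtype) in self.data:
            return ("NOERROR", self.data[(qname, qtype)])
        for zone in self.nxdomain_zones:
            if qname == zone or qname.endswith("." + zone):
                return ("NXDOMAIN", [])  # fixed rcode, e.g. for AS112 zones
        return None  # not local: hand the query to the iterator
```
+ For example, localhost data would be added with add_data, and a
+ reverse zone such as 10.in-addr.arpa. with add_nxdomain_zone.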
+
+o the access control denies queries before any other processing.
+ This denies queries that are not authoritative, or version.bind, or any.
+ And thus prevents cache-snooping (denied hosts cannot make non-recursive
+ queries and get answers from the cache).
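+
+ A Python sketch of an access check that runs before any other
+ processing. The ACL contents, the "most specific netblock wins" rule and
+ the default of refusing are assumptions made for illustration only.
```python
import ipaddress
from typing import List, Tuple

# Hypothetical ACL: (netblock, action) pairs; the most specific match wins.
ACL: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "refuse"),
    ("127.0.0.0/8", "allow"),
    ("192.0.2.0/24", "allow"),
]


def acl_action(client_ip: str, acl: List[Tuple[str, str]] = ACL) -> str:
    addr = ipaddress.ip_address(client_ip)
    best = None
    for block, action in acl:
        net = ipaddress.ip_network(block)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, action)
    return best[1] if best else "refuse"


def handle_query(client_ip: str, qname: str) -> str:
    # Checked first, before the cache, local data or version.bind are
    # consulted, so a refused host cannot snoop the cache.
    if acl_action(client_ip) != "allow":
        return "REFUSED"
    return "PROCESS"  # placeholder for normal processing
```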
+
+o If a client makes a query without the RD bit set, and the message
+ returned from the cache is:
+ answer section: empty
+ auth section: NS record present, no SOA record, no DS record,
+ maybe NSEC or NSEC3 records present.
+ additional: A records or other relevant records.
+ A SOA record would indicate that this was a NODATA answer.
+ A DS record would indicate a referral.
+ Absence of NS record would indicate a NODATA answer as well.
+
+ Then the receiver does not know whether this was a referral (with an
+ attempt at a no-DS proof) or a NODATA answer (with an attempt at a
+ no-data proof). It could be determined by attempting to prove either
+ condition and checking whether only one proof is valid, but both proofs
+ could be valid, or neither could be, which creates doubt. This case is
+ validated by unbound as a 'referral', which ascertains that RRSIGs are OK
+ (and not omitted), but does not check NSEC/NSEC3.
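+
+ The classification above can be written as a small Python function,
+ assuming a NOERROR reply summarized by the sets of RR types in its
+ answer and authority sections. The function and its labels are
+ hypothetical.
```python
from typing import Set


def classify_reply(answer_types: Set[str], auth_types: Set[str]) -> str:
    """Classify a NOERROR reply by which RR types appear in its sections."""
    if answer_types:
        return "answer"
    if "SOA" in auth_types:
        return "nodata"    # an SOA record indicates NODATA
    if "DS" in auth_types:
        return "referral"  # a DS record indicates a referral
    if "NS" not in auth_types:
        return "nodata"    # absence of NS also indicates NODATA
    # NS only (perhaps with NSEC/NSEC3): ambiguous. Validated as a referral:
    # RRSIGs are checked, NSEC/NSEC3 proofs are not.
    return "ambiguous-referral"
```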
+
+o Case preservation
+ Unbound preserves the casing received from authority servers as far
+ as possible. Name compression ignores case, so casing can be lost there.
+ The casing from the query name is used in preference to the casing
+ of the authority server. This is the same as BIND. RFC4343 allows either
+ behaviour.
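+
+ Both effects can be illustrated in Python. reply_owner prefers the query
+ casing; compress_names imitates a compression table keyed on lower-case
+ suffixes, so a later name reuses an earlier spelling of a shared suffix.
+ Both helpers are hypothetical and only render names as text rather than
+ producing wire-format pointers.
```python
from typing import Dict, List


def reply_owner(owner: str, qname: str) -> str:
    """Prefer the query's casing for the query name itself."""
    return qname if owner.lower() == qname.lower() else owner


def compress_names(names: List[str]) -> List[str]:
    """Render names as a case-insensitive compressor would emit them."""
    table: Dict[str, str] = {}  # lower-case suffix -> first spelling seen
    out = []
    for name in names:
        labels = name.rstrip(".").split(".")
        rendered = ".".join(labels)
        for i in range(len(labels)):
            suffix = ".".join(labels[i:]).lower()
            if suffix in table:
                # A pointer reuses the earlier spelling of this suffix.
                rendered = ".".join(labels[:i] + [table[suffix]])
                break
        wire = rendered.split(".")
        for i in range(len(wire)):
            table.setdefault(".".join(wire[i:]).lower(), ".".join(wire[i:]))
        out.append(rendered + ".")
    return out
```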
+
+o Denial of service protection
+ If many queries are made to names whose authority servers do not
+ respond, the requestlist for unbound fills up fast. This results in
+ denial of service for new queries.
+ To combat this, queries in the first 50% of the requestlist can run to
+ completion. Queries in the last 50% of the requestlist get at least
+ 200 msec, after which they can be replaced by newer queries (LIFO).
+ When a new query comes in and a place in the first 50% is available, this
+ is preferred. Otherwise, it can replace older queries in the last 50%.
+ Thus, even long queries get a 50% chance to be resolved, and many 'short'
+ resolves of one or two round trips can be done in the last 50% of the
+ list. The timeout can be configured.
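+
+ A Python sketch of the split requestlist, with the 200 msec minimum as
+ a parameter. The class is hypothetical, and when several queries in the
+ last 50% have used up their time, replacing the one that has run longest
+ is an assumption of this sketch.
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Query:
    name: str
    start_ms: int


@dataclass
class RequestList:
    capacity: int
    jostle_timeout_ms: int = 200
    protected: List[Query] = field(default_factory=list)  # first 50%
    jostled: List[Query] = field(default_factory=list)    # last 50%

    def add(self, q: Query) -> Optional[Query]:
        """Admit q; return the query it displaced, or q itself if dropped."""
        half = self.capacity // 2
        if len(self.protected) < half:
            self.protected.append(q)  # prefer a slot that runs to completion
            return None
        if len(self.jostled) < self.capacity - half:
            self.jostled.append(q)
            return None
        # Full: only a jostled query that had its minimum time may go.
        expired = [old for old in self.jostled
                   if q.start_ms - old.start_ms >= self.jostle_timeout_ms]
        if not expired:
            return q  # nothing may be replaced: the new query is dropped
        victim = min(expired, key=lambda old: old.start_ms)
        self.jostled.remove(victim)
        self.jostled.append(q)
        return victim
```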
+
+o EDNS fallback. This is done according to the EDNS RFC (and update
+ draft-00). Unbound assumes EDNS 0 support for the first query. Then it
+ can detect support (if the server replies) or non-support (on a NOTIMPL
+ or FORMERR). Some middleboxes drop EDNS 0 queries, mainly when forwarding,
+ not when routing packets. To detect this, when timeouts keep happening,
+ the timeout approaches 5-10 seconds, and EDNS status has not been
+ detected yet, a single probe query is sent. This probe has a sub-second
+ timeout, and if the server responds (quickly) without EDNS, this is cached
+ for 15 min. This works very well when detecting an address that is used
+ often, like a forwarder address, which is where the middleboxes need to
+ be detected. Otherwise, it results in a 5 second wait time before the
+ EDNS timeout is detected, which is slow, but at least it works.
+ It minimizes the chance that a dropped query makes a (DNSSEC) EDNS server
+ appear falsely EDNS-nonsupporting, and thus DNSSEC-bogus; it works well
+ with middleboxes, and can detect the occasional authority that drops EDNS.
+ For some boxes it is necessary to probe for every failing query, because
+ the assurance that the DNS server supports EDNS does not mean that the
+ path can carry large DNS answers.
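+
+ A Python sketch of the per-address EDNS status described above. The
+ EdnsTracker class is hypothetical; the 5 second probe trigger picks one
+ point in the stated 5-10 second range, reusing the 15 minute period for
+ NOTIMPL/FORMERR is an assumption, and the per-query probing of the last
+ sentence is not modelled.
```python
from dataclasses import dataclass
from typing import Dict

NO_EDNS_CACHE_SECONDS = 15 * 60
PROBE_TRIGGER_TIMEOUT = 5.0  # assumed point within the 5-10 s range


@dataclass
class EdnsState:
    status: str = "unknown"  # "unknown", "edns" or "no-edns"
    no_edns_until: float = 0.0


class EdnsTracker:
    """Hypothetical per-server-address EDNS status table."""

    def __init__(self) -> None:
        self.servers: Dict[str, EdnsState] = {}

    def state(self, addr: str) -> EdnsState:
        return self.servers.setdefault(addr, EdnsState())

    def use_edns(self, addr: str, now: float) -> bool:
        st = self.state(addr)
        if st.status == "no-edns" and now < st.no_edns_until:
            return False
        if st.status == "no-edns":
            st.status = "unknown"  # cached fallback expired: try EDNS again
        return True

    def on_reply(self, addr: str, rcode: str, had_edns: bool, now: float) -> None:
        st = self.state(addr)
        if had_edns:
            st.status = "edns"
        elif rcode in ("NOTIMPL", "FORMERR"):
            self._mark_no_edns(st, now)

    def should_probe(self, addr: str, current_timeout: float) -> bool:
        # Timeouts keep growing and nothing is known yet: send one probe.
        return (self.state(addr).status == "unknown"
                and current_timeout >= PROBE_TRIGGER_TIMEOUT)

    def on_probe_reply(self, addr: str, now: float) -> None:
        # A quick answer to the non-EDNS probe: EDNS is probably dropped.
        self._mark_no_edns(self.state(addr), now)

    @staticmethod
    def _mark_no_edns(st: EdnsState, now: float) -> None:
        st.status = "no-edns"
        st.no_edns_until = now + NO_EDNS_CACHE_SECONDS
```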
+
+o 0x20 backoff.
+ The draft describes backing off to the next server and going through all
+ servers several times. Unbound first gets the full list of nameserver
+ addresses, and then makes 3 * (number of addresses) queries.
+ They are sent to random servers, but to no single address more than 4
+ times. It succeeds if one reply has the 0x20 casing intact, or else if
+ all replies are equal. Otherwise, SERVFAIL is returned to the client.
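+
+ The backoff can be sketched in Python. The send callable is a
+ hypothetical transport returning (echoed qname, answer) or None on a
+ timeout; ignoring timeouts when comparing answers is an assumption of
+ this sketch.
```python
import random
from collections import Counter
from typing import Callable, List, Optional, Tuple

Reply = Optional[Tuple[str, str]]  # (echoed qname, answer) or None


def randomize_case(qname: str, rng: random.Random) -> str:
    """0x20 encoding: pick upper or lower case for each letter at random."""
    return "".join(c.upper() if c.isalpha() and rng.random() < 0.5 else c.lower()
                   for c in qname)


def fallback_0x20(addresses: List[str], qname: str,
                  send: Callable[[str, str], Reply],
                  rng: random.Random) -> Optional[str]:
    """Make 3 * len(addresses) queries, at most 4 to any one address."""
    uses: Counter = Counter()
    answers: List[str] = []
    for _ in range(3 * len(addresses)):
        candidates = [a for a in addresses if uses[a] < 4]
        if not candidates:
            break
        addr = rng.choice(candidates)
        uses[addr] += 1
        sent = randomize_case(qname, rng)
        reply = send(addr, sent)
        if reply is None:
            continue  # timeout
        echoed, answer = reply
        if echoed == sent:
            return answer  # 0x20 intact: accept this reply
        answers.append(answer)
    if answers and all(a == answers[0] for a in answers):
        return answers[0]  # all replies agree: accept
    return None  # caller returns SERVFAIL
```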
+
+o NXDOMAIN and SOA serial numbers.
+ Unbound keeps TTL values for message formats, and thus for rcodes such
+ as NXDOMAIN. It also keeps the latest rrsets in the rrset cache.
+ So it will faithfully negative-cache an NXDOMAIN message for the exact TTL
+ originally specified, but send a newer SOA record if this has been found
+ in the meantime. In principle, this could lead to a negatively cached
+ NXDOMAIN reply with a SOA RR whose serial number indicates a zone version
+ in which this domain is no longer NXDOMAIN.
+ These situations become consistent once the original TTL expires.
+ If the domain is DNSSEC signed, however, NSEC records are
+ updated more carefully. If one of the NSEC records in an NXDOMAIN is
+ updated from another query, the NXDOMAIN is dropped from the cache,
+ and queried for again, so that its proof can be checked again.
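+
+ A Python sketch of the two caches interacting as described: the
+ NXDOMAIN message keeps its own TTL, the SOA is taken from the rrset
+ cache at answer time, and an NSEC update drops messages that relied on
+ it. The NegCache class and its string-valued records are hypothetical.
```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (name, type)


@dataclass
class NegCache:
    rrsets: Dict[Key, str] = field(default_factory=dict)
    # (qname, qtype) -> (rcode, expiry, owners of the NSECs in the proof)
    messages: Dict[Key, Tuple[str, float, Set[str]]] = field(default_factory=dict)

    def store_nxdomain(self, qname: str, qtype: str, ttl: float, now: float,
                       soa_zone: str, soa: str, nsec_owners: Set[str]) -> None:
        self.rrsets[(soa_zone, "SOA")] = soa
        self.messages[(qname, qtype)] = ("NXDOMAIN", now + ttl, set(nsec_owners))

    def update_rrset(self, owner: str, rtype: str, rdata: str) -> None:
        self.rrsets[(owner, rtype)] = rdata
        if rtype == "NSEC":
            # An NSEC backing a cached NXDOMAIN changed: drop the message so
            # that it is queried again and its proof is checked again.
            stale = [k for k, (_, _, nsecs) in self.messages.items() if owner in nsecs]
            for k in stale:
                del self.messages[k]

    def answer(self, qname: str, qtype: str, now: float,
               soa_zone: str) -> Optional[Tuple[str, Optional[str]]]:
        msg = self.messages.get((qname, qtype))
        if msg is None or now >= msg[1]:
            return None
        # The rcode keeps its original TTL; the SOA is whatever the rrset
        # cache holds now, possibly with a newer serial.
        return msg[0], self.rrsets.get((soa_zone, "SOA"))
```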
+
+o SOA records in negative cached answers for DS queries.
+ The current unbound code uses a negative cache for queries of type DS.
+ This speeds up building chains of trust, and uses NSEC and NSEC3
+ (optout) information to speed up lookups. When used internally,
+ the bare NSEC(3) information, probably picked up from a referral, is
+ sufficient. When answering clients, a SOA record is needed for the
+ correct message format. A SOA record is picked from the cache if
+ available (it may not actually match the serial number of the SOA for
+ which the NSEC and NSEC3 records were obtained); otherwise network
+ queries are performed to get the data.
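+
+ The decision can be sketched as a Python function; the dictionaries
+ standing in for the DS negative cache and the SOA cache are
+ hypothetical, and a None result means "perform network queries".
```python
from typing import Dict, List, Optional, Tuple


def ds_negative_answer(qname: str, for_client: bool,
                       neg_ds: Dict[str, List[str]],
                       soa_cache: Dict[str, str],
                       parent_zone: str) -> Optional[Tuple[List[str], Optional[str]]]:
    """Return (nsec_records, soa) from the DS negative cache, or None."""
    nsecs = neg_ds.get(qname)
    if nsecs is None:
        return None  # nothing cached: go to the network
    if not for_client:
        return nsecs, None  # internal chain-of-trust use: NSEC(3) suffices
    soa = soa_cache.get(parent_zone)  # any cached SOA; serial may not match
    if soa is None:
        return None  # no SOA for the client reply: go to the network
    return nsecs, soa
```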
+
+o Parent and child with different nameserver information.
+ A misconfiguration that sometimes happens is where the parent and child
+ have different NS and glue information. The child is authoritative, and
+ unbound will not trust information from the parent nameservers as the
+ final answer. To help lookups, unbound will however use the parent-side
+ version of the glue as a last resort lookup. This resolves lookups for
+ those misconfigured domains where the servers reported by the parent
+ are the only ones working, and servers reported by the child do not.
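+
+ A Python sketch of the lookup order: child-side addresses first, and
+ parent-side addresses not already tried as a last resort. The query
+ callable is a hypothetical transport returning an answer or None.
```python
from typing import Callable, List, Optional


def resolve_with_fallback(child_addrs: List[str], parent_addrs: List[str],
                          query: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try the child's nameserver addresses first, parent-side glue last."""
    for addr in child_addrs:
        answer = query(addr)
        if answer is not None:
            return answer
    # Last resort: addresses from the parent's delegation that the child's
    # own NS and glue data did not include.
    for addr in parent_addrs:
        if addr in child_addrs:
            continue
        answer = query(addr)
        if answer is not None:
            return answer
    return None
```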
+
+o Failure of validation and probing.
+ On a validation failure, unbound now retries 5 times, each time to a
+ different nameserver IP (if possible), and then gives up for that name,
+ type, class entry in the message cache. If, additionally, a DNSKEY or DS
+ in the chain of trust fails after the probing, a bad key entry is created
+ in the key cache that makes the entire zone bogus for 900 seconds. This
+ is a fixed value at this time and is conservative in sending probes. It
+ reduces the compound effect of many resolvers and makes it easier to
+ handle, but penalizes individual resolvers by sending fewer probes and
+ taking longer before fixes are picked up.
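+
+ A Python sketch of this policy, reading "retries 5 times" as one initial
+ attempt plus 5 retries. The KeyCache class and the fetch_and_validate
+ callable (returning "secure", "bogus" or "bad-key") are hypothetical.
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_VALIDATION_RETRIES = 5
BAD_KEY_SECONDS = 900


@dataclass
class KeyCache:
    bad_until: Dict[str, float] = field(default_factory=dict)  # zone -> expiry

    def is_bogus(self, zone: str, now: float) -> bool:
        return now < self.bad_until.get(zone, 0.0)

    def mark_bad(self, zone: str, now: float) -> None:
        self.bad_until[zone] = now + BAD_KEY_SECONDS


def validate_with_retries(addrs: List[str], fetch_and_validate: Callable[[str], str],
                          zone: str, keys: KeyCache, now: float) -> bool:
    """Retry on different addresses, then give up; bad keys poison the zone."""
    if not addrs or keys.is_bogus(zone, now):
        return False  # whole zone bogus: no further probing
    result = "bogus"
    for attempt in range(1 + MAX_VALIDATION_RETRIES):
        addr = addrs[attempt % len(addrs)]  # a different IP when possible
        result = fetch_and_validate(addr)
        if result == "secure":
            return True
    if result == "bad-key":
        keys.mark_bad(zone, now)  # DNSKEY/DS failed: zone bogus for 900 s
    return False
```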
+