when you bring it on yourself
October 19th, 2009
Every XML document has (or is supposed to have) a DTD reference, along with a namespace declaration like the one below, right at the top of the document.
<html xmlns="http://www.w3.org/1999/xhtml" …>
In the past, I’ve ranted about the necessity for these, since I have had to fight software libraries that fail mysteriously when no internet connectivity is present (yes, they check for the existence of the DTD. Doh).
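Not every parser behaves that way, for what it’s worth. Here is a minimal sketch, assuming Python and its standard-library ElementTree (my pick for illustration, not one of the libraries I was fighting). The sample document and the `parse_offline` name are made up. It parses an XHTML document that names the W3C DTD without going to the network, because this parser reads the DOCTYPE line but does not load the external DTD:

```python
import xml.etree.ElementTree as ET

# The namespace URI is only compared as a string; it is never fetched.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample document with a real XHTML 1.0 Strict DOCTYPE.
DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Offline test</title></head>
  <body><p>Hello</p></body>
</html>"""


def parse_offline(text):
    """Parse XHTML text; the external DTD named in the DOCTYPE is not loaded."""
    return ET.fromstring(text)


root = parse_offline(DOC)
title = root.find("x:head/x:title", {"x": XHTML_NS}).text
print(root.tag)
print(title)
```

The catch is that, with the DTD unread, named entities such as `&nbsp;` are undefined and the parse fails on them, so this only helps for documents that stick to numeric character references.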
Now the W3C system team blog complains about W3C’s excessive DTD traffic. In short, they basically gave themselves a denial of service. Quoting the post:
These refer to HTML DTDs and namespace documents hosted on W3C’s site.
Note that these are not hyperlinks; these URIs are used for identification. This is a machine-readable way to say “this is HTML”. In particular, software does not usually need to fetch these resources, and certainly does not need to fetch the same one over and over! Yet we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven’t changed in years.
The vast majority of these requests are from systems that are processing various types of markup (HTML, XML, XSLT, SVG) and in the process doing something like validating against a DTD or schema.
Umm, ok. So, why do we have it as a proper addressable URL if it is never intended to be fetched?
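If a tool really does need the DTD (say, to validate), the least it can do is not ask for the same file over and over. Below is a rough sketch of that idea in Python. The `DTDCache` class and the `fake_fetch` stand-in are hypothetical names of mine, not anything from the W3C post or a real library. A real tool would plug in an actual HTTP fetch and probably keep the copy on disk.

```python
from typing import Callable, Dict


class DTDCache:
    """Hypothetical in-memory cache: fetch each DTD URL at most once per process."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only call the fetcher the first time a URL is seen.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


# Stand-in for a network fetch, so the sketch runs offline and counts calls.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"<!-- pretend DTD body -->"


URL = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
cache = DTDCache(fake_fetch)

# Simulate processing 1000 documents that all name the same DTD.
for _ in range(1000):
    cache.get(URL)

print(len(calls))  # 1
```

A thousand documents, one fetch. Multiply the other way and you get W3C’s 130 million requests per day.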