The Lair

Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup

when you bring it on yourself

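Every XHTML document has (or is supposed to have) a DTD reference and a namespace declaration right at the top of the document, something like:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" …>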

In the past, I’ve ranted about the necessity for these, given that I have had to fight software libraries that fail mysteriously when no internet connection is present (yes, they go and fetch the DTD. Doh.).
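The usual cure for that kind of library is to keep the parser off the network, either by not loading the DTD at all or by handing it a local copy. Here's a minimal sketch of the local-copy approach using Python's standard xml.sax, assuming you keep your own copies of the DTDs you need. LOCAL_DTDS, LocalOnlyResolver, TextCollector and parse_with_local_dtds are made-up names for illustration, and the stub "DTD" only declares &nbsp; (the real XHTML DTD declares far more).

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

XHTML_STRICT_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"

# Hypothetical local "catalog": system ID -> bytes of a locally stored DTD.
# This stub only declares &nbsp;; a real setup would store the full DTD files.
LOCAL_DTDS = {
    XHTML_STRICT_SYSTEM_ID: b'<!ENTITY nbsp "&#160;">',
}


class LocalOnlyResolver(EntityResolver):
    """Answer external entity requests from a local table, never the network."""

    def __init__(self, table):
        super().__init__()
        self.table = table
        self.served = []  # system IDs answered from the local table

    def resolveEntity(self, publicId, systemId):
        if systemId not in self.table:
            # Fail loudly instead of letting the parser fetch the URL.
            raise LookupError("no local copy of %r" % systemId)
        self.served.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.table[systemId]))
        return source


class TextCollector(ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def parse_with_local_dtds(data, table=LOCAL_DTDS):
    parser = xml.sax.make_parser()
    # External entities (the DTD included) are only loaded with this feature on;
    # the resolver then decides where their bytes come from.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalOnlyResolver(table)
    handler = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.parts), resolver.served


DOC = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
       b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
       b'<html xmlns="http://www.w3.org/1999/xhtml"><body><p>a&nbsp;b</p></body></html>')

text, served = parse_with_local_dtds(DOC)
print(repr(text), served)
```

Leaving feature_external_ges off (the default in recent Python 3 releases) skips the external DTD entirely, which works until the document uses an entity such as &nbsp; that only the DTD declares. That is presumably one reason libraries go fetching in the first place.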

Now the W3C system team blog complains about W3C’s excessive DTD traffic. In short, they gave themselves a denial of service.

    These refer to HTML DTDs and namespace documents hosted on W3C’s site.

    Note that these are not hyperlinks; these URIs are used for identification. This is a machine-readable way to say “this is HTML”. In particular, software does not usually need to fetch these resources, and certainly does not need to fetch the same one over and over! Yet we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven’t changed in years.

    The vast majority of these requests are from systems that are processing various types of markup (HTML, XML, XSLT, SVG) and in the process doing something like validating against a DTD or schema.
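The "used for identification" point is easy to see in code. A minimal sketch with Python's standard xml.etree.ElementTree, which does not read the external DTD at all: the XHTML namespace URI just becomes a string prefix on each tag name, so recognizing "this is HTML" is a string comparison. DOC and is_xhtml are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small XHTML document with the usual DOCTYPE and namespace declaration.
DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>"""


def is_xhtml(root):
    """True if the root element is XHTML's <html>, judged by namespace URI alone."""
    # ElementTree stores namespaced tags as "{uri}localname", so the URI is
    # only ever compared as a string; nothing is downloaded.
    return root.tag == "{%s}html" % XHTML_NS


root = ET.fromstring(DOC)  # the external DTD named in the DOCTYPE is not read
title = root.find("x:head/x:title", {"x": XHTML_NS})
print(is_xhtml(root), title.text)
```

Note that this only works because the sample contains no DTD-defined entities; ElementTree rejects something like &nbsp; rather than fetching the DTD to look it up.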

Umm, ok. So why do we have it as a proper addressable URL if it was never intended to be fetched?

“when you bring it on yourself” has 2 comments

  1. H wrote:

    Ahahaha. Serves them right. Over-precision has a way of returning to bite you in the posterior.

  2. Chintana wrote:

    Having it as an addressable URL is a bit unfortunate. People came up with RDDL to solve this problem, but it didn’t catch on with the masses. Not sure why a library tries to fetch it in the first place, when it’s clearly stated that you shouldn’t.
