The Lair

Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup

Archive for the 'web' Category

when you bring it on yourself

October 19th, 2009

Every XHTML document has (or is supposed to have) a DOCTYPE declaration and a namespace reference right at the top of the document.

<html xmlns="http://www.w3.org/1999/xhtml" …>

In the past, I’ve ranted about the necessity for these, given that I have needed to fight software libraries which fail mysteriously when no internet connectivity is present (yes, they check for the existence of the DTD. Doh).

Now the W3C system team blog complains about W3C’s excessive DTD traffic. In short, they basically gave themselves a denial of service.

These refer to HTML DTDs and namespace documents hosted on W3C’s site.

Note that these are not hyperlinks; these URIs are used for identification. This is a machine-readable way to say “this is HTML”. In particular, software does not usually need to fetch these resources, and certainly does not need to fetch the same one over and over! Yet we receive a surprisingly large number of requests for such resources: up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven’t changed in years.

The vast majority of these requests are from systems that are processing various types of markup (HTML, XML, XSLT, SVG) and in the process doing something like validating against a DTD or schema

Umm, ok. So, why do we have it as a proper addressable URL if it is never intended to be fetched?
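To make the "identification, not hyperlink" point concrete, here is a minimal Python sketch. It assumes the standard library's `xml.etree.ElementTree` parser, which does not retrieve external DTDs; the sample document and the `page_title` helper are made up for illustration. The namespace URI is only ever compared as a string, so nothing here needs a network connection.

```python
import xml.etree.ElementTree as ET

# The namespace URI is an identifier: it is compared as a string, never fetched.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document with the usual DOCTYPE and namespace at the top.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>The Lair</title></head>
  <body><p>No DTD was fetched to read this.</p></body>
</html>"""


def page_title(document):
    """Return the <title> text of an XHTML document, or None if it has no title."""
    root = ET.fromstring(document)
    # "This is XHTML" is decided by matching the namespace string alone.
    if root.tag != "{%s}html" % XHTML_NS:
        raise ValueError("not an XHTML document")
    title = root.find("{%s}head/{%s}title" % (XHTML_NS, XHTML_NS))
    return title.text if title is not None else None


print(page_title(SAMPLE))
```

Software that really does need to validate against the DTD could keep a local copy instead of asking W3C for the same file every time.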

wikipedia: corrupted by the system?

June 29th, 2009

When it started out, Wikipedia was a poster child for everything that mainstream media outlets weren’t: decentralized control and editing abilities (“anyone can edit anything!”), filled with obscure yet fascinating pieces of information. Is it the same today? Yes, yes it is, but the limits under which Wikipedia (like most other organizations) must operate have become clearer. Because yes, there are limits. A free-for-all editing structure requires more rules, not fewer.

The first signs: banning Church of Scientology IP addresses from editing Wikipedia. So, not such a big deal after all – Wikipedia routinely bars most open proxy IPs from editing pages, after repeated and sustained editing abuse. This is all part and parcel of Wikipedia practice, made necessary because of the anonymous internet. Perhaps it was the first time that Wikipedia had performed an organizational IP block, though (although I doubt it).

But this most recent case? Much worse, depending on your perspective. The New York Times is reporting that Wikipedia voluntarily suppressed news about a kidnapping in order to prevent the ransom value from going up.

Times executives believed that publicity would raise Mr. Rohde’s value to his captors as a bargaining chip and reduce his chance of survival. Persuading another publication or a broadcaster not to report the kidnapping usually meant just a phone call from one editor to another, said Bill Keller, executive editor of The Times

Mmmm. Collusion to prevent the publication of items with legitimate news interest to the public? Check. Self-censorship? Check.

And then it gets better.

The Wikipedia page history shows that the next day, Nov. 13, someone without a user name edited the entry on Mr. Rohde for the first time to include the kidnapping. Mr. Moss deleted the addition, and the same unidentified user promptly restored it, adding a note protesting the removal

Around that time, Catherine J. Mathis, the chief spokeswoman for the New York Times Company, called Mr. Wales and asked for his help.

And then there is a (not very) interesting story of cat-and-mouse, of editorial freezings of pages and so on.

What does this all mean? The person asked by the article author seems to think that:

[Wikipedia] role in suppressing news about Mr. Rohde would [probably not] prompt an outcry among longtime editors, because in the Rohde case, lives were at stake

That’s probably true. I don’t think there was any other choice for Jimmy Wales and his administrators, even if they had contemplated alternatives. But on the other hand, the media is responsible for the destruction of lives too. What made the life of a random reporter (who must have known about the risks going in) more important than media witch hunts which bring down so many?

Part of the appeal of true crowdsourcing was that interests were diluted. No single person is accountable, no single point of view pushed (but then again, we all know that with Wikipedia editorial politics, that was never true at any point anyway). What if the next people whose lives are at stake don’t have Jimmy Wales in their Rolodex to call and ask for a special favour? Democracy didn’t die with the publication of this story. The illusion that any crowdsourced site is more powerful than the administrative cabal probably did though. And not a moment too soon.

eye pee six

February 8th, 2008

So, ICANN added IPv6 addresses for several of the root servers a few days ago. This is a big deal, because although IPv6 has been “officially” deployed since 1999, adoption has been extremely slow. The address pool of the most commonly used existing scheme (IPv4) is estimated to run out in 2-3 years. Then again, analysts have been screaming about peak oil for a while too, so perhaps it’s all hyperbole.

Part of the reason for the slow adoption rate is that everyone (myself included) has the “if it ain’t broke yet, why bother fixing it” mentality. That’s pretty much why GoPHP5 was started, and it’s also why a large number of websites still use Apache 1.3 over Apache 2.0 or 2.2. Anyway, I decided that I would figure this mysterious IPv6 thing out and did a bit of digging.

The problem is that, without even realising it, I tend to be quite reliant on the existing addressing scheme. How so? Let me count the ways.

One application area that has seen a fair few advances (for IPv4 addresses) is geolocation technology. For example, it’s currently easy to identify the geographic location of an IP address (see zonefiles, which I use as a quick and easy geolocation service). More comprehensive services are also available, e.g. ip2location, hostip.info and the daddy, Maxmind’s GeoIP. With the vastly larger address space of IPv6, the geolocation mapping will need to be reconstructed; and even then the sheer number of addresses may allow a descent into anarchy. The IANA (which governs IP address allocation) will probably be far less inclined to rigorously police the assignment of IPv6 addresses.
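For a rough sense of just how much larger the IPv6 space is, here is a minimal sketch that assumes nothing beyond Python's standard `ipaddress` module. The `2001:db8::/64` prefix is the reserved documentation range, used purely for illustration, and `ipv4_internets_in` is a made-up helper name.

```python
import ipaddress

# Total number of addresses in each scheme.
IPV4_TOTAL = ipaddress.ip_network("0.0.0.0/0").num_addresses   # 2**32
IPV6_TOTAL = ipaddress.ip_network("::/0").num_addresses        # 2**128


def ipv4_internets_in(network):
    """How many complete IPv4 address spaces would fit inside `network`."""
    return ipaddress.ip_network(network).num_addresses // IPV4_TOTAL


# A single /64 (a typical allocation for one LAN) already dwarfs all of IPv4.
print(IPV4_TOTAL)
print(IPV6_TOTAL)
print(ipv4_internets_in("2001:db8::/64"))
```

Any database that maps individual addresses (or even small ranges) to locations has a lot more ground to cover at this scale.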

Another spiffy application area that will take a hit (at least in the short term) is the DNSBL system (DNS blacklists), which publishes sets of unsavoury IP addresses. I rely on quite a few of those services (project honeypot, botsvsbrowsers and sorbs) to help head off spammers at the pass. With more addresses than you can shake a stick at, denying by IPs is going to get a little bit harder too.
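A DNSBL is queried by turning the address into a hostname under the blacklist's zone and looking it up in DNS. Here is a minimal sketch of how that hostname is built, following the reversed-octet convention for IPv4 and the reversed-nibble convention described for IPv6 in RFC 5782. The zone `dnsbl.example.org` and the function name are hypothetical, and the sketch only builds the name; it performs no lookup.

```python
import ipaddress


def dnsbl_query_name(address, zone):
    """Build the hostname to look up for `address` in the DNSBL `zone`."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # IPv4: reverse the four octets, e.g. 192.0.2.99 -> 99.2.0.192
        labels = reversed(str(ip).split("."))
    else:
        # IPv6: expand to all 32 hex digits, then reverse them one per label
        labels = reversed(ip.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone


# Hypothetical zone; real blacklists publish their own zone names.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
print(dnsbl_query_name("2001:db8::1", "dnsbl.example.org"))
```

The IPv4 name has four labels in front of the zone; the IPv6 one has thirty-two, and a spammer with a single /64 can hop between more addresses than any list could enumerate one by one.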

So, I want IPv6 to turn up soon, yet I don’t. The problem is, there isn’t an easy way yet to test out how IPv6 will work. Most of the internet doesn’t actually seem to know how to route things to and from the spiffy new IPv6 addresses. That’s where places like Sixxs and Hexago come in. They allow IPv6 addresses to be tunnelled via the existing IPv4 infrastructure.

So, I’m off to get myself an IPv6 tunnel set up.