The Lair

Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup

Archive for December, 2007

obscure archiving

December 29th, 2007

A good while ago (maybe six months or longer), I decided to write a small logminer bot. Nothing very special, and most likely a solved problem, you’d contend. I’d agree. Parsing webserver logs is like writing a text editor in 2007: most often a useless reinvention of something that has already been done to death. There are log mining programs aplenty. The world really doesn’t need any more. I could do something more productive, like sleep. Watch television. Read more comics. Or (at the time), write a thesis.

Yes, you’d say that. I know I certainly thought it. Yet, as these things go, it turned out that I did actually need this bot. Some logminers (hell, most of them) give statistical overviews of website visits. Problem status? Solved. Other bots (like fail2ban or denyhosts) are used for security purposes. My bot? Somewhere in the middle. What I wanted was an overview of visits and (this is the interesting part) some trivial cross-checking by the bot of whether each access was legit. And by legit, I mean not a spambot and not from an open proxy.
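For the curious, the general shape of such a bot can be sketched in a few lines of Python. This is only a rough illustration, not the bot itself: it assumes the logs are in the common/combined access log format, and `known_bad_ips` is a hypothetical, caller-supplied set standing in for whatever spambot and open-proxy checks one actually runs.

```python
import re
from collections import Counter

# Common/Combined Log Format:
#   ip ident user [time] "request" status size "referer" "agent"
# The referer and agent fields are optional (absent in the plain common format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def summarize(lines, known_bad_ips=frozenset()):
    """Count hits per IP and split the counts into suspicious and ordinary visitors.

    known_bad_ips is a hypothetical set (for instance, built from an
    open-proxy list); it is an assumption made for this sketch.
    """
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:  # silently skip lines that do not parse
            hits[entry["ip"]] += 1
    suspicious = {ip: n for ip, n in hits.items() if ip in known_bad_ips}
    ordinary = {ip: n for ip, n in hits.items() if ip not in known_bad_ips}
    return suspicious, ordinary
```

Calling `summarize(open("access.log"), known_bad_ips=...)` would give two dictionaries of hit counts keyed by IP. Everything interesting lives in how the set of bad IPs gets built, which this sketch deliberately leaves out.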

By and large, the bot works reasonably well at sifting the most suspicious-looking IPs out of the scores of legitimate accesses. But a consequence of having the bot running is that I get a fairly detailed summary of the most interesting accesses on a given day, something I wouldn’t have bothered to check otherwise. A few days ago, this included some visits via archive.org, the Internet Archive.

Which leads seamlessly onto the earliest sightings for El Goog here. And although I didn’t really stop to look into it till now, the Internet Archive is a place of fascinating discoveries. For example, Last Man on Earth (an old skool version of the fairly terrible I Am Legend). Lots more feature films (including Beat The Devil, which I haven’t seen for a while). Probably best to start at the movies and films index.

It’s like Wikipedia in fascinating discoveries and potential to waste time – but without the flamewars on every Talk: page. Interesting stuff.

plus ça change

December 28th, 2007

Yes, that’s my pretence that I actually understand the French language – either written or spoken. I don’t.

So, I’ve been pottering around getting myself organized for le grande server move. Le grande server, where I have tons of flexibility, but at the price of having to manage most of the services myself. Yes, that one. I hope to depart from my coddled existence at the present host, where backups and server monitoring and all those things are taken care of by professionals, and strike out on my own.

What I’ve discovered is that having done more or less the same thing for any number of corporate entities actually means nothing when it’s your own personal little playpen. For one thing, since it’s my own hobbyist setup, I feel inclined, nay compelled, to push the envelope with new and strange-looking configs. All of this leads to a predictably steep learning curve, since I can’t rely on muscle memory and a few years of experience to guide me through a familiar configuration file or three. When some problem crops up (as it inevitably does), I have to peruse random websites, rely on automated translations from languages with non-Latin alphabets (yes, really) and make all manner of frantic sacrifices to the Patron Saint of Setting Up Servers. It’s a strange feeling to be this ignorant and n00bish about the mundane business of setting up a webserver. I’m quite enjoying the experience.

If the webserver on this hobby setup goes down, I’ll probably get a few concerned emails (assuming anyone even notices). If a server at $work had gone down in similar circumstances, it would probably have meant a fair amount of lost revenue, or whatever the accounting types use to indicate a very bad thing. Yet strangely, the effort put into making the server as solid as possible is pretty much the same.

Read the rest of this entry »

nothing seems to be happening

December 27th, 2007

No. I kid. There’s actually too much happening and I can’t keep track. First and foremost, there is Benazir Bhutto. Yeah, it was probably suicidal to come back. She did anyway.

Then there is the slightly more entertaining (in an utter-tosser-getting-his-comeuppance sort of way) saga of a guy whose name rhymes with Erwin. Because there is a sense of justice in someone trying to mete out an extra-judicial vigilante ass kicking and having the ass kicking rebound on them, right?

And to complete my descent from the ultra serious to the utterly innocuous – seven common medical myths debunked. Including that perennial favourite (which I intend to forward to my parents) – no, reading in dim light doesn’t actually harm eyesight.

parsing URLs

December 19th, 2007

There are lots of things that you’d want computers to simplify for you, but the obvious methods for simplification don’t actually work. For example, finding out if a user-entered email address is actually valid. The description and code involved (see here and the monstrous chunk of code here) belie the apparent simplicity of the task.

So it was for my own little task. What I wanted to do was write a general purpose method (in PHP, actually – but the language itself is unimportant) to infer the blog address, given a permalink. So, given any post URL – I wanted to find the address of the blog itself. You’d think that this was a relatively straightforward task. But, if the preamble didn’t alert you already, it wasn’t quite as simple as I first envisaged.
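To make the problem concrete, here is the sort of naive first attempt one might write, sketched in Python rather than PHP. It is emphatically not the method I ended up with. The function name `guess_blog_root` and the list of "marker" path segments are assumptions made up for this illustration, and the heuristic is easy to fool.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Path segments that often mark the start of the post-specific part of a URL.
# This list is an illustrative assumption, not an exhaustive rule set.
POST_MARKERS = re.compile(r"^(\d{4}|archives?|posts?|entry|index\.php)$", re.IGNORECASE)


def guess_blog_root(permalink):
    """Guess the blog's base address from a post URL (naive heuristic)."""
    parts = urlsplit(permalink)
    segments = [s for s in parts.path.split("/") if s]
    kept = []
    for segment in segments:
        if POST_MARKERS.match(segment):
            break  # everything from here on is assumed to belong to the post
        kept.append(segment)
    else:
        # No marker found: assume the last segment (if any) is the post slug.
        kept = kept[:-1]
    path = "/" + "/".join(kept) + ("/" if kept else "")
    # Drop the query string and fragment; they usually identify the post.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Given `http://example.com/blog/archives/123.html`, this guesses `http://example.com/blog/`. It falls over as soon as a blog lives at a path that happens to look like a marker, or uses a permalink scheme the list doesn't anticipate, which is rather the point of this post.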

Read the rest of this entry »

incomplete

December 13th, 2007

So, I was thinking recently. Uncommon, I know. Savour the moment while it lasts. I realized, in a blinding flash of the obvious, that “pass”, or more specifically “pass on”, is an ambiguous construct. In fact, there are at least two completely different ideas that can be expressed with that phrase. There is a third construct, as tez points out: pass on is the formal phrase to use for kicking the bucket. But to concentrate on the WordNet senses –

Pass on

  1. To give, impart: “They pass on the parcel to their parents”.
  2. To relegate, defer or decline: “He passed on the unpalatable choices on offer”.

Why does this linguistic oddity suddenly interest me? Well, it is remarkably like the famous Dinosaur comic on homographic homophonic autantonyms. Actually, even more interesting is the list of words defined as contronyms (same word, opposite meanings). via LL

Why this sudden interest in the unparseable? Because if you are trying to cajole a computer into understanding these constructs as they appear in written text, you need to figure out the context in which the words are being used. Specifically, you’d need to disambiguate the word – and, as those examples indicate, this is more difficult than you’d expect.
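As a toy illustration of what "figuring out the context" can mean, here is a minimal Python sketch in the spirit of the simplified Lesk approach: pick the sense whose description shares the most words with the sentence. The two-entry `SENSES` table is hand-written for this example (it is not WordNet data), and the extra gloss words, the stopword list and the function names are all assumptions.

```python
import re

# Hand-written glosses for the two senses of "pass on" discussed above.
# The extra words in each gloss are assumptions added to give overlap a chance.
SENSES = {
    "give": "give impart hand over parcel message information someone parents",
    "decline": "relegate defer decline refuse skip offer choice opportunity",
}

# Words too common to say anything about the sense.
STOPWORDS = {"a", "an", "the", "to", "on", "of", "or", "and", "in"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword alphabetic tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(sentence):
    """Return the sense label whose gloss shares the most words with the sentence.

    Ties (including zero overlap everywhere) fall back to the first sense
    listed in SENSES, which is a weakness of this sketch.
    """
    context = tokenize(sentence)
    scores = {sense: len(context & tokenize(gloss)) for sense, gloss in SENSES.items()}
    return max(scores, key=scores.get)
```

With the two example sentences above, the first overlaps the "give" gloss on "parcel" and "parents", and the second overlaps the "decline" gloss on "offer". There is no stemming, so "choices" never matches "choice", and any sentence that shares no words with either gloss gets an arbitrary answer. That is a fair preview of why the real thing is hard.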

Read the rest of this entry »