The Lair

Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup

obscure archiving

A good while ago (maybe six months or longer), I decided to write a small logminer bot. Nothing very special, and most likely a solved problem, you’d contend. I’d agree. Parsing webserver logs is like writing a text editor in 2007: most often a useless reinvention of something that has already been done to death. There are log mining programs aplenty. The world really doesn’t need any more. I could do something more productive, like sleep. Watch television. Read more comics. Or (at the time) write a thesis.

Yes, you’d say that. I certainly thought it. Yet, as these things go, it turned out that I did actually need this bot. Some logminers (hell, most of them) give statistical overviews of website visits. Problem status? Solved. Other bots (like fail2ban or denyhosts) are used for security purposes. My bot? Somewhere in the middle. What I wanted was an overview of visits and (this is the interesting part) to have the bot do some trivial cross-checking of whether each access was legit. And by legit, I mean not a spambot and not from an open proxy.
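To make the idea concrete, here is a minimal sketch in Python. It is not the actual bot, just an illustration under a few assumptions: the logs are in the common "combined" format, the list of open proxies arrives as a plain set of IPs (`known_proxies`, a made-up parameter; a real check would need some external lookup), and the user-agent hints in `SUSPICIOUS_AGENT_HINTS` are arbitrary examples rather than a vetted list.

```python
import re
from collections import Counter

# Assumes the "combined" access log format used by Apache and nginx.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical hints only; tune these to whatever shows up in your own logs.
SUSPICIOUS_AGENT_HINTS = ("libwww-perl", "python-requests", "curl")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't parse."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def summarize(lines, known_proxies=frozenset()):
    """Count visits per IP and note a reason for each IP that looks suspicious."""
    visits = Counter()
    suspicious = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        ip = entry["ip"]
        visits[ip] += 1
        agent = entry["agent"].lower()
        if ip in known_proxies:
            # A proxy listing overrides any earlier, weaker reason.
            suspicious[ip] = "listed as open proxy"
        elif agent in ("", "-") or any(h in agent for h in SUSPICIOUS_AGENT_HINTS):
            suspicious.setdefault(ip, "suspicious user agent")
    return visits, suspicious
```

Feeding it a day's worth of log lines, e.g. `summarize(open("access.log"), known_proxies=my_proxy_set)`, gives back the per-IP visit counts plus a short dictionary of the IPs worth a second look. The file name and `my_proxy_set` are placeholders.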

By and large, the bot works reasonably well at sifting the most suspicious-looking IPs out of the scores of legitimate accesses. But a consequence of having the bot running is that I get a fairly detailed summary of the most interesting accesses on a given day - something I wouldn’t have bothered to check otherwise. A few days ago, this included some visits via archive.org - the Internet Archive.

Which leads seamlessly onto the earliest sightings for El Goog here. And although I hadn’t really stopped to explore it until now, the Internet Archive is a place of fascinating discoveries. For example, Last Man on Earth (an old skool version of the fairly terrible I Am Legend). There are lots more feature films too (including Beat The Devil, which I haven’t seen for a while). It’s probably best to start at the movies and films index.

It’s like Wikipedia in its fascinating discoveries and its potential to waste time - but without the flamewars on every Talk: page. Interesting stuff.
