<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: will no one rid me of that troublesome &#8230;</title>
	<atom:link href="http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/feed/" rel="self" type="application/rss+xml" />
	<link>http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/</link>
	<description>Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup</description>
	<lastBuildDate>Tue, 10 Jan 2012 01:55:08 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: spizkapa</title>
		<link>http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/comment-page-1/#comment-278</link>
		<dc:creator>spizkapa</dc:creator>
		<pubDate>Mon, 30 Jan 2006 15:00:19 +0000</pubDate>
		<guid isPermaLink="false">http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/#comment-278</guid>
		<description>How about telling the statistician to go screw himself? Usually does it for me... Otherwise, log log is the way as far as I can see.</description>
		<content:encoded><![CDATA[<p>How about telling the statistician to go screw himself? Usually does it for me&#8230; Otherwise, log log is the way as far as I can see.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: drac</title>
		<link>http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/comment-page-1/#comment-277</link>
		<dc:creator>drac</dc:creator>
		<pubDate>Mon, 30 Jan 2006 14:47:44 +0000</pubDate>
		<guid isPermaLink="false">http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/#comment-277</guid>
		<description>Heh. Damn, you&#039;re fast :) I just updated the entry with the thought of dismissing the larger numbers as being outliers, as you suggest.

I probably shouldn&#039;t though. This is word similarity I&#039;m trying to compare - so the large values do mean something, in a sense... as any given word will have a few synonyms (large numbers) and lots of completely unrelated words (small numbers). The large numbers mean something, even if they don&#039;t occur very often.

I actually went this far:
&lt;a href=&quot;http://www.google.com/search?hl=en&amp;lr=&amp;q=sqrt+sqrt+sqrt+sqrt+0.0453309156844968&amp;btnG=Search&quot; rel=&quot;nofollow&quot;&gt;minimum number&lt;/a&gt;
and 
&lt;a href=&quot;http://www.google.com/search?hl=en&amp;lr=&amp;q=sqrt+sqrt+sqrt+sqrt+29590099.0108385&amp;btnG=Search&quot; rel=&quot;nofollow&quot;&gt;maximum number&lt;/a&gt;.

Of course, I was just applying sqrt until I got to a workable range (four times, which amounts to taking a 16th root) - but that doesn&#039;t really conform to any nice theory. I think log log is probably less surprising to anyone reading the material though, so I&#039;ll use that instead. Thanks!

Incidentally, I used &lt;i&gt;exactly&lt;/i&gt; your suggestion of calling the larger numbers 1s - but a reviewer (some statistician?) wasn&#039;t happy with that approximation. That&#039;s why I was searching for a more &quot;standard&quot; method of normalization for the revision.</description>
		<content:encoded><![CDATA[<p>Heh. Damn, you&#8217;re fast :) I just updated the entry with the thought of dismissing the larger numbers as being outliers, as you suggest.</p>
<p>I probably shouldn&#8217;t though. This is word similarity I&#8217;m trying to compare &#8211; so the large values do mean something, in a sense&#8230; as any given word will have a few synonyms (large numbers) and lots of completely unrelated words (small numbers). The large numbers mean something, even if they don&#8217;t occur very often.</p>
<p>I actually went this far:<br />
<a href="http://www.google.com/search?hl=en&#038;lr=&#038;q=sqrt+sqrt+sqrt+sqrt+0.0453309156844968&#038;btnG=Search" rel="nofollow">minimum number</a><br />
and<br />
<a href="http://www.google.com/search?hl=en&#038;lr=&#038;q=sqrt+sqrt+sqrt+sqrt+29590099.0108385&#038;btnG=Search" rel="nofollow">maximum number</a>.</p>
<p>Of course, I was just applying sqrt until I got to a workable range (four times, which amounts to taking a 16th root) &#8211; but that doesn&#8217;t really conform to any nice theory. I think log log is probably less surprising to anyone reading the material though, so I&#8217;ll use that instead. Thanks!</p>
<p>Incidentally, I used <i>exactly</i> your suggestion of calling the larger numbers 1s &#8211; but a reviewer (some statistician?) wasn&#8217;t happy with that approximation. That&#8217;s why I was searching for a more &#8220;standard&#8221; method of normalization for the revision.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: spizkapa</title>
		<link>http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/comment-page-1/#comment-276</link>
		<dc:creator>spizkapa</dc:creator>
		<pubDate>Mon, 30 Jan 2006 14:27:39 +0000</pubDate>
		<guid isPermaLink="false">http://lair.fierydragon.org/2006/01/will-no-one-rid-me-of-that-troublesome/#comment-276</guid>
		<description>You need to ask yourself: are all the numbers necessary?

What I mean is that, if the huge numbers are just outliers, you can simply filter the list to keep only the values you want. This may sound like cooking the data, but it isn&#039;t. You&#039;re trying to show what&#039;s going on, not draw conclusions. You obviously need to say in the surrounding text that you&#039;ve done this.

If they are necessary (could indeed be the whole result - look, I get huge numbers when these guys get small numbers) then you can still normalise the rest in a standard way and simply call these huge numbers 1s. I know, it&#039;s not clean.

Other solutions include using a log log plot. It does distort the data, but I don&#039;t know of a much better way. HTH.</description>
		<content:encoded><![CDATA[<p>You need to ask yourself: are all the numbers necessary?</p>
<p>What I mean is that, if the huge numbers are just outliers, you can simply filter the list to keep only the values you want. This may sound like cooking the data, but it isn&#8217;t. You&#8217;re trying to show what&#8217;s going on, not draw conclusions. You obviously need to say in the surrounding text that you&#8217;ve done this.</p>
<p>If they are necessary (could indeed be the whole result &#8211; look, I get huge numbers when these guys get small numbers) then you can still normalise the rest in a standard way and simply call these huge numbers 1s. I know, it&#8217;s not clean.</p>
<p>Other solutions include using a log log plot. It does distort the data, but I don&#8217;t know of a much better way. HTH.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

