Ted,
L^1/L^2 normalization sounds like a good solution. I will try it out
and report the results.
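To make sure I try the same thing you describe, here is a minimal
sketch of both normalizations (assuming plain double[] term weights;
the real code would work on sparse vectors and touch only the
non-zero entries):

class NormalizationSketch {

  // L^1: divide each element by the sum of the elements. The result
  // sums to 1, so per your note it can be read as estimates of word
  // generation probabilities.
  static double[] l1Normalize(double[] v) {
    double sum = 0.0;
    for (double x : v) {
      sum += x;
    }
    double[] result = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      result[i] = (sum == 0.0) ? 0.0 : v[i] / sum;
    }
    return result;
  }

  // L^2: divide each element by the square root of the sum of the
  // squares of the elements.
  static double[] l2Normalize(double[] v) {
    double sumSq = 0.0;
    for (double x : v) {
      sumSq += x * x;
    }
    double norm = Math.sqrt(sumSq);
    double[] result = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      result[i] = (norm == 0.0) ? 0.0 : v[i] / norm;
    }
    return result;
  }
}
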
Is there any literature comparing these normalization techniques?
Thank you.
shashi
On Thu, May 28, 2009 at 12:30 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Shashi,
>
> You are correct that this can be a problem, especially with vectors that
> have a large number of elements that are zero, but not known to be such.
>
> The definition as it stands is roughly an L^0 normalization. It is more
> common in clustering to use an L^1 or L^2 normalization. This would divide
> the terms by, respectively, the sum of the elements or the square root of
> the sum of the squares of the elements. Both L^1 and L^2 normalization
> avoid the problem you mention since negligibly small elements will not
> contribute significantly to the norm.
>
> Traditionally, L^2 norms are used with documents. This dates back to Salton
> and the term vector model of text retrieval. That practice was, however,
> based on somewhat inappropriate geometric intuitions. Other norms are quite
> plausibly more appropriate. For instance, if normalized term frequencies
> are considered to be estimates of word generation probabilities, then the
> L^1 norm is much more appropriate.
>
> On Wed, May 27, 2009 at 11:52 PM, Shashikant Kore <shashikant@gmail.com> wrote:
>
>> ...
>> My concern in the following code is that the total is divided by
>> numPoints. For a term, only a few of the numPoints vectors have
>> contributed towards the weight; the rest had the value set to zero. That
>> drags down the average, and the effect is much more pronounced in a large
>> set of sparse vectors.
>>
>>
>
