But I guess if there's evidence that 200 singular values are better
than 50, doesn't that translate, in the stochastic world, to 200+300
perhaps being enough better than 50+90 that it's worth the effort?
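(For context, a hypothetical NumPy sketch of the randomized SVD scheme
those k+p totals refer to — k kept singular vectors plus p extra random
"oversampling" directions. This is just an illustration of the idea, not
the Mahout stochastic SVD implementation; the shapes and seed are made up.)

```python
import numpy as np

def randomized_svd(A, k, p, seed=0):
    """Sketch of a rank-k randomized SVD with oversampling p.

    k -- number of singular vectors to keep
    p -- oversampling: extra random directions that improve accuracy
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Project onto k+p random directions, then orthonormalize the range.
    Y = A @ rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(Y)
    # SVD of the small projected matrix, lifted back to the original space.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Usage: the "50+90" case, i.e. keep k=50 vectors with p=90 oversampling.
A = np.random.default_rng(1).standard_normal((500, 300))
U, s, Vt = randomized_svd(A, k=50, p=90)
```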
On Wed, Apr 6, 2011 at 12:15 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>
> On Wed, Apr 6, 2011 at 11:47 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>> But with LSI (which is what I use it for), I think I read that they
>> recommend getting at least about 200 'good' values? Just to fit all
>> possible 'soft clusters', which would be approximate but with a lot
>> of them sticking out in different directions?
>
> That is exactly my point. They analyzed the performance with 50, 100,
> and 200 singular vectors, but not with, say, 20 singular + 180 random
> vectors.
> The random vectors stick out in different directions. The issue is
> whether the data can really tell you what the good directions are. I
> think not.
>>
>> Disclaimer: I haven't yet analyzed the decay of the singular values
>> of our data; it would certainly show how soon reasonable becomes
>> reasonable. I think I saw a presentation by the authors of that paper
>> where they show a formula to estimate when \sigma_n / \sigma_{n+1} is
>> small enough to be comparable to noise. It was one of the ideas I had
>> for advising, post-run, on the actually useful number of singular
>> values produced.
>
> The decay of singular values actually tells you very little. If you
> were to analyze purely random text with the same word frequencies, you
> would see a similar decay of singular values. All the singular values
> tell you is how many singular values/vectors are required to replicate
> the *training* data to a particular level of fidelity. They say
> nothing about how well you will be able to replicate unseen data, and
> that is the only important question.
>
>
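(The decay heuristic Dmitriy mentions could be sketched roughly as
follows: flag the first index where \sigma_n / \sigma_{n+1} drops close
to 1, i.e. where successive singular values become indistinguishable
from noise. The threshold `tol` and the spectrum below are invented for
illustration; this is not the formula from the paper he refers to.)

```python
import numpy as np

def useful_rank(sigmas, tol=1.05):
    """Return the number of leading singular values whose ratio to the
    next value still exceeds tol; past that point the spectrum is flat.
    tol is an illustrative threshold, not taken from the thread."""
    sigmas = np.asarray(sigmas, dtype=float)
    ratios = sigmas[:-1] / sigmas[1:]
    flat = np.nonzero(ratios < tol)[0]
    return int(flat[0]) + 1 if flat.size else len(sigmas)

# Usage: a spectrum that decays quickly and then flattens into noise.
s = np.array([100.0, 40.0, 15.0, 5.0, 4.9, 4.85, 4.8])
print(useful_rank(s))  # -> 4
```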
