Hi all
I would still like to confirm that this is not a problem.
I am especially worried about the n value and hope it is not problematic...
I discussed this in my lab, and one of our members noted that the dimension
of the feature vectors and the number of vectors I used were very different.
I used 600,000 vectors, each with 100 dimensions.
Do you think it could cause problems to combine a small dimension with a
large number of vectors? Do we need to make sure that there is some relation
between the two (especially in size)?
Or do you think 100 is too small for the dimension?
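
For anyone who wants to reproduce the check, the comparison described in the
quoted message below (the n value in each VL header against the number of
vectors listed under it) could be sketched in Python roughly as follows. This
is only a sketch under assumptions: the function name find_mismatches and the
two regular expressions are hypothetical, derived from the output excerpts
quoted below rather than from Mahout itself, and the real clusterdump format
may differ.

```python
import re

# Assumed header shape, taken from the quoted excerpts: "VL0{n=64,c=[...], r[...]}"
HEADER = re.compile(r"^\s*(VL-?\d+)\{n=(\d+)")
# Assumed point shape, taken from the quoted excerpts: "1.0: 396= [...]"
POINT = re.compile(r"^\s*\d+(?:\.\d+)?\s*:\s*(\S+?)\s*=")


def find_mismatches(lines):
    """Return (cluster_id, n, listed) for each cluster whose header n
    differs from the number of point lines listed under it."""
    mismatches = []
    cluster_id, n, listed = None, 0, 0
    for line in lines:
        header = HEADER.match(line)
        if header:
            # A new cluster starts: close the previous one first.
            if cluster_id is not None and n != listed:
                mismatches.append((cluster_id, n, listed))
            cluster_id, n, listed = header.group(1), int(header.group(2)), 0
        elif cluster_id is not None and POINT.match(line):
            listed += 1
    # Close the last cluster.
    if cluster_id is not None and n != listed:
        mismatches.append((cluster_id, n, listed))
    return mismatches


# Tiny made-up sample in the same shape as the quoted output.
sample = [
    "VL0{n=2,c=[...], r[...]}",
    "1.0: 1= [...]",
    "1.0: 3= [...]",
    "VL8{n=0,c=[...], r[...]}",
    "1.0: 393539= [...]",
]
print(find_mismatches(sample))
```

On a real dump, the lines could be read from the text file written by
clusterdump and passed to find_mismatches in the same way. Lines that match
neither pattern are simply ignored.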
I would very much appreciate it if someone could follow up on my question.
Regards.
2012/8/4 Yuji NISHIDA@UTokyo <nishidyatutokyo@gmail.com>:
> Dear all
>
> I am using Mahout to run canopy and k-means clustering and have a problem
> with the clusterdump output.
> Each vector is named with a simple number incremented from 1.
>
> When I used 5,000 vectors, I got correct output. It looks like this:
>
> VL0{n=64,c=[...], r[...]}
> 1.0: 1= [...]
> 1.0: 3= [...]
> 1.0: 4= [...]
> ...
> 1.0: 396= [...] # The number of vectors is exactly the same as n (64).
> VL1{n=5,c=[...], r[...]}
> 1.0: 2= [...]
> 1.0: 12= [...]
> ...
> 1.0: 4221= [...]
> VL2{n=121,c=[...], r[...]}
> ...
>
> The number of vectors listed in each VL is exactly the same as its n value.
>
> When I used 600,000 vectors, the output looks wrong, like this:
>
> VL0{n=14,c=[...], r[...]}
> 1.0: 66636= [...]
> 1.0: 122570= [...]
> ...
> 1.0: 522794= [...] # The number of vectors is 31.
> VL8{n=0,c=[...], r[...]}
> 1.0: 393539= [...]
> 1.0: 398877= [...]
> ...
> 1.0: 513448= [...] # The number of vectors is 5.
> VL16{n=2,c=[...], r[...]}
> ...
>
> It looks as if VL1 to VL7 and VL9 to VL15 are not used, but I confirmed
> that they exist in the output.
> The VLs seem to appear in the order 0,8,16,...,11552, 1,9,17,...,11553,
> 2,10,18,... and so on.
>
> Can I trust this result, or should I suspect that it is caused by a bug?
>
> Hadoop : 0.20.204
> Mahout : rev. 1351561, 1366995, 1367871
>
> Best regards.
>
> 
> nishidy@utokyo

nishidy@utokyo
