mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: SSVD Parameters Constraints
Date Sun, 07 Oct 2012 22:18:59 GMT
In 1), I meant rank(A) >= k+p, of course.

On Sun, Oct 7, 2012 at 3:17 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> So, to summarize the answer to your question:
>
> 1) k+p <= min(m,n). (A necessary, but not actually sufficient,
> requirement against SVD rank deficiency. Most strictly, rank(A) <=
> min(k+p) -- but violating that stricter requirement would not cause
> the problem message you are seeing.)
> 2) The number of rows of A in _every mapper split_ of the input is at
> least k+p. Note that with multiple input files, this also implies
> that the number of rows in every input file is at least k+p (a quick
> check of both conditions is sketched below).
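>
> For instance, a quick sanity check of both conditions might look like
> this (a minimal plain-Java sketch, not part of the Mahout API; the
> method name and arguments are made up, and minRowsPerSplit stands for
> the smallest row count over all mapper splits of A):
>
>   // hypothetical helper: validates conditions 1) and 2) above
>   static void checkSsvdGeometry(long m, long n, int k, int p,
>                                 long minRowsPerSplit) {
>     if (k + p > Math.min(m, n)) {
>       throw new IllegalArgumentException("k+p must be <= min(m,n)");
>     }
>     if (minRowsPerSplit < k + p) {
>       throw new IllegalArgumentException(
>           "every mapper split of A must have at least k+p rows");
>     }
>   }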
>
> -d
>
> On Sun, Oct 7, 2012 at 3:02 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> PS. Also keep in mind that if you use multiple files as input and
>> their sizes are smaller than the HDFS split size, some splits will
>> have reduced size even if the total input size looks benign. I think
>> there was at least one case (which also pertains to the "problem too
>> small" case) where a user discovered that one of the input files of
>> the distributed row matrix had fewer than k+p rows, hence setting off
>> this block height deficiency problem.
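>>
>> One way to check for exactly that is to count the rows in each part
>> file of the row matrix before running SSVD. Here is a minimal sketch
>> against the plain Hadoop SequenceFile API (the method name and the
>> kPlusP argument are mine, not Mahout's):
>>
>>   import java.io.IOException;
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.FileStatus;
>>   import org.apache.hadoop.fs.FileSystem;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.hadoop.io.SequenceFile;
>>   import org.apache.hadoop.io.Writable;
>>   import org.apache.hadoop.util.ReflectionUtils;
>>
>>   // prints every part file of the matrix that has fewer than k+p rows
>>   static void checkRowsPerFile(Configuration conf, Path drmDir, int kPlusP)
>>       throws IOException {
>>     FileSystem fs = FileSystem.get(conf);
>>     for (FileStatus stat : fs.listStatus(drmDir)) {
>>       if (!stat.getPath().getName().startsWith("part-")) {
>>         continue; // skip _SUCCESS, _logs and the like
>>       }
>>       SequenceFile.Reader reader =
>>           new SequenceFile.Reader(fs, stat.getPath(), conf);
>>       Writable key =
>>           (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
>>       Writable val =
>>           (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
>>       long rows = 0;
>>       while (reader.next(key, val)) {
>>         rows++;
>>       }
>>       reader.close();
>>       if (rows < kPlusP) {
>>         System.out.println(stat.getPath() + ": only " + rows
>>             + " rows, less than k+p");
>>       }
>>     }
>>   }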
>>
>> -d
>>
>> On Sun, Oct 7, 2012 at 2:55 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>> Ahmed, in every case where people have reported this error, it meant
>>> that their problem was too small. If A has m x n geometry, then it
>>> must be true that k+p <= min(m,n).
>>>
>>> Another possible reason is that the height of the blocks of A created
>>> in the mappers is less than k+p. In practice, we have yet to see a
>>> problem actually run into this condition (although it is definitely
>>> possible if you happen to have very dense, very large row vectors, so
>>> that they take up enough space to create a split block height
>>> problem). If it is indeed a split block height problem, then the
>>> remedy is to increase the split size, either via a Hadoop parameter
>>> or (I think) one of the SSVD command line parameters. Like I said,
>>> though, nobody has run into the block height deficiency problem yet,
>>> so I have no knowledge of a verified resolution by means of
>>> manipulating the Hadoop parameter setup in Mahout.
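>>>
>>> For reference, the Hadoop-level knob would look something like the
>>> sketch below (I am assuming the classic mapred parameter name here,
>>> and the 256 MB value is only an example):
>>>
>>>   import org.apache.hadoop.conf.Configuration;
>>>
>>>   Configuration conf = new Configuration();
>>>   // larger minimum split size => taller blocks of A per mapper
>>>   conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);
>>>
>>> On the SSVD command line, if I remember correctly, the analogous
>>> option is --minSplitSize (-s).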
>>>
>>> -d
>>>
>>> On Sun, Oct 7, 2012 at 2:16 PM, Ahmed Elgohary <aagohary@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Can someone list all the constraints on the parameters (k, p & aBlockRows)
>>>> that should be satisfied in order for the Q-job in SSVD to work fine? I
>>>> tried many values and made sure that k+p <= m, k+p <= n, and p is in the
>>>> range 20..200, but I am still getting the errors: "Givens thin QR: must
>>>> be true: m>=n" or "new m can't be less than n".
>>>>
>>>> thanks,
>>>>
>>>> --ahmed
