mahout-user mailing list archives

From Dmitriy Lyubimov <>
Subject Re: SSVD Parameters Constrains
Date Sun, 07 Oct 2012 22:17:58 GMT
So, to summarize the answer to your question:

1) k+p <= min(m,n). (This is a necessary but not actually sufficient
requirement to avoid rank deficiency. Most strictly, k+p <= rank(A), but
this stricter requirement will not cause the error message you are getting.)
2) The number of rows of A in _every mapper split_ of the input A is at
least k+p. Note that with multiple input files, this also implies that
the number of rows in every input file is at least k+p.
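The two conditions can be checked before launching the job. The sketch below is a hypothetical helper, not part of Mahout: `check_ssvd_params` and its arguments (`split_rows`, a list of row counts per mapper split, and an optional `rank`) are made up for illustration, and it only encodes the conditions stated above.

```python
from typing import List, Optional


def check_ssvd_params(k: int, p: int, m: int, n: int,
                      split_rows: List[int],
                      rank: Optional[int] = None) -> List[str]:
    """Return a list of violated constraints (empty if none are found).

    Hypothetical helper mirroring the conditions from this thread;
    it does not call Mahout.
    """
    problems = []
    kp = k + p
    # 1) k+p must not exceed the smaller dimension of the m x n input A.
    if kp > min(m, n):
        problems.append(f"k+p={kp} exceeds min(m,n)={min(m, n)}")
    # Stricter, optional condition: k+p should not exceed rank(A).
    if rank is not None and kp > rank:
        problems.append(f"k+p={kp} exceeds rank(A)={rank}")
    # 2) Every mapper split must contain at least k+p rows of A.
    for i, rows in enumerate(split_rows):
        if rows < kp:
            problems.append(f"split {i} has {rows} rows < k+p={kp}")
    return problems


print(check_ssvd_params(k=50, p=20, m=10_000, n=500, split_rows=[6000, 4000]))
# → []
print(check_ssvd_params(k=50, p=20, m=10_000, n=500, split_rows=[9950, 50]))
# → ['split 1 has 50 rows < k+p=70']
```

Note that the second call passes condition 1 (the total of 10,000 rows is plenty) and still fails condition 2 because of a single short split.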


On Sun, Oct 7, 2012 at 3:02 PM, Dmitriy Lyubimov <> wrote:
> PS. Also keep in mind that if you use multiple files as input, and
> their sizes are smaller than the HDFS split size, some splits will
> have reduced size even if the total input size looks benign. I think
> there was at least one case (which also pertains to the "problem too
> small" case) where a user discovered that one of the input files of
> the distributed row matrix had fewer than k+p rows, hence setting off
> this block height deficiency problem.
> -d
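To see how small files shrink splits, here is a rough model under simplifying assumptions (not Hadoop's exact split logic): a split never spans two files, each file is cut into chunks of at most `split_bytes`, and all rows in a file have the same serialized size. `rows_per_split` is a hypothetical helper.

```python
from typing import List


def rows_per_split(file_rows: List[int], bytes_per_row: int,
                   split_bytes: int) -> List[int]:
    """Approximate row count of each input split (simplified model).

    Assumptions: splits never span files, each file is cut into chunks
    of at most split_bytes, and rows have a uniform size in bytes.
    """
    rows_per_full_split = max(1, split_bytes // bytes_per_row)
    result = []
    for rows in file_rows:
        remaining = rows
        # Cut this file into full-size chunks plus a possibly short tail.
        while remaining > 0:
            take = min(remaining, rows_per_full_split)
            result.append(take)
            remaining -= take
    return result


# Two large files and one tiny file, 1 KiB rows, 64 MiB splits.
splits = rows_per_split([100_000, 100_000, 60], bytes_per_row=1024,
                        split_bytes=64 * 1024 * 1024)
print(splits)                 # → [65536, 34464, 65536, 34464, 60]
print(min(splits) >= 50 + 20)  # → False
```

In this example the third file yields a split with only 60 rows, so with k=50 and p=20 it violates the block height condition even though the total input is large.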
> On Sun, Oct 7, 2012 at 2:55 PM, Dmitriy Lyubimov <> wrote:
>> Ahmed, if you are getting this, then in all cases people have reported,
>> it meant their problem was too small. If A has m x n geometry, then it
>> must be true that k+p<=min(m,n).
>> Another possible reason is that the height of the blocks of A created in
>> the mappers is less than k+p. In practice we have yet to see a problem
>> that actually runs into this condition (although it is definitely
>> possible if you occasionally have very dense, very large row vectors that
>> take up enough space to create the split block height problem). If it
>> is indeed the split block height problem, then the remedy is to increase
>> the split size, either by a Hadoop parameter or (I think) by one of the
>> SSVD command line parameters. However, as I said, nobody has run into
>> the block height deficiency problem yet, so I have no knowledge of a
>> verified resolution of this problem by manipulating the Hadoop
>> parameter setup in Mahout.
>> -d
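For the dense-row case, a back-of-the-envelope estimate helps, assuming each row is stored as n dense 8-byte doubles with no serialization overhead or compression (an assumption, not Mahout's actual on-disk format): a split of S bytes holds about S / (8 * n) rows, so S must be at least (k+p) * 8 * n bytes. Both function names are hypothetical.

```python
def split_height(n_cols: int, split_bytes: int, bytes_per_value: int = 8) -> int:
    """Approximate number of dense rows that fit in one split (assumed format)."""
    return split_bytes // (n_cols * bytes_per_value)


def min_split_bytes(k: int, p: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Smallest split size in bytes that holds k+p dense rows (same model)."""
    return (k + p) * n_cols * bytes_per_value


# Example: 1,000,000 dense columns, 64 MiB splits, k=50, p=20.
print(split_height(1_000_000, 64 * 1024 * 1024))  # → 8 (far below k+p=70)
print(min_split_bytes(50, 20, 1_000_000))         # → 560000000 (about 534 MiB)
```

Under this model the split size would have to grow by roughly an order of magnitude, via whichever Hadoop or SSVD setting controls the minimum split size, before each block reaches k+p rows.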
>> On Sun, Oct 7, 2012 at 2:16 PM, Ahmed Elgohary <> wrote:
>>> Hi,
>>> Can someone list all the constraints on the parameters (k, p & aBlockRows)
>>> that should be satisfied in order for the Q-job in ssvd to work fine? I
>>> tried many values and made sure that (k+p<=m & k+p<=n & p is in the range
>>> 20 .. 200), but I am still getting the errors: "Givens thin QR: must be
>>> true: m>=n" or "new m can't be less than n".
>>> thanks,
>>> --ahmed
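As we understand the Q-job, each mapper forms a block of Y = AΩ whose height is the number of rows of A it read and whose width is k+p, and then runs a thin QR on that block. A thin QR with an m x n orthonormal Q needs m >= n, which is presumably where the first quoted error comes from. The pure-Python Givens sketch below is not Mahout's solver, only an illustration of that precondition; `givens_thin_r` is a hypothetical name, and it returns only the R factor.

```python
import math
from typing import List

Matrix = List[List[float]]


def givens_thin_r(block: Matrix) -> Matrix:
    """Return the n x n upper-triangular R of a thin QR of an m x n block.

    Hypothetical illustration: zeroes sub-diagonal entries with Givens
    rotations and, mirroring the precondition in the quoted error message,
    rejects blocks with fewer rows than columns.
    """
    m, n = len(block), len(block[0])
    if m < n:
        raise ValueError(f"Givens thin QR: must be true: m>=n (m={m}, n={n})")
    a = [row[:] for row in block]  # work on a copy
    for j in range(n):
        # Walk up column j, zeroing a[i][j] against the row above it.
        for i in range(m - 1, j, -1):
            x, y = a[i - 1][j], a[i][j]
            if y == 0.0:
                continue
            r = math.hypot(x, y)
            c, s = x / r, y / r
            # Apply the rotation to rows i-1 and i (columns < j are already 0).
            for col in range(j, n):
                u, v = a[i - 1][col], a[i][col]
                a[i - 1][col] = c * u + s * v
                a[i][col] = -s * u + c * v
    # The first n rows now hold R; the remaining rows are all zero.
    return [row[:] for row in a[:n]]


# A block with only 2 rows cannot be factored when k+p = 3.
Y_block = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
try:
    givens_thin_r(Y_block)
except ValueError as err:
    print(err)  # → Givens thin QR: must be true: m>=n (m=2, n=3)
```

Because each rotation is orthogonal, the returned R satisfies R^T R = Y^T Y for any block with at least as many rows as columns, which is a convenient way to sanity-check it.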
