mahout-user mailing list archives

From Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
Subject Re: Combiner applied on multiple map task outputs (like in Mahout SVD)
Date Wed, 26 Sep 2012 13:40:19 GMT
Well, my wording wasn't great when I said "one map task produces only a
single result". What I meant was that one map task produces only a single
outer product (which consists of multiple column vectors, hence multiple
mapper emits), but those are not the ones to combine in this case, right?
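
To make the question concrete, here is a minimal sketch in plain Python (not Hadoop or Mahout code). It assumes, as discussed in this thread, that a combiner only sees the output of a single map task, and that the mapper emits one int/vector pair per nonzero element of the column of A. The function names and the 2x2 matrices are hypothetical and chosen only for illustration.

```python
def map_outer_product(a_col, b_row):
    """Emit one (index, scaled vector) pair per nonzero element of a_col."""
    return [(i, [a_i * b_j for b_j in b_row])
            for i, a_i in enumerate(a_col) if a_i != 0]


def sum_by_key(pairs):
    """Sum the vectors that share a key (usable as combiner or reducer)."""
    acc = {}
    for key, vec in pairs:
        if key in acc:
            acc[key] = [x + y for x, y in zip(acc[key], vec)]
        else:
            acc[key] = list(vec)
    return sorted(acc.items())


# Hypothetical data: A stored by columns, B stored by rows.
a_cols = [[1, 2], [0, 3]]  # A = [[1, 0], [2, 3]]
b_rows = [[4, 5], [6, 7]]  # B = [[4, 5], [6, 7]]

# Case 1: one map task per (column, row) pair. Within a single outer
# product every emitted key is distinct, so the combiner merges nothing.
per_task = [map_outer_product(a, b) for a, b in zip(a_cols, b_rows)]
combined_per_task = [sum_by_key(out) for out in per_task]

# Case 2: one map task processes both pairs. Now the same key occurs
# more than once in that task's output, so the combiner can pre-sum.
single_task = [pair for out in per_task for pair in out]
combined_single = sum_by_key(single_task)

# The reducer sums across all map tasks in either case.
product = sum_by_key([pair for out in combined_per_task for pair in out])
print(product)
```

In case 1 the combiner leaves each task's output unchanged and all summing happens in the reducer. In case 2 the combiner already reduces three pairs to two before the shuffle. Whether a real job behaves like case 1 or case 2 depends on how many key-value pairs each split contains.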

2012/9/26 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>

> Yes, but one int/vector pair corresponds to the respective column of A
> multiplied by an element of the respective row of B, correct? So the
> concatenation of the resulting columns would be the outer product of the
> column of A and the row of B. None of these vectors are summed up but rather the
> outer products of multiple map tasks are summed up. So what is the job of
> the combiner here? It would be nice if the combiner could sum up all outer
> products computed on that datanode, but this is the part I can't see
> happening in Hadoop. Is the general statement correct that a combiner is
> only applied to all outputs of a *map task* and that a map task processes
> all key-value pairs of a split? In this case, there is only one key-value
> pair per split, right? The int/vector being index and column/row of the
> matrix.
>
>
> 2012/9/26 Jake Mannix <jake.mannix@gmail.com>
>
>> On Wed, Sep 26, 2012 at 4:49 AM, Sigurd Spieckermann <
>> sigurd.spieckermann@gmail.com> wrote:
>>
>> > Hi guys,
>> >
>> > I'm trying to understand the way the combiner in Mahout SVD works.
>> > (https://cwiki.apache.org/MAHOUT/dimensional-reduction.html) As far as
>> > I know from the Mahout math matrix-multiplication implementation,
>> > matrix A is represented by column-vectors, matrix B is represented by
>> > row vectors and an inner join executes an outer product of the columns
>> > of A with the rows of B. All outer products are summed by the combiners
>> > and reducers. What I am wondering about is how a combiner can actually
>> > combine multiple outer products on the same datanode because the
>> > join-package requires the data to be partitioned into unsplittable
>> > files. In this case, I understand that one file contains one column/row
>> > of its corresponding matrix. Hence, each map task receives a
>> > column-row-tuple, computes the outer product and emits the result.
>>
>>
>> This all sounds right, but not the following:
>>
>>
>> > My understanding of Hadoop is that the combiner follows a map task
>> > immediately but one map task produces only a single result so there is
>> > nothing to combine.
>>
>>
>> That part is not true - a mapper may emit more than one key-value pair
>> (and for matrix multiplication, this is true *a fortiori* - there is one
>> int/vector pair emitted per nonzero element of the row being mapped over).
>>
>>
>> > If the combiner could accumulate the results of multiple map tasks, I
>> > would understand the idea, but from my understanding and tests, it does
>> > not.
>> >
>> > Could anyone clarify the process please?
>> >
>> > Thanks a lot!
>> > Sigurd
>> >
>>
>>
>>
>> --
>>
>>   -jake
>>
>
>
