mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Combiner applied on multiple map task outputs (like in Mahout SVD)
Date Wed, 26 Sep 2012 13:29:10 GMT
On Wed, Sep 26, 2012 at 4:49 AM, Sigurd Spieckermann <sigurd.spieckermann@gmail.com> wrote:

> Hi guys,
>
> I'm trying to understand the way the combiner in Mahout SVD works. (
> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html) As far as I
> know from the Mahout math matrix-multiplication implementation, matrix A is
> represented by column-vectors, matrix B is represented by row vectors and
> an inner join executes an outer product of the columns of A with the rows
> of B. All outer products are summed by the combiners and reducers. What I
> am wondering about is how a combiner can actually combine multiple outer
> products on the same datanode because the join-package requires the data to
> be partitioned into unsplittable files. In this case, I understand that one
> file contains one column/row of its corresponding matrix. Hence, each map
> task receives a column-row-tuple, computes the outer product and emits the
> result.


This all sounds right, but not the following:


> My understanding of Hadoop is that the combiner follows a map task
> immediately but one map task produces only a single result so there is
> nothing to combine.


That part is not true: a mapper may emit more than one key-value pair. For
matrix multiplication this holds *a fortiori*, since one int/vector pair is
emitted per nonzero element of the row being mapped over.
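
To make the point concrete, here is a minimal sketch in plain Python (not Hadoop or Mahout code) of a mapper that emits one int/vector pair per nonzero element, followed by a combiner that sums the vectors sharing a key. The names `map_row_pair`, `combine` and `task_input` are hypothetical, sparse vectors are modelled as dicts, and the sketch assumes for illustration that a single map task sees more than one joined row pair.

```python
from collections import defaultdict


def map_row_pair(a_row, b_row):
    """Emit one (column index, scaled vector) pair per nonzero entry of a_row.

    a_row and b_row are sparse vectors stored as {index: value} dicts
    (a hypothetical stand-in for the int/vector pairs in the thread).
    """
    for j, a_ij in a_row.items():
        if a_ij != 0:
            yield j, {k: a_ij * b_ik for k, b_ik in b_row.items()}


def combine(pairs):
    """Sum the vectors that share a key within one map task's output."""
    sums = defaultdict(lambda: defaultdict(float))
    for key, vec in pairs:
        for k, v in vec.items():
            sums[key][k] += v
    return {key: dict(vec) for key, vec in sums.items()}


# Assumption: one map task processes these two joined (row of A, row of B) pairs.
task_input = [
    ({0: 1.0, 2: 2.0}, {0: 3.0, 1: 4.0}),
    ({0: 5.0}, {1: 6.0}),
]

# The mapper emits several pairs, and key 0 appears twice.
emitted = [pair for a_row, b_row in task_input
           for pair in map_row_pair(a_row, b_row)]

# The combiner collapses the two key-0 vectors into one partial sum.
combined = combine(emitted)
```

Here the map output contains three pairs but only two distinct keys, so the combiner has something to merge before anything is sent to the reducers.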


> If the combiner could accumulate the results of
> multiple map tasks, I would understand the idea, but from my understanding
> and tests, it does not.
>
> Could anyone clarify the process please?
>
> Thanks a lot!
> Sigurd
>



-- 

  -jake
