mahout-user mailing list archives

From Dan Filimon <dangeorge.fili...@gmail.com>
Subject Re: Increase the number of mappers/split file? for matrixmult
Date Thu, 20 Jun 2013 07:30:16 GMT
Hi!

I don't know the particular details of this job, but the number of mappers
launched is usually decided by Hadoop rather than by Mahout, and Hadoop
looks at the number of input splits as its main hint.
So, if your matrices are split into multiple smaller files, you'll likely
get multiple mappers.
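
To make the relationship between files, splits and mappers concrete, here is a
rough sketch in Python. It is a simplified model, not Hadoop's actual split
logic: it assumes every non-empty file is cut into chunks of at most
`split_size` bytes and that files are never merged into one split. The file
sizes and the 64 MB split size are hypothetical.

```python
import math

MB = 1024 * 1024


def estimate_splits(file_sizes, split_size):
    """Rough number of input splits (and so map tasks) for a set of files.

    Simplified model: each non-empty file yields ceil(size / split_size)
    splits, and a split never spans two files.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


# Hypothetical sizes: one 7 MB file versus the same data in seven 1 MB files.
single_file = estimate_splits([7 * MB], 64 * MB)
seven_files = estimate_splits([1 * MB] * 7, 64 * MB)
print(single_file, seven_files)
```

Under this model a small matrix stored as one file stays in a single split no
matter how many map tasks are requested, while the same data spread over
several files gives one split per file.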

Since I assume your matrices are SequenceFiles, maybe try this out:
https://github.com/apache/mahout/blob/trunk/examples/src/main/java/org/apache/mahout/clustering/streaming/tools/ResplitSequenceFiles.java

The tool is registered in the Mahout driver as "resplit" and should work for
any Writable:
https://github.com/apache/mahout/blob/trunk/src/conf/driver.classes.default.props

See if resplitting works. :)
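
The idea behind resplitting can be sketched in a few lines of Python. This is
only an illustration of the concept, not how ResplitSequenceFiles is
implemented: it assumes the records fit in memory and deals them out
round-robin, and the name `resplit_records` is made up for the example.

```python
def resplit_records(records, num_splits):
    """Distribute records round-robin into num_splits roughly equal parts."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    parts = [[] for _ in range(num_splits)]
    for index, record in enumerate(records):
        parts[index % num_splits].append(record)
    return parts


# Ten hypothetical (row index, row vector) records spread over three parts.
rows = [(i, [float(i)] * 2) for i in range(10)]
parts = resplit_records(rows, 3)
print([len(part) for part in parts])
```

Each part would then be written out as its own file, so that Hadoop sees
several inputs instead of one.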


On Thu, Jun 20, 2013 at 9:18 AM, Rafa Alfaro <ralfaro2002@gmail.com> wrote:

> Hi,
>
> I'm trying to run the matrix multiplication of two relatively small
> matrices (4219*200)(200*54622), but it is taking too long because only a
> single mapper is launched. I'm running this on a 10 node cluster.
>
> I have tried changing the MAHOUT_OPTS in the mahout file:
>
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=18"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=9"
>
> Also passing the options directly on the command:
>
> mahout matrixmult -Dmapred.map.tasks=18 -Dmapred.reduce.tasks=9
> --numRowsA 200 --numColsA 4819 --numRowsB 200 --numColsB 54622
> --inputPathA matrixA --inputPathB matrixB
>
> But no luck with this either.
>
> My Hadoop mapred-site.xml looks like this:
>
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>serverX:54311</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.child.ulimit</name>
>     <value>unlimited</value>
>   </property>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>2</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>2</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx2000m</value>
>   </property>
> </configuration>
>
> Am I missing something on the configuration?
>
> Right now, with 1 mapper, it takes 4 min on average to advance 1%
> of the mapper task.
>
> Thank you,
> Rafael Alfaro
>
