mahout-user mailing list archives

From Kris Jack <mrkrisj...@gmail.com>
Subject Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs
Date Mon, 14 Jun 2010 15:52:27 GMT
The command line call is this:

hadoop-0.20 jar mahout-core-0.4-SNAPSHOT.job
org.apache.mahout.math.hadoop.GenSimMatrixJob
-Dmapred.input.dir=/user/kris/simMatrix/mahoutIndexTFIDF.vec
-Dmapred.map.tasks=8 -Dmapred.reduce.tasks=8 --tempDir
/tmp/matrixMulitiplication/ --numDocs 12843450 --numTerms 719050

org.apache.mahout.math.hadoop.GenSimMatrixJob is my own class that runs the
matrix transposition and then the multiplication.  Could it be because I'm
using Hadoop 0.20?
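
One possible explanation, not confirmed anywhere in this thread, is that the
-D values are parsed into one configuration object while the job is built
from a different, freshly created one. The dependency-free sketch below
imitates that situation with plain maps. Everything in it (DashDSketch,
parseDashD, freshJobConf, inheritedJobConf, reduceTasks) is a hypothetical
stand-in and not Hadoop or Mahout API; only the property names and the
default of 1 for mapred.reduce.tasks are taken from Hadoop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: plain maps stand in for Hadoop configurations. */
public class DashDSketch {

    /**
     * Collects -Dkey=value arguments into a map and passes everything else
     * through in 'remaining'. Only the joined "-Dkey=value" form is handled.
     */
    public static Map<String, String> parseDashD(String[] args, List<String> remaining) {
        Map<String, String> conf = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);
            }
        }
        return conf;
    }

    /** Assumed pitfall: the job builder starts from an empty configuration. */
    public static Map<String, String> freshJobConf(Map<String, String> cliConf) {
        Map<String, String> job = new HashMap<>(); // cliConf is ignored here
        job.put("mapred.job.name", "MatrixMultiplication");
        return job;
    }

    /** Assumed fix: the job builder copies the command-line configuration first. */
    public static Map<String, String> inheritedJobConf(Map<String, String> cliConf) {
        Map<String, String> job = new HashMap<>(cliConf);
        job.put("mapred.job.name", "MatrixMultiplication");
        return job;
    }

    /** Reads the reducer count, falling back to Hadoop's default of 1. */
    public static int reduceTasks(Map<String, String> conf) {
        return Integer.parseInt(conf.getOrDefault("mapred.reduce.tasks", "1"));
    }

    public static void main(String[] args) {
        String[] cli = {
            "-Dmapred.map.tasks=8", "-Dmapred.reduce.tasks=8", "--numDocs", "12843450"
        };
        List<String> remaining = new ArrayList<>();
        Map<String, String> cliConf = parseDashD(cli, remaining);

        System.out.println("fresh conf reducers:     " + reduceTasks(freshJobConf(cliConf)));
        System.out.println("inherited conf reducers: " + reduceTasks(inheritedJobConf(cliConf)));
        System.out.println("job arguments:           " + remaining);
    }
}
```

If this guess is right, it would match a reducer count that stays at 1
regardless of the command line. Separately, Hadoop treats mapred.map.tasks
only as a hint; the actual number of map tasks follows from the input
splits.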

Kris



2010/6/14 Sean Owen <srowen@gmail.com>

> That's odd since those methods just set the exact same parameter to Hadoop:
>
>  public void setNumMapTasks(int n) { setInt("mapred.map.tasks", n); }
>
> It is indeed not read by anything except Hadoop.
>
> What's your command line? There must be some little glitch here that's
> making it not be set as expected. You should be able to set this in
> the command line, or Hadoop XML files, and it shouldn't impact the
> Mahout code either way.
>
>
>
> On Mon, Jun 14, 2010 at 3:39 PM, Kris Jack <mrkrisjack@gmail.com> wrote:
> > Hi Sean,
> >
> > Yes, I tried using those parameters but they didn't seem to have any
> > effect.  What's more, the number of reducers never increased above 1,
> > meaning that I never got to see any results when running with large data
> > sets (doing matrix multiplication).
> >
> > I looked in the code to find where these parameters were being read by
> > the jobs that I was using (i.e. MatrixMultiplicationJob and TransposeJob)
> > but couldn't find them.  As a result, I modified their builders and called
> > the setNumMapTasks and setNumReduceTasks functions from the conf objects.
> > This now works from the command line using the parameters that you
> > suggested.
> >
> > Please do let me know if I was just not calling them correctly or if you
> > think that there already exists an alternative way to do this.  I would
> > like to use Mahout as it was intended and not make lots of little changes
> > myself if they aren't necessary.
> >
> > Thanks,
> > Kris
> >
> >
> >
> > 2010/6/11 Sean Owen <srowen@gmail.com>
> >
> >> -Dmapred.map.tasks and same for reduce? These should be Hadoop params
> >> you set directly to Hadoop.
> >>
> >> On Fri, Jun 11, 2010 at 5:07 PM, Kris Jack <mrkrisjack@gmail.com> wrote:
> >> > Hi everyone,
> >> >
> >> > I am running code that uses some of the jobs defined in the
> >> > DistributedRowMatrix class and would like to know if I can define the
> >> > number of mappers and reducers that they use when running.  In
> >> > particular, with the jobs:
> >> >
> >> > - MatrixMultiplicationJob
> >> > - TransposeJob
> >> >
> >> > I am comfortable with changing the code to get this to work, but I was
> >> > wondering if the algorithmic logic being employed would allow multiple
> >> > mappers and reducers.
> >> >
> >> > Thanks,
> >> > Kris
> >> >
> >>
> >
> >
> >
> > --
> > Dr Kris Jack,
> > http://www.mendeley.com/profiles/kris-jack/
> >
>



-- 
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
