mahout-user mailing list archives

From Vikas Kumar <kumar...@umn.edu>
Subject Re: How to change /tmp directory for mahout usage of map-reduce?
Date Wed, 01 Apr 2015 07:41:54 GMT
Thanks for the reply.

I think I have found the parameter that needed to be set. The
Configuration object must set the parameter *"mapred.local.dir"*,
which is used by
*org.apache.hadoop.filecache.TrackerDistributedCacheManager*.

conf.set("mapred.local.dir", "tmpDirectory");

It is working as expected for me now.
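
For anyone finding this thread later, here is a minimal sketch of the fix above, assuming a Hadoop 1.x style setup where the key is "mapred.local.dir" (newer Hadoop releases treat that name as deprecated in favour of "mapreduce.cluster.local.dir"). The class name LocalDirConfig, the helper withLocalDir and the default path are hypothetical and only there for illustration; how the Configuration is then handed to the spectral clustering driver depends on the Mahout version and is not shown.

```java
import org.apache.hadoop.conf.Configuration;

public class LocalDirConfig {
    // Property named in this thread; read by the distributed cache manager.
    static final String LOCAL_DIR_KEY = "mapred.local.dir";

    /** Returns a copy of base with the local scratch directory overridden. */
    public static Configuration withLocalDir(Configuration base, String localDir) {
        // Copy so the caller's Configuration object is left untouched.
        Configuration conf = new Configuration(base);
        conf.set(LOCAL_DIR_KEY, localDir);
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical default; substitute a directory with enough free space.
        String localDir = args.length > 0 ? args[0] : "/export/scratch/hadoop-local";
        Configuration conf = withLocalDir(new Configuration(), localDir);
        System.out.println(LOCAL_DIR_KEY + " = " + conf.get(LOCAL_DIR_KEY));
        // Pass this same conf object to the Mahout driver and to every job it
        // launches, so all of them see the overridden directory.
    }
}
```

The key point, as Suneel notes below, is to reuse one Configuration object for all the jobs rather than letting any of them create a fresh one.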








On Wed, Apr 1, 2015 at 2:27 AM, Suneel Marthi <suneel.marthi@gmail.com>
wrote:

> You need to set the temp path in your Configuration and pass the
> Configuration object to the subsequent calls.
>
> IIRC, Spectral KMeans internally calls other MapReduce jobs like
> MatrixDiagonalizeJob, VectorMatrixMultiplicationJob, SSVD.
> So ensure that you are passing common parameters like tempDir, outputDir
> etc via Configuration across the jobs.
>
> Shannon could help better here.
>
> On Wed, Apr 1, 2015 at 3:21 AM, Vikas Kumar <kumar093@umn.edu> wrote:
>
> > Sorry, it didn't solve the problem.
> >
> > What it changed was the *tmp* directory for the following (taken from the
> > log attached above):
> > 15/04/01 01:18:13 INFO mapred.MapTask: Processing split:
> > file:/export/scratch/vikas/<<<<PRIVATE DIRECTORIES>>>>>
> > /tmp/calculations/seqfile/part-r-00000:0+86000
> >
> > However, the *tmp* directory for TrackerDistributedCacheManager is still
> > the same:
> >
> > 15/04/01 01:18:13 INFO filecache.TrackerDistributedCacheManager: Creating
> > vector in */tmp/hadoop-vikas/mapred/local*/archive/-623590149816891030_-
> > 1428839080_1939951392/file/export/scratch/vikas/<<<<PRIVATE
> > DIRECTORIES>>>>>/tmp/calculations-work--3390146237769593830 with
> > rwxr-xr-x
> >
> > It seems I just need to set the right resource (Path or string) in
> > the Configuration object passed as the parameter to Spectral
> > Clustering, but I am not able to figure out which one.
> >
> > Thanks
> >
> >
> >
> >
> >
> >
> > On Wed, Apr 1, 2015 at 1:43 AM, Vikas Kumar <kumar093@umn.edu> wrote:
> >
> > > That was helpful to figure out what was required.
> > > I had to change the path of the variable *tmp* in the function from:
> > >
> > > Path tmp = new Path("tmp");
> > >
> > > to
> > >
> > > Path tmp = new Path("<<CHOSEN DIRECTORY>>");
> > >
> > > Silly mistake. Thanks for the clue :)
> > >
> > > -Vikas
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Apr 1, 2015 at 1:34 AM, Suneel Marthi <suneel.marthi@gmail.com>
> > > wrote:
> > >
> > >> If you are running Spectral KMeans via the command line, you should be
> > >> able to set the parameter -tempDir to point to a different path.
> > >>
> > >> On Wed, Apr 1, 2015 at 1:55 AM, Andrew Musselman
> > >> <andrew.musselman@gmail.com> wrote:
> > >>
> > >> > Can you let us know which code/scripts you're using?
> > >> >
> > >> > On Tuesday, March 31, 2015, Vikas Kumar <kumar093@umn.edu> wrote:
> > >> >
> > >> > > Hello,
> > >> > >
> > >> > > I am using the Mahout Spectral clustering example, which internally
> > >> > > calls a map-reduce job. Right now, it is using the
> > >> > > */tmp/hadoop-<username>/mapred/..*
> > >> > > directory by default for its operations.
> > >> > >
> > >> > > Can someone please let me know how to make Mahout use a
> > >> > > different path?
> > >> > >
> > >> > > Thanks
> > >> > > Vikas
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
