mahout-user mailing list archives

From Yazan Boshmaf <bosh...@ece.ubc.ca>
Subject Re: Submitting mahout jobs to map/reduce cluster with fair scheduling
Date Fri, 09 Nov 2012 05:10:25 GMT
Hi Jeff,

I tried running:

$MAHOUT_HOME/bin/mahout
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -t1 0.1 -t2
0.00001 -x -Dmapred.input.dir=testdata -Dmapred.output.dir=output
-Dmapred.fairscheduler.pool=my_group.my_pool

But I still end up with the same error. The other arguments are parsed, as
shown by:

12/11/08 21:00:38 INFO kmeans.Job: Running with only user-supplied arguments
12/11/08 21:00:38 INFO common.AbstractJob: Command line arguments:
{--convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
--endPhase=[2147483647], --maxIter=[-1], --startPhase=[0], --t1=[0.1],
--t2=[0.00001], --tempDir=[temp]}
12/11/08 21:00:38 INFO kmeans.Job: Preparing Input

And the job gets a session

12/11/08 21:00:39 INFO corona.SessionDriver: Got session ID
201211051809.443899

Then there is this interesting warning about the generic options (which
include the -D options for the JobClient):

12/11/08 21:00:39 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
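
The warning above concerns Hadoop's generic option handling. One thing that may be worth ruling out is argument order: Hadoop's GenericOptionsParser is generally described as expecting generic options such as -D ahead of application-specific ones. The sketch below only prints a reordered command and is not a confirmed fix for the Corona error. The helper build_cmd and the fallback install path are hypothetical, the -x flag from the original command is left out because its argument is not shown, and this thread does not establish whether the job forwards the property to every Hadoop job it launches.

```shell
#!/bin/sh
# Sketch only: prints the command instead of running it.
# Assumption: MAHOUT_HOME is set as in the message; the fallback path is a placeholder.
MAHOUT_BIN="${MAHOUT_HOME:-/usr/local/mahout}/bin/mahout"
JOB_CLASS="org.apache.mahout.clustering.syntheticcontrol.kmeans.Job"

# build_cmd POOL: hypothetical helper that places the generic -D options
# directly after the class name and the job-specific options last.
build_cmd() {
  pool="$1"
  generic_opts="-Dmapred.fairscheduler.pool=$pool -Dmapred.input.dir=testdata -Dmapred.output.dir=output"
  job_opts="-t1 0.1 -t2 0.00001"
  echo "$MAHOUT_BIN $JOB_CLASS $generic_opts $job_opts"
}

# Dry run with the pool name used in the message.
build_cmd my_group.my_pool
```

If the printed command looks right, it can be pasted into the shell as-is; the pool name has to be one that the cluster actually defines.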

Interestingly, the HDFS input/output arguments are correctly parsed, as shown
by:

12/11/08 21:00:40 INFO FileSystem.collect: makeAbsolute: output/data
working directory: hdfs://my_cluster:my_port/absolute_path
12/11/08 21:00:40 INFO input.FileInputFormat: Total input paths to process
: 1

But I still get

12/11/08 21:00:43 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got
an uncaught exception
java.io.IOException: InvalidSessionHandle(handle:This cluster is operating
in configured pools only mode.  The pool group and pool was specified as
'default.defaultpool' and is not part of this cluster.  Please use the
Corona parameter mapred.fairscheduler.pool to set a valid pool group and
pool in the format <poolgroup>.<pool>)
at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
...
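
The error message asks for a value in the form <poolgroup>.<pool>. A small client-side check of that shape can catch typos before a job is submitted. This is a minimal sketch: valid_pool is a hypothetical helper, it assumes that neither the pool group nor the pool name contains a dot, and it cannot tell whether the pool is actually configured on the cluster.

```shell
#!/bin/sh
# valid_pool NAME: hypothetical helper. Returns 0 if NAME looks like
# <poolgroup>.<pool>, i.e. two non-empty parts separated by exactly one dot.
# Assumption: pool group and pool names never contain a dot themselves.
valid_pool() {
  case "$1" in
    *.*.*) return 1 ;;   # more than one dot
    .*|*.) return 1 ;;   # empty pool group or empty pool
    *.*)   return 0 ;;   # exactly one dot, both parts non-empty
    *)     return 1 ;;   # no dot at all
  esac
}

# Usage: check the value before passing it as -Dmapred.fairscheduler.pool=...
if valid_pool "my_group.my_pool"; then
  echo "pool name is well-formed"
else
  echo "expected <poolgroup>.<pool>" >&2
fi
```

Note that 'default.defaultpool' from the error also passes this check; the cluster rejects it because the pool is not configured, not because the format is wrong.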

Any thoughts on this?

Regards,
Yazan



On Thu, Nov 8, 2012 at 5:11 PM, Jeff Eastman <jdog@windwardsolutions.com> wrote:

> That Job extends org.apache.mahout.common.AbstractJob, so it probably
> will accept a -D argument to set "mapred.fairscheduler.pool=...". Have
> you tried this?
>
>
>
> On 11/8/12 3:41 PM, Yazan Boshmaf wrote:
>
>> Hello,
>>
>> I'm trying to run the ASF Email example here:
>> https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
>>
>> I am using an existing Hive/Hadoop cluster.
>>
>> When I run:
>>
>> $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>
>> I get:
>>
>> MAHOUT-JOB:
>> /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
>> 12/11/08 12:13:54 WARN driver.MahoutDriver: No
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>> classpath, will use command-line arguments only
>> 12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
>> 12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working
>> directory: hdfs://my_cluster:my_port/
>> 12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
>> 12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute path:
>> testdata working directory: dfs://cluster:port_num/
>> 12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
>> 12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
>> 12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster manager
>> at data_manager:port_num
>> 12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID
>> 201211051809.387193
>> 12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data
>> working directory: dfs://cluster:port_num/
>> 12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
>> 12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized
>> native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of
>> the parent directories): .git]
>> 12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got
>> an uncaught exception
>> java.io.IOException: InvalidSessionHandle(handle:This cluster is operating
>> in configured pools only mode.  The pool group and pool was specified as
>> 'default.defaultpool' and is not part of this cluster.  Please use the
>> Corona parameter mapred.fairscheduler.pool to set a valid pool group and
>> pool in the format <poolgroup>.<pool>)
>> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
>> at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
>> at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
>> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
>> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
>> at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
>> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
>> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> My question is: How do I configure Mahout to use pools? That is, where do I
>> set the Corona "mapred.fairscheduler.pool" JobConf property?
>>
>>
>
