mahout-user mailing list archives
From Sears Merritt <sears.merr...@gmail.com>
Subject Re: mahout and hadoop configuration question
Date Fri, 03 Aug 2012 21:18:31 GMT
Aha! Thanks for catching that.
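
[Editor's note] Since the diagnosis below is that the OutOfMemoryError happens in the client-side driver (while RandomSeedGenerator builds k dense centroid vectors), the relevant knob is the driver JVM's heap, not the cluster's task settings. A minimal sketch, assuming the `bin/mahout` wrapper honors a `MAHOUT_HEAPSIZE` variable (megabytes) for the driver JVM; the dimensionality `d` and heap value below are illustrative assumptions, not values from this thread:

```shell
# Back-of-envelope heap estimate: k dense centroids of dimension d need
# roughly k * d * 8 bytes in the driver (each DenseVector wraps a double[]).
k=10000
d=50000                                  # hypothetical input dimensionality
bytes=$(( k * d * 8 ))
echo "approx heap for centroids: $(( bytes / 1024 / 1024 )) MB"

# If that exceeds the default driver heap, raise it before launching
# (value in MB; an assumption about the bin/mahout launcher script):
export MAHOUT_HEAPSIZE=8192
# bin/mahout kmeans -i /users/merritts/rvs -o /users/merritts/kmeans_output \
#   -c /users/merritts/clusters -k 10000 -x 10
```

With k=10000 this estimate grows linearly in the input dimensionality, which is why a seemingly modest k can exhaust a default-sized client JVM before any MapReduce job is submitted.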

On Aug 3, 2012, at 3:13 PM, Sean Owen <srowen@gmail.com> wrote:

> Ah that's the ticket. The stack trace shows it is failing in the driver
> program, which runs client-side. It's not getting to launch a job.
> 
> It looks like it's running out of memory creating a new dense vector in the
> random seed generator process. I don't know anything more than that about
> why it happens, whether your input is funny, etc., but that is why it is
> not getting to Hadoop.
> 
> On Fri, Aug 3, 2012 at 5:04 PM, Sears Merritt <sears.merritt@gmail.com> wrote:
> 
>> Exactly. There isn't an error. The job just runs on a single machine and
>> eventually crashes when it exhausts the JVM's memory. I never see it show
>> up in the job tracker and never get any map-reduce status output. The full
>> output is here:
>> 
>> -bash-4.1$ bin/mahout kmeans -i /users/merritts/rvs -o
>> /users/merritts/kmeans_output -c /users/merritts/clusters -k 10000 -x 10
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
>> HADOOP_CONF_DIR=/usr/lib/hadoop/conf
>> MAHOUT-JOB:
>> /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
>> 12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
>> {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
>> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
>> --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
>> --method=[mapreduce], --numClusters=[10000],
>> --output=[/users/merritts/kmeans_output], --startPhase=[0],
>> --tempDir=[temp]}
>> 12/08/03 14:26:52 INFO common.HadoopUtil: Deleting /users/merritts/clusters
>> 12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:54)
>>        at org.apache.mahout.math.DenseVector.like(DenseVector.java:115)
>>        at org.apache.mahout.math.DenseVector.like(DenseVector.java:28)
>>        at
>> org.apache.mahout.math.AbstractVector.times(AbstractVector.java:478)
>>        at
>> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:273)
>>        at
>> org.apache.mahout.clustering.AbstractCluster.observe(AbstractCluster.java:248)
>>        at
>> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:93)
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:94)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:48)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>        at
>> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>        at
>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>> 
>> 
>> 
>> 
>> On Aug 3, 2012, at 3:00 PM, Sean Owen <srowen@gmail.com> wrote:
>> 
>>> I don't see an error here...? The warning is an ignorable message from
>>> Hadoop.
>>> 
>>> On Fri, Aug 3, 2012 at 4:56 PM, Sears Merritt
>>> <sears.merritt@gmail.com> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> I'm trying to run a kmeans job using mahout 0.8 on my hadoop cluster
>>>> (Cloudera's 0.20.2-cdh3u3) and am running into an odd problem where the
>>>> mahout job connects to HDFS for reading/writing data but only runs
>> hadoop
>>>> on a single machine, not the entire cluster. To the best of my
>> knowledge I
>>>> have all the environment variables configured properly, as you will see
>>>> from the output below.
>>>> 
>>>> When I launch the job using the command line tools as follows:
>>>> 
>>>> bin/mahout kmeans -i /users/merritts/rvs -o
>> /users/merritts/kmeans_output
>>>> -c /users/merritts/clusters -k 100 -x 10
>>>> 
>>>> I get the following output:
>>>> 
>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>>> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
>>>> HADOOP_CONF_DIR=/usr/lib/hadoop/conf
>>>> MAHOUT-JOB:
>>>> 
>> /home/merritts/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
>>>> 12/08/03 14:26:52 INFO common.AbstractJob: Command line arguments:
>>>> {--clusters=[/users/merritts/clusters], --convergenceDelta=[0.5],
>>>> 
>> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
>>>> --endPhase=[2147483647], --input=[/users/merritts/rvs], --maxIter=[10],
>>>> --method=[mapreduce], --numClusters=[10000],
>>>> --output=[/users/merritts/kmeans_output], --startPhase=[0],
>>>> --tempDir=[temp]}
>>>> 12/08/03 14:26:52 INFO common.HadoopUtil: Deleting
>> /users/merritts/clusters
>>>> 12/08/03 14:26:53 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new compressor
>>>> 12/08/03 14:26:53 INFO compress.CodecPool: Got brand-new decompressor
>>>> 
>>>> Has anyone run into this before? If so, how did you fix the issue?
>>>> 
>>>> Thanks for your time,
>>>> Sears Merritt
>> 
>> 

