mahout-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: Advise needed for Mahout heap size allocation (seq2sparse failure)
Date Wed, 17 Dec 2014 18:57:30 GMT
But also please update to Mahout version 0.9 since you're two versions
behind.

On Wed, Dec 17, 2014 at 10:55 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
>
> It's worth trying to increase the heap size for child JVMs per this doc,
> depending on what version you're running:
> http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
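On Hadoop 0.20.x/1.x (which the quoted setup uses), the child-JVM heap is normally controlled by `mapred.child.java.opts` in `mapred-site.xml`; the 2.x doc linked above uses newer property names. A sketch, with an illustrative value (size it to fit within machine RAM times the number of concurrent task slots):

```xml
<!-- mapred-site.xml (Hadoop 0.20.x/1.x): max heap for each map/reduce
     child JVM. The default is only -Xmx200m; -Xmx1024m is illustrative. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

Since the quoted stack trace shows the job going through `ToolRunner`, the same property can likely also be passed per invocation as a generic option, e.g. `mahout seq2sparse -Dmapred.child.java.opts=-Xmx1024m ...`, before the job's own flags.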
>
> On Tue, Dec 16, 2014 at 11:33 PM, 万代豊 <20525entradero@gmail.com> wrote:
>>
>> Hi
>> After my several successful jobs experiences on other Mahout Kmeans
>> calculation in the past , I'm facing a sudden heap error as below in
>> Mahout
>> seq2sparse process.(Mahout-0.70 on Hadoop-0.20.203 Pseudo-distributed)
>>
>> [hadoop@localhost TEST]$ $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i TEST/TEST-seqfile/ -o TEST/TEST-namedVector -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ml 50 -seq -n 2
>> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
>> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
>> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
>> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
>> 14/12/16 22:52:57 INFO input.FileInputFormat: Total input paths to process: 10
>> 14/12/16 22:52:57 INFO mapred.JobClient: Running job: job_201412162229_0005
>> 14/12/16 22:52:58 INFO mapred.JobClient:  map 0% reduce 0%
>> 14/12/16 22:53:27 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_0, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:53:29 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_0, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:53:40 INFO mapred.JobClient:  map 2% reduce 0%
>> 14/12/16 22:53:42 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_1, Status : FAILED
>> Error: Java heap space
>> attempt_201412162229_0005_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
>> attempt_201412162229_0005_m_000000_1: log4j:WARN Please initialize the log4j system properly.
>> 14/12/16 22:53:43 INFO mapred.JobClient:  map 0% reduce 0%
>> 14/12/16 22:53:48 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_1, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:54:00 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_2, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:54:03 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_2, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:54:21 INFO mapred.JobClient: Job complete: job_201412162229_0005
>> 14/12/16 22:54:21 INFO mapred.JobClient: Counters: 7
>> 14/12/16 22:54:21 INFO mapred.JobClient:   Job Counters
>> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=52527
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Launched map tasks=8
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Data-local map tasks=8
>> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Failed map tasks=1
>> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>> at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
>> at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:253)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> Based on other threads I've looked through, I'm giving Mahout 2048 MB,
>> as shown below:
>> [hadoop@localhost TEST]$ echo $MAHOUT_HEAPSIZE
>> 2048
>>
>> I'm not sure why, but JConsole's connection to the Mahout job is
>> refused when I try to check heap status. However, each connection
>> attempt from JConsole makes Mahout dump heap-related information to
>> the console, as below:
>>
>>
>> "VM Thread" prio=10 tid=0x00007fbe1405d000 nid=0x3c01 runnable
>>
>> "VM Periodic Task Thread" prio=10 tid=0x00007fbe14094000 nid=0x3c08 waiting on condition
>>
>> JNI global references: 1621
>>
>> Heap
>>  def new generation   total 18432K, used 16880K [0x00000000bc600000, 0x00000000bd9f0000, 0x00000000d1350000)
>>   eden space 16448K,  98% used [0x00000000bc600000, 0x00000000bd5d7b28, 0x00000000bd610000)
>>   from space 1984K,  33% used [0x00000000bd800000, 0x00000000bd8a47f8, 0x00000000bd9f0000)
>>   to   space 1984K,   0% used [0x00000000bd610000, 0x00000000bd610000, 0x00000000bd800000)
>>  tenured generation   total 40832K, used 464K [0x00000000d1350000, 0x00000000d3b30000, 0x00000000fae00000)
>>    the space 40832K,   1% used [0x00000000d1350000, 0x00000000d13c40e8, 0x00000000d13c4200, 0x00000000d3b30000)
>>  compacting perm gen  total 21248K, used 15218K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
>>    the space 21248K,  71% used [0x00000000fae00000, 0x00000000fbcdca18, 0x00000000fbcdcc00, 0x00000000fc2c0000)
>> No shared spaces configured.
>>
>> 14/12/16 23:18:40 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000000_1, Status : FAILED
>> Error: Java heap space
>> 14/12/16 23:18:41 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000001_1, Status : FAILED
>> Error: Java heap space
>> 2014-12-16 23:18:41
>> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode):
>>
>> In the dump above, Eden space is 98% used, if only briefly, so I
>> suspect the heap really is running out.
>>
>> Could you advise which Hadoop/Mahout configuration variables I should
>> increase, and roughly by how much?
>>
>> Regards,,,
>> Y.Mandai
>>
>
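One side note on the quoted numbers: `MAHOUT_HEAPSIZE` sizes only the local driver JVM (the process JConsole was probing), while the `Error: Java heap space` failures occur in separate map-task child JVMs, so the 2048 MB setting would not reach them. The eden figures in the quoted dump can also be checked directly from its address ranges (start of space, top of used region, end of space); a small illustrative calculation:

```python
# Eden-space boundaries copied from the quoted heap dump line:
#   eden space 16448K,  98% used [0xbc600000, 0xbd5d7b28, 0xbd610000)
eden_start = 0x00000000BC600000    # start of the eden space
eden_used_top = 0x00000000BD5D7B28 # top of the used region
eden_end = 0x00000000BD610000      # end of the eden space

# Integer division mirrors the truncated percentage in the dump.
total_kb = (eden_end - eden_start) // 1024
used_pct = 100 * (eden_used_top - eden_start) // (eden_end - eden_start)
print(f"eden: {total_kb}K total, {used_pct}% used")
# prints: eden: 16448K total, 98% used
```

The result matches the dump's `16448K, 98% used` line, but note that a nearly full eden in a one-off snapshot is normal (it fills between minor collections) and is not by itself evidence that the driver JVM is out of memory.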
