mahout-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: Advice needed for Mahout heap size allocation (seq2sparse failure)
Date Wed, 17 Dec 2014 18:55:54 GMT
It's worth trying to increase the heap size for child JVMs per this doc,
depending on what version you're running:
http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
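
Since you're on Hadoop 0.20.203, also note that MAHOUT_HEAPSIZE only sizes the
client JVM that submits the job; the map tasks reporting "Error: Java heap
space" run in separate child JVMs whose heap comes from mapred.child.java.opts
(the old default is only -Xmx200m). A minimal sketch of the change, assuming a
pseudo-distributed setup with the config living in
$HADOOP_HOME/conf/mapred-site.xml and 1024m as an example value to tune for
your machine:

  <!-- Add inside the existing <configuration> element of
       $HADOOP_HOME/conf/mapred-site.xml -->
  <property>
    <!-- JVM options applied to every map/reduce child task -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

Since seq2sparse goes through ToolRunner, you may also be able to set this per
job, e.g. "mahout seq2sparse -Dmapred.child.java.opts=-Xmx1024m ..." ahead of
the other options, without editing the config files.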

On Tue, Dec 16, 2014 at 11:33 PM, 万代豊 <20525entradero@gmail.com> wrote:
>
> Hi
> After several successful Mahout k-means jobs in the past, I'm suddenly
> facing a heap error, shown below, during Mahout seq2sparse (Mahout 0.7 on
> Hadoop 0.20.203, pseudo-distributed).
>
> [hadoop@localhost TEST]$ $MAHOUT_HOME/bin/mahout seq2sparse --namedVector
> -i TEST/TEST-seqfile/ -o TEST/TEST-namedVector -ow -a
> org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md
> 3 -x 90 -ml 50 -seq -n 2
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum
> n-gram size is: 1
> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum
> LLR value: 50.0
> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of
> reduce tasks: 1
> 14/12/16 22:52:57 INFO input.FileInputFormat: Total input paths to process
> : 10
> 14/12/16 22:52:57 INFO mapred.JobClient: Running job: job_201412162229_0005
> 14/12/16 22:52:58 INFO mapred.JobClient:  map 0% reduce 0%
> 14/12/16 22:53:27 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0005_m_000000_0, Status : FAILED
> Error: Java heap space
> 14/12/16 22:53:29 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0005_m_000001_0, Status : FAILED
> Error: Java heap space
> 14/12/16 22:53:40 INFO mapred.JobClient:  map 2% reduce 0%
> 14/12/16 22:53:42 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0005_m_000000_1, Status : FAILED
> Error: Java heap space
> attempt_201412162229_0005_m_000000_1: log4j:WARN No appenders could be
> found for logger (org.apache.hadoop.mapred.Task).
> attempt_201412162229_0005_m_000000_1: log4j:WARN Please initialize the
> log4j system properly.
> 14/12/16 22:53:43 INFO mapred.JobClient:  map 0% reduce 0%
> 14/12/16 22:53:48 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0005_m_000001_1, Status : FAILED
> Error: Java heap space
> 14/12/16 22:54:00 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0005_m_000000_2, Status : FAILED
> Error: Java heap space
> 14/12/16 22:54:03 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0005_m_000001_2, Status : FAILED
> Error: Java heap space
> 14/12/16 22:54:21 INFO mapred.JobClient: Job complete:
> job_201412162229_0005
> 14/12/16 22:54:21 INFO mapred.JobClient: Counters: 7
> 14/12/16 22:54:21 INFO mapred.JobClient:   Job Counters
> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=52527
> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/12/16 22:54:21 INFO mapred.JobClient:     Launched map tasks=8
> 14/12/16 22:54:21 INFO mapred.JobClient:     Data-local map tasks=8
> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/12/16 22:54:21 INFO mapred.JobClient:     Failed map tasks=1
> Exception in thread "main" java.lang.IllegalStateException: Job failed!
> at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
> at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:253)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Based on other threads here, I'm already giving Mahout a 2048 MB heap, as
> shown below.
> [hadoop@localhost TEST]$ echo $MAHOUT_HEAPSIZE
> 2048
>
> I tried to check the heap status with JConsole, but for some reason the
> connection to the Mahout job is rejected. Mahout does, however, dump
> heap-related information to the console, as shown below, whenever JConsole
> tries to connect.
>
>
> "VM Thread" prio=10 tid=0x00007fbe1405d000 nid=0x3c01 runnable
>
> "VM Periodic Task Thread" prio=10 tid=0x00007fbe14094000 nid=0x3c08 waiting
> on condition
>
> JNI global references: 1621
>
> Heap
>  def new generation   total 18432K, used 16880K [0x00000000bc600000,
> 0x00000000bd9f0000, 0x00000000d1350000)
>   eden space 16448K,  98% used [0x00000000bc600000, 0x00000000bd5d7b28,
> 0x00000000bd610000)
>   from space 1984K,  33% used [0x00000000bd800000, 0x00000000bd8a47f8,
> 0x00000000bd9f0000)
>   to   space 1984K,   0% used [0x00000000bd610000, 0x00000000bd610000,
> 0x00000000bd800000)
>  tenured generation   total 40832K, used 464K [0x00000000d1350000,
> 0x00000000d3b30000, 0x00000000fae00000)
>    the space 40832K,   1% used [0x00000000d1350000, 0x00000000d13c40e8,
> 0x00000000d13c4200, 0x00000000d3b30000)
>  compacting perm gen  total 21248K, used 15218K [0x00000000fae00000,
> 0x00000000fc2c0000, 0x0000000100000000)
>    the space 21248K,  71% used [0x00000000fae00000, 0x00000000fbcdca18,
> 0x00000000fbcdcc00, 0x00000000fc2c0000)
> No shared spaces configured.
>
> 14/12/16 23:18:40 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0007_m_000000_1, Status : FAILED
> Error: Java heap space
> 14/12/16 23:18:41 INFO mapred.JobClient: Task Id :
> attempt_201412162229_0007_m_000001_1, Status : FAILED
> Error: Java heap space
> 2014-12-16 23:18:41
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode):
>
> In the output above, 98% of the Eden space is consumed, although only for a
> short period. (I guess the heap really is running out?)
>
> Please advise which Hadoop and/or Mahout settings I should increase (and, if
> possible, by how much).
>
> Regards,,,
> Y.Mandai
>
