mahout-user mailing list archives

From Alok Tanna <tannaa...@gmail.com>
Subject Mahout error : seq2sparse
Date Thu, 04 Feb 2016 03:33:00 GMT
Mahout in local mode

I am able to run the command below successfully on a smaller data set, but
when I run it on a large data set I get the error below. It looks like I
need to increase the size of some parameter, but I am not sure which one.
It fails with java.io.EOFException while creating the dictionary-0 file.

Please find the attached file for more details.

command: mahout seq2sparse -i /home/ubuntu/AT/AT-Seq/ -o
/home/ubuntu/AT/AT-vectors/ -lnorm -nv -wt tfidf
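
In case it helps narrow things down: seq2sparse has a `-chunk` option that
controls the size (in MB, default 100) of the dictionary chunks it writes, and
in local mode the JVM heap for the Mahout-launched process is set via
MAHOUT_HEAPSIZE. Whether either of these actually resolves this particular
EOFException is an assumption on my part, but they are the knobs I would try
first. A sketch of the adjusted invocation:

```shell
# Assumption: more heap and a larger dictionary chunk may avoid the
# truncated intermediate file seen in the EOFException.
export MAHOUT_HEAPSIZE=4096   # heap in MB for the Mahout JVM (local mode)

mahout seq2sparse \
  -i /home/ubuntu/AT/AT-Seq/ \
  -o /home/ubuntu/AT/AT-vectors/ \
  -lnorm -nv -wt tfidf \
  -chunk 1024                 # dictionary chunk size in MB (default 100)
```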

Main error:


16/02/03 23:02:06 INFO mapred.LocalJobRunner: reduce > reduce
16/02/03 23:02:17 INFO mapred.LocalJobRunner: reduce > reduce
16/02/03 23:02:18 WARN mapred.LocalJobRunner: job_local1308764206_0003
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at
org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
        at
org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
        at org.apache.hadoop.io.Text.readFields(Text.java:263)
        at
org.apache.mahout.common.StringTuple.readFields(StringTuple.java:142)
        at
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at
org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at
org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
        at
org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
16/02/03 23:02:18 INFO mapred.JobClient: Job complete:
job_local1308764206_0003
16/02/03 23:02:18 INFO mapred.JobClient: Counters: 20
16/02/03 23:02:18 INFO mapred.JobClient:   File Output Format Counters
16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Written=14923244
16/02/03 23:02:18 INFO mapred.JobClient:   FileSystemCounters
16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_READ=1412144036729
16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=323876626568
16/02/03 23:02:18 INFO mapred.JobClient:   File Input Format Counters
16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Read=11885543289
16/02/03 23:02:18 INFO mapred.JobClient:   Map-Reduce Framework
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input groups=223
16/02/03 23:02:18 INFO mapred.JobClient:     Map output materialized
bytes=2214020551
16/02/03 23:02:18 INFO mapred.JobClient:     Combine output records=0
16/02/03 23:02:18 INFO mapred.JobClient:     Map input records=223
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce shuffle bytes=0
16/02/03 23:02:18 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=0
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce output records=222
16/02/03 23:02:18 INFO mapred.JobClient:     Spilled Records=638
16/02/03 23:02:18 INFO mapred.JobClient:     Map output bytes=2214019100
16/02/03 23:02:18 INFO mapred.JobClient:     CPU time spent (ms)=0
16/02/03 23:02:18 INFO mapred.JobClient:     Total committed heap usage
(bytes)=735978192896
16/02/03 23:02:18 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=0
16/02/03 23:02:18 INFO mapred.JobClient:     Combine input records=0
16/02/03 23:02:18 INFO mapred.JobClient:     Map output records=223
16/02/03 23:02:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=9100
16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input records=222
Exception in thread "main" java.lang.IllegalStateException: Job failed!
        at
org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
        at
org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
        at
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:274)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
.
.



-- 
Thanks & Regards,

Alok R. Tanna
