spark-user mailing list archives

Subject Re: ArrayIndexOutOfBoundsException when reading bzip2 files
Date Mon, 09 Jun 2014 12:30:30 GMT
Hi Akhil,
Please find the code below.
 from operator import add

 x = sc.textFile("hdfs:///******")
 x = x.filter(lambda z: z.split(",")[0] != ' ')  # drop lines whose first field is a single space
 x = x.filter(lambda z: z.split(",")[3] != ' ')  # drop lines whose fourth field is a single space
 z = x.reduce(add)
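[For reference, the same pipeline over a plain Python list (no Spark; the sample lines below are hypothetical stand-ins for the HDFS file). Note that since the RDD elements are strings, reduce(add) concatenates the surviving lines rather than summing numbers:]

```python
from functools import reduce
from operator import add

# hypothetical sample of CSV-like lines standing in for the HDFS file
lines = ["a,1,2,3", " ,1,2,3", "b,1,2, "]

x = [z for z in lines if z.split(",")[0] != ' ']  # drop lines whose first field is a single space
x = [z for z in x if z.split(",")[3] != ' ']      # drop lines whose fourth field is a single space
z = reduce(add, x)                                # string concatenation: elements are strings
```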
Thanks & Regards, 
Meethu M

On Monday, 9 June 2014 5:52 PM, Akhil Das <> wrote:

Can you paste the piece of code?

Best Regards

On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW <> wrote:

>I am getting an ArrayIndexOutOfBoundsException while reading bz2 files in HDFS. I have
come across the same issue in JIRA, but
it seems to be resolved. I have tried the workaround suggested (SPARK_WORKER_CORES=1), but
it is still showing the error. What may be the possible reason that I am getting the same error again?
>I am using Spark 1.0.0 with Hadoop 1.2.1.
>java.lang.ArrayIndexOutOfBoundsException: 900000
>at org.apache.hadoop.util.LineReader.readDefaultLine(
>at org.apache.hadoop.util.LineReader.readLine(
>at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198)
>at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181)
>at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:303)
>at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:200)
>at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
>at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
>at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
>at org.apache.spark.api.python.PythonRDD$
>Thanks & Regards, 
>Meethu M
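[For reference, the SPARK_WORKER_CORES=1 workaround discussed above is typically applied in the standalone cluster's worker environment file; the exact path and deployment mode here are assumptions, a sketch rather than a definitive recipe:]

```shell
# In conf/spark-env.sh on each worker node (standalone mode, assumed setup),
# then restart the workers for the change to take effect.
export SPARK_WORKER_CORES=1
```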