spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lin, Hao" <Hao....@finra.org>
Subject try to read multiple bz2 files in s3
Date Mon, 01 Feb 2016 22:35:59 GMT
When I tried to read multiple bz2 files from s3, I have the following warning messages. What
is the problem here?

16/02/01 22:30:30 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.162.67.248):
java.lang.ArrayIndexOutOfBoundsException: -1844424343
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode0(CBZip2InputStream.java:1014)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode(CBZip2InputStream.java:829)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:504)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.changeStateToProcessABlock(CBZip2InputStream.java:333)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:399)
        at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream.read(BZip2Codec.java:483)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.hadoop.mapreduce.lib.input.CompressedSplitLineReader.fillBuffer(CompressedSplitLineReader.java:130)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapreduce.lib.input.CompressedSplitLineReader.readLine(CompressedSplitLineReader.java:159)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1553)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1125)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1125)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Confidentiality Notice::  This email, including attachments, may include non-public, proprietary,
confidential or legally privileged information.  If you are not an intended recipient or an
authorized agent of an intended recipient, you are hereby notified that any dissemination,
distribution or copying of the information contained in or transmitted with this e-mail is
unauthorized and strictly prohibited.  If you have received this email in error, please notify
the sender by replying to this message and permanently delete this e-mail, its attachments,
and any copies of it immediately.  You should not retain, copy or use this e-mail or any attachment
for any purpose, nor disclose all or any part of the contents to any other person. Thank you.
Mime
View raw message