spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Caching parquet table (with GZIP) on Spark 1.3.1
Date Sun, 07 Jun 2015 14:25:54 GMT
Is it possible that some Parquet files in this data set have a different 
schema than the others? Especially the ones reported in the exception 
messages: the inner exception shows dcqv_val stored as optional binary 
(UTF8) in those files, while the requested schema expects optional double.

One way to confirm this is to use [parquet-tools][1] to inspect these 
files:

     $ parquet-schema <path-to-file>
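
To check every part file in one go, here is a minimal sketch (my own 
illustration, untested): it assumes a PySpark 1.3 shell (Python 2) where 
hiveCtx is the HiveContext from your snippet, that the hdfs CLI is on the 
PATH, and the directory taken from the paths in your logs:

    import subprocess

    path = "hdfs://f14ecat/tmp/tchart_0501_final"  # from the error logs

    # List the part files with the HDFS CLI (assumes `hdfs` is on the PATH)
    listing = subprocess.check_output(["hdfs", "dfs", "-ls", path])
    files = [line.split()[-1] for line in listing.splitlines()
             if line.strip().endswith(".parquet")]

    for f in files:
        # parquetFile() on a single file uses that file's own footer
        # schema, so a file whose dcqv_val disagrees with the rest
        # stands out in the output
        df = hiveCtx.parquetFile(f)
        for fld in df.schema.fields:
            if fld.name == "dcqv_val":
                print("%s -> %s" % (f, fld.dataType))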

Cheng

[1]: https://github.com/apache/parquet-mr/tree/master/parquet-tools
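
If the inspection does show string-typed files, one possible repair is to 
rewrite just those files with the expected type. This is only a sketch, 
under the assumption that the offending files were simply written with the 
wrong type; the output path below is a hypothetical example, and the file 
name is taken from the logs:

    # Reading a single file succeeds because parquetFile() uses that
    # file's own schema rather than the table's
    bad = "hdfs://f14ecat/tmp/tchart_0501_final/part-r-1198.parquet"
    df = hiveCtx.parquetFile(bad)

    # Cast the string column to the double the table schema expects
    # (unparseable values become null) and write to a hypothetical
    # output directory
    fixed = df.withColumn("dcqv_val", df["dcqv_val"].cast("double"))
    fixed.saveAsParquetFile("hdfs://f14ecat/tmp/tchart_0501_fixed")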

On 5/26/15 3:26 PM, shshann@tsmc.com wrote:
>
> we tried to cache a table through
> hiveCtx = HiveContext(sc)
> hiveCtx.cacheTable("table name")
> as described in the Spark 1.3.1 documentation; we're on CDH 5.3.0 with
> Spark 1.3.1 built with Hadoop 2.6.
> The following error message occurs whenever we try to cache a table in
> Parquet format with GZIP, though we're not sure whether the error has
> anything to do with the table format, since we can run SQL against the
> exact same table. We just hope cacheTable might speed things up a bit,
> since we query this table several times.
> Any advice is welcome! Thanks!
>
> 15/05/26 15:21:32 WARN scheduler.TaskSetManager: Lost task 227.0 in stage 0.0 (TID 278, f14ecats037): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1198.parquet
>          at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
>          at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
>          at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:143)
>          at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>          at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>          at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.hasNext(InMemoryColumnarTableScan.scala:153)
>          at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:248)
>          at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
>          at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>          at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>          at org.apache.spark.scheduler.Task.run(Task.scala:64)
>          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>          at java.lang.Thread.run(Thread.java:745)
> Caused by: parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: optional binary dcqv_val (UTF8) != optional double dcqv_val
>          at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
>          at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
>          at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
>          at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
>          at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
>          at parquet.schema.MessageType.accept(MessageType.java:55)
>          at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
>          at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:125)
>          at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
>          ... 31 more
>
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Starting task 74.2 in stage 0.0 (TID 377, f14ecats025, NODE_LOCAL, 2153 bytes)
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 56.2 in stage 0.0 (TID 329) on executor f14ecats025: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1047.parquet) [duplicate 2]
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Starting task 165.1 in stage 0.0 (TID 378, f14ecats026, NODE_LOCAL, 2151 bytes)
> 15/05/26 15:21:32 WARN scheduler.TaskSetManager: Lost task 145.0 in stage 0.0 (TID 133, f14ecats026): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1123.parquet
> [same stack trace as above; snipped]
>
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Starting task 221.2 in stage 0.0 (TID 379, f14ecats035, NODE_LOCAL, 2154 bytes)
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 90.3 in stage 0.0 (TID 323) on executor f14ecats035: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1076.parquet) [duplicate 3]
> 15/05/26 15:21:32 ERROR scheduler.TaskSetManager: Task 90 in stage 0.0 failed 4 times; aborting job
> 15/05/26 15:21:32 WARN scheduler.TaskSetManager: Lost task 52.0 in stage 0.0 (TID 48, f14ecats009): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1043.parquet
> [same stack trace as above; snipped]
>
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 179.1 in stage 0.0 (TID 269) on executor f14ecats031: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1154.parquet) [duplicate 1]
> 15/05/26 15:21:32 WARN scheduler.TaskSetManager: Lost task 98.0 in stage 0.0 (TID 45, f14ecats008): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1083.parquet
> [same stack trace as above; snipped]
>
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 134.1 in stage 0.0 (TID 317) on executor f14ecats007: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1113.parquet) [duplicate 1]
> 15/05/26 15:21:32 INFO cluster.YarnScheduler: Cancelling stage 0
> 15/05/26 15:21:32 INFO cluster.YarnScheduler: Stage 0 was cancelled
> 15/05/26 15:21:32 WARN scheduler.TaskSetManager: Lost task 239.0 in stage 0.0 (TID 273, f14ecats036): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1208.parquet
> [same stack trace as above; snipped]
>
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 36.1 in stage 0.0 (TID 328) on executor f14ecats036: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1029.parquet) [duplicate 1]
> 15/05/26 15:21:32 INFO scheduler.DAGScheduler: Stage 0 (mapPartitions at Exchange.scala:65) failed in 3.189 s
> 15/05/26 15:21:32 INFO scheduler.DAGScheduler: Job 0 failed: collect at /home/bdadm/SparkSQLTchart-1.3.py:19, took 4.255423 s
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 25.1 in stage 0.0 (TID 288) on executor f14ecats037: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-102.parquet) [duplicate 1]
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 211.1 in stage 0.0 (TID 281) on executor f14ecats037: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1183.parquet) [duplicate 1]
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 190.1 in stage 0.0 (TID 309) on executor f14ecats019: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1164.parquet) [duplicate 1]
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 95.1 in stage 0.0 (TID 270) on executor f14ecats037: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1080.parquet) [duplicate 1]
> 15/05/26 15:21:32 INFO scheduler.TaskSetManager: Lost task 86.1 in stage 0.0 (TID 280) on executor f14ecats026: parquet.io.ParquetDecodingException (Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1072.parquet) [duplicate 1]


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

