spark-user mailing list archives

From Yijie Shen <henry.yijies...@gmail.com>
Subject Spark SQL saveAsParquet failed after a few waves
Date Wed, 01 Apr 2015 02:17:38 GMT
Hi,

I am using the Spark 1.3 prebuilt release with Hadoop 2.4 support, running against Hadoop 2.4.0.

I wrote a Spark application (LoadApp) that generates data in each task and writes it to
HDFS as Parquet files, using “saveAsParquet()” in Spark SQL.

When a job runs in only one or two waves, LoadApp finishes after a few task failures and retries.
When the job needs three waves, it terminates abnormally.
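
The post does not define "wave". Assuming it means one round of tasks that fills every concurrently available task slot (one slot per executor core, i.e. Spark's default `spark.task.cpus` of 1), the number of waves a stage needs is the ceiling of tasks divided by slots. A minimal Java sketch under that assumption; `WaveEstimate` and its parameters are hypothetical, not a Spark API:

```java
// Hypothetical helper, not part of Spark: assumes one "wave" is a round of tasks
// that fills every concurrent task slot, with one slot per executor core.
final class WaveEstimate {
    private WaveEstimate() {}

    /** Tasks that can run at the same time across the cluster. */
    static int taskSlots(int executors, int coresPerExecutor) {
        return executors * coresPerExecutor;
    }

    /** ceil(tasks / slots): how many rounds a stage with that many tasks needs. */
    static int waves(int tasks, int executors, int coresPerExecutor) {
        int slots = taskSlots(executors, coresPerExecutor);
        if (tasks < 0 || slots <= 0) {
            throw new IllegalArgumentException("need tasks >= 0 and at least one slot");
        }
        // Integer ceiling division without floating point.
        return (tasks + slots - 1) / slots;
    }
}
```

For instance, 120 tasks on 10 executors with 4 cores each give `waves(120, 10, 4) == 3`; these numbers are illustrative only, not taken from LoadApp.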

Every failure I hit shows the same error:
“java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN”

The stack trace is:

java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
	at parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:137)
	at parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:129)
	at parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:173)
	at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:152)
	at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
	at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:634)
	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:648)
	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:648)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
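
The error message says the writer was left in the `COLUMN` state by an earlier error, and the trace shows `close()` reaching `startBlock()` through `flushRowGroupToStore()`. The Java sketch below is a simplified, hypothetical model of that kind of state check, not parquet-mr's code: `ToyParquetWriter`, `WriterState`, `InvalidStateDemo` and the injected failure are made up for illustration, and the state names other than `COLUMN` are assumptions. It shows one sequence consistent with the trace: a row-group flush throws while a column is open, the state stays `COLUMN`, and when `close()` later flushes again, `startBlock()` reports the invalid state rather than the original exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified model of a Parquet file writer's state checks; NOT the library's code.
enum WriterState { NOT_STARTED, STARTED, BLOCK, COLUMN, ENDED }

final class ToyParquetWriter {
    private WriterState state = WriterState.NOT_STARTED;

    WriterState state() { return state; }

    // Each call is legal in exactly one state; anything else fails with the
    // same wording as the message in the trace.
    private void expect(WriterState required) throws IOException {
        if (state != required) {
            throw new IOException("The file being written is in an invalid state. "
                + "Probably caused by an error thrown previously. Current state: " + state);
        }
    }

    void start() throws IOException { expect(WriterState.NOT_STARTED); state = WriterState.STARTED; }
    void startBlock() throws IOException { expect(WriterState.STARTED); state = WriterState.BLOCK; }
    void startColumn() throws IOException { expect(WriterState.BLOCK); state = WriterState.COLUMN; }
    void endColumn() throws IOException { expect(WriterState.COLUMN); state = WriterState.BLOCK; }
    void endBlock() throws IOException { expect(WriterState.BLOCK); state = WriterState.STARTED; }
    void end() throws IOException { expect(WriterState.STARTED); state = WriterState.ENDED; }

    // Data may only be written inside an open column; `fail` simulates an
    // I/O error in the middle of that column.
    void writeColumnData(boolean fail) throws IOException {
        expect(WriterState.COLUMN);
        if (fail) {
            throw new IOException("simulated failure while writing a column");
        }
    }
}

final class InvalidStateDemo {
    private InvalidStateDemo() {}

    // One row group with a single column; the first call is startBlock(),
    // matching flushRowGroupToStore -> startBlock in the trace.
    static void flushRowGroup(ToyParquetWriter w, boolean failMidColumn) throws IOException {
        w.startBlock();
        w.startColumn();
        w.writeColumnData(failMidColumn); // if this throws, state stays COLUMN
        w.endColumn();
        w.endBlock();
    }

    // First flush fails mid-column; a later close() flushes again and gets the
    // "invalid state" error instead of the original one. Returns that message.
    static String messageFromCloseAfterFailure() {
        ToyParquetWriter w = new ToyParquetWriter();
        try {
            w.start();
            try {
                flushRowGroup(w, true);
            } catch (IOException original) {
                // The original error is swallowed here, so only the close() error surfaces.
            }
            flushRowGroup(w, false); // what close() would do with the buffered rows
            w.end();
            return "closed cleanly";
        } catch (IOException fromClose) {
            return fromClose.getMessage();
        }
    }

    // Same sequence without the injected failure: the writer ends cleanly.
    static WriterState stateAfterCleanWrite() {
        ToyParquetWriter w = new ToyParquetWriter();
        try {
            w.start();
            flushRowGroup(w, false);
            w.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w.state();
    }
}
```

If this model matches what LoadApp runs into, the more informative error would be whatever was thrown first in the failing task; the `COLUMN` message only reports the state that earlier error left behind.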


I can't tell what is going wrong, since the same job seems to fail or succeed for no apparent reason.

Thanks.


Yijie Shen