spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Error in using saveAsParquetFile
Date Mon, 08 Jun 2015 11:09:55 GMT
Are you appending the joined DataFrame whose PolicyType is string to an 
existing Parquet file whose PolicyType is int? The exception indicates 
that Parquet found a column with conflicting data types.
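One way to check for this kind of conflict before appending is to compare the column types of the existing data and the new DataFrame. The following is a minimal sketch in plain Python, not Spark code: it assumes each schema is available as a list of (column name, type name) pairs, which is the shape PySpark's `DataFrame.dtypes` returns. The helper name `find_type_conflicts`, the `PolicyId` column, and the sample schemas are hypothetical and only illustrate the idea.

```python
def find_type_conflicts(existing, incoming):
    """Return {column: (existing_type, incoming_type)} for shared columns whose types differ.

    Both arguments are lists of (column_name, type_name) pairs.
    """
    existing_types = dict(existing)
    conflicts = {}
    for name, incoming_type in incoming:
        existing_type = existing_types.get(name)
        # Only columns present on both sides can conflict.
        if existing_type is not None and existing_type != incoming_type:
            conflicts[name] = (existing_type, incoming_type)
    return conflicts


# Hypothetical schemas mirroring the error: PolicyType is int in the
# existing Parquet data but string in the joined DataFrame.
existing_schema = [("PolicyId", "int"), ("PolicyType", "int")]
joined_schema = [("PolicyId", "int"), ("PolicyType", "string")]

conflicts = find_type_conflicts(existing_schema, joined_schema)
print(conflicts)
```

If a conflict is reported, the usual options are to cast the column in the new DataFrame to the existing type before appending, or to write to a fresh location instead of appending.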

Cheng

On 6/8/15 5:29 PM, bipin wrote:
> Hi, I get this error message when saving a table:
>
> parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: optional binary PolicyType (UTF8) != optional int32 PolicyType
> 	at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
> 	at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
> 	at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
> 	at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
> 	at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
> 	at parquet.schema.MessageType.accept(MessageType.java:55)
> 	at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
> 	at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
> 	at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
> 	at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
> 	at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
> 	at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
> 	at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
> 	at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
> 	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
> 	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
> 	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:64)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
>
> I joined two tables, both loaded from Parquet files, and saving the joined
> table throws this error. I could not find anything about this error. Could
> this be a bug?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-in-using-saveAsParquetFile-tp23204.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

