spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Issue of Hive parquet partitioned table schema mismatch
Date Fri, 30 Oct 2015 12:03:34 GMT
What Storage Format?



> On 30 Oct 2015, at 12:05, Rex Xiong <bychance@gmail.com> wrote:
> 
> Hi folks,
> 
> I have a Hive external table with partitions.
> Every day, an app generates a new partition day=yyyy-MM-dd, stored as Parquet, and runs
the Hive ADD PARTITION command.
> In some cases, we add an additional column to new partitions and update the Hive table
schema; a query spanning both new and old partitions then fails with this exception:
> 
> org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException:
cannot resolve 'newcolumn' given input columns ....
> 
> We have tried the schema merging feature, but it is too slow; there are hundreds of partitions.
> Is it possible to bypass this schema check and return a default value (such as null)
for missing columns?
> 
> Thank you

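For reference, the two pieces the poster describes can be sketched in Spark SQL / Hive DDL as follows. This is a minimal illustration, not the poster's actual commands: the table name, path, and date are hypothetical, while `spark.sql.parquet.mergeSchema` is the real Spark configuration behind the schema merging feature mentioned above.

```sql
-- Hypothetical table and location, for illustration only.
-- Register the new day's partition after the app writes its Parquet files:
ALTER TABLE events ADD IF NOT EXISTS PARTITION (day='2015-10-30')
  LOCATION '/data/events/day=2015-10-30';

-- Enable Parquet schema merging for the session; as noted in the thread,
-- this can be slow when there are hundreds of partitions:
SET spark.sql.parquet.mergeSchema=true;
```

As an alternative to merging, reading the data with an explicit full schema (e.g. `spark.read.schema(fullSchema).parquet(path)` in the DataFrame API) generally returns null for columns absent from older Parquet files, which matches the default-value behavior the poster asks for; whether this applies depends on the Spark version in use.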