spark-issues mailing list archives

From "Dipankar (JIRA)" <>
Subject [jira] [Created] (SPARK-15682) Hive ORC partition write looks for root hdfs folder for existence
Date Tue, 31 May 2016 21:12:12 GMT
Dipankar created SPARK-15682:

             Summary: Hive ORC partition write looks for root hdfs folder for existence
                 Key: SPARK-15682
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.1
            Reporter: Dipankar

I am using the program below to create a new partition based on the current date, which
signifies the run date.

However, it fails, reporting that the HDFS folder already exists. It checks the root folder,
not the new partition value.

Does the partitionBy clause actually not check the Hive metastore, or the folder down to
proc_date=<value>? Is it just a way to create folders based on the partition key, not related
to Hive partitioning in any way?
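
For example, here is a sketch of the kind of write I would expect to just add the new
partition folder (SaveMode.Append is my assumption of how to skip the root-path existence
check; the rest matches the source code further down):

import org.apache.spark.sql.SaveMode

// Assumed workaround sketch: with append mode the writer should skip the
// root-folder existence check and only create the new proc_date partition.
result_partition.write
  .format("orc")
  .mode(SaveMode.Append)
  .partitionBy("proc_date")
  .save("test.sms_outbound_view_orc")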

Alternatively, should I use
result.write.format("orc").save("test.sms_outbound_view_orc/proc_date=2016-05-30") to achieve
the result?

But this will not update the Hive metastore with the new partition details.
Is Spark's ORC support not equivalent to the HCatStorer API?
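
If I go that route, I assume I would have to register the partition with the metastore
myself via HiveContext SQL, something like the sketch below (the table name is hypothetical,
since my actual Hive table name is not shown in this report):

// Hedged sketch: register the hand-written partition with the Hive metastore.
// "sms_outbound_view_orc" is a hypothetical table name; assumes sqlContext is
// a HiveContext and the table's location matches the directory written above.
sqlContext.sql("ALTER TABLE sms_outbound_view_orc " +
  "ADD IF NOT EXISTS PARTITION (proc_date='2016-05-30')")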

My Hive table is built with proc_date as the partition column.

Source code:
val result_partition = sqlContext.sql("FROM result_tab select *,'"+curr_date+"' as proc_date")
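
(The write call that actually fails, at SampleApp.scala:31 in the stack trace below, is not
pasted here; reconstructed from the exception path, it is presumably of this shape:)

// Presumed shape of the failing write at SampleApp.scala:31 (path taken from
// the exception below). The default SaveMode, ErrorIfExists, rejects the
// already-existing root directory before looking at any partition value.
result_partition.write
  .format("orc")
  .partitionBy("proc_date")
  .save("test.sms_outbound_view_orc")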

16/05/31 15:57:34 INFO ParseDriver: Parsing command: FROM result_tab select *,'2016-05-31'
as proc_date
16/05/31 15:57:34 INFO ParseDriver: Parse Completed
Exception in thread "main" org.apache.spark.sql.AnalysisException: path hdfs://hdpprod/user/dipankar.ghosal/test.sms_outbound_view_orc
already exists.;
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
	at SampleApp$.main(SampleApp.scala:31)
