spark-dev mailing list archives

From "Cheng, Hao" <hao.ch...@intel.com>
Subject RE: [SparkSQL] Reuse HiveContext to different Hive warehouse?
Date Wed, 11 Mar 2015 00:24:54 GMT
I am not sure Hive supports changing the metastore after it has been initialized; I suspect it does not. Spark SQL relies entirely on the Hive Metastore in HiveContext, which is probably why Q1 does not work as expected.

BTW, in most cases people configure the metastore settings in hive-site.xml and never change them afterwards. Is there any reason you want to change them at runtime?
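
As a quick illustration, a minimal sketch for reading the session's current value from the shell (sqlContext.getConf is the standard SQLContext accessor; note it only reflects the session conf, not what the metastore was initialized with, which is the point above):

======
// Read the warehouse location the current session conf reports.
// This is the session's view only; the metastore keeps whatever
// location it was initialized with.
val warehouse = sqlContext.getConf("hive.metastore.warehouse.dir", "<unset>")
println(s"hive.metastore.warehouse.dir = $warehouse")
======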

For Q2, there is probably something wrong in the configuration; it looks as if HDFS is running in pseudo/single-node mode. Can you double-check that? Alternatively, can you run DDL (e.g. create a table) from the Spark shell with HiveContext, as sketched below?
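
A minimal sketch of that DDL check (the table name here is arbitrary):

======
// Smoke test: verify the HiveContext can reach the metastore and the
// configured filesystem by creating and listing a throwaway table.
sqlContext.sql("CREATE TABLE ddl_smoke_test (key INT, value STRING)")
sqlContext.sql("SHOW TABLES").collect().foreach(println)
======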

From: Haopu Wang [mailto:HWang@qilinsoft.com]
Sent: Tuesday, March 10, 2015 6:38 PM
To: user; dev@spark.apache.org
Subject: [SparkSQL] Reuse HiveContext to different Hive warehouse?


I'm using a Spark 1.3.0 RC3 build with Hive support.



In the Spark shell, I want to reuse the same HiveContext instance with different warehouse locations. Below are the steps of my test (assume I have already loaded a file into the table "src").



======
15/03/10 18:22:59 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table1")
scala> sqlContext.sql("SET hive.metastore.warehouse.dir=/test/w2")
scala> sqlContext.sql("SELECT * from src").saveAsTable("table2")
======

After these steps, both tables are stored under "/test/w" only. I expected "table2" to be stored under the "/test/w2" folder.



Another question: if I set "hive.metastore.warehouse.dir" to an HDFS folder, I cannot use saveAsTable(). Is this by design? The exception stack trace is below:

======
15/03/10 18:35:28 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/10 18:35:28 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:74
java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8020/space/warehouse/table2, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:463)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:118)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
        at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:96)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:125)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
        at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
        at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:998)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:964)
        at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:942)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC.<init>(<console>:35)
        at $iwC.<init>(<console>:37)
        at <init>(<console>:39)
======
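
The "expected: file:///" in the trace suggests the shell's Hadoop configuration is defaulting to the local filesystem, i.e. a core-site.xml with fs.defaultFS pointing at HDFS may not be on Spark's classpath. A minimal sketch to check this from the shell (sc.hadoopConfiguration and the fs.defaultFS / fs.default.name properties are standard Spark/Hadoop API):

======
// Print which filesystem Spark's Hadoop configuration treats as default.
// A value of file:/// here would explain the Wrong FS error above.
println(sc.hadoopConfiguration.get("fs.defaultFS", "<unset>"))
println(sc.hadoopConfiguration.get("fs.default.name", "<unset>"))  // pre-Hadoop-2 name
======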



Thank you very much!


