Ah, that explains it, many thanks!

On Sat, May 16, 2015 at 7:41 PM, Yana Kadiyska <yana.kadiyska@gmail.com> wrote:
Oh... the metastore_db location is not controlled by hive.metastore.warehouse.dir -- one is the location of your metastore DB, the other is the physical location of your stored data. Check out this SO thread: http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
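
For the archives: the Derby metastore_db directory comes from javax.jdo.option.ConnectionURL, not from the warehouse dir, so (per that SO thread) you can pin it to a fixed path with something like this in hive-site.xml -- the path below is just a placeholder:

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=/some/fixed/path/metastore_db;create=true</value>
        <description>JDBC connect string for the embedded Derby metastore</description>
    </property>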


On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor <jamborta@gmail.com> wrote:
Gave it another try - it seems that it picks up the variable and prints out the correct value, but still puts the metastore_db folder in the current directory, regardless.

On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor <jamborta@gmail.com> wrote:
Thank you for the reply.

I have tried your experiment; it seems that it does not print the settings out in spark-shell (I'm using 1.3, by the way).

Strangely, I have been experimenting with a SQL connection instead, which works after all (still, if I go to spark-shell and try to print out the SQL settings that I put in hive-site.xml, it does not print them).


On Fri, May 15, 2015 at 7:22 PM, Yana Kadiyska <yana.kadiyska@gmail.com> wrote:
My point was more about how to verify that properties are picked up from the hive-site.xml file. You don't really need hive.metastore.uris if you're not running against an external metastore. I just did an experiment with warehouse.dir.

My hive-site.xml looks like this:

<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/home/ykadiysk/Github/warehouse_dir</value>
        <description>location of default database for the warehouse</description>
    </property>
</configuration>

and spark-shell code:

scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3036c16f

scala> hc.sql("show tables").collect
15/05/15 14:12:57 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/05/15 14:12:57 INFO ObjectStore: ObjectStore, initialize called
15/05/15 14:12:57 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/05/15 14:13:03 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/05/15 14:13:03 INFO ObjectStore: Initialized ObjectStore
15/05/15 14:13:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.12.0-protobuf-2.5
15/05/15 14:13:05 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
15/05/15 14:13:05 INFO audit: ugi=ykadiysk      ip=unknown-ip-addr      cmd=get_tables: db=default pat=.*
15/05/15 14:13:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/05/15 14:13:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
res0: Array[org.apache.spark.sql.Row] = Array()

scala> hc.getConf("hive.metastore.warehouse.dir")
res1: String = /home/ykadiysk/Github/warehouse_dir

I have not tried an HDFS path, but you should at least be able to verify that the variable is being read. It might be that your value is read but then rejected for some other reason...
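
If the file still isn't being picked up, one more sanity check (just a sketch -- I haven't tried this on 1.3) is to set the property programmatically and read it back:

scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hc.setConf("hive.metastore.warehouse.dir", "/home/ykadiysk/Github/warehouse_dir") // same value as in hive-site.xml
scala> hc.getConf("hive.metastore.warehouse.dir") // should echo the value back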

On Fri, May 15, 2015 at 2:03 PM, Tamas Jambor <jamborta@gmail.com> wrote:
Thanks for the reply. I am trying to use it without a Hive setup (Spark standalone), so it prints something like this:

hive_ctx.sql("show tables").collect()
15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/05/15 17:59:03 INFO ObjectStore: ObjectStore, initialize called
15/05/15 17:59:04 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/05/15 17:59:04 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/05/15 17:59:04 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/05/15 17:59:05 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/05/15 17:59:08 INFO BlockManagerMasterActor: Registering block manager xxxx:42819 with 3.0 GB RAM, BlockManagerId(2, xxx, 42819)
15/05/15 17:59:18 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/05/15 17:59:18 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/05/15 17:59:20 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/05/15 17:59:20 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/05/15 17:59:28 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/05/15 17:59:29 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/05/15 17:59:31 INFO ObjectStore: Initialized ObjectStore
15/05/15 17:59:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/05/15 17:59:33 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
15/05/15 17:59:33 INFO MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
15/05/15 17:59:33 INFO MetricsSystemImpl: azure-file-system metrics system started
15/05/15 17:59:33 INFO HiveMetaStore: Added admin role in metastore
15/05/15 17:59:34 INFO HiveMetaStore: Added public role in metastore
15/05/15 17:59:34 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/05/15 17:59:35 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/05/15 17:59:37 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
15/05/15 17:59:37 INFO audit: ugi=testuser     ip=unknown-ip-addr      cmd=get_tables: db=default pat=.*

Not sure what to put in hive.metastore.uris in this case?


On Fri, May 15, 2015 at 2:52 PM, Yana Kadiyska <yana.kadiyska@gmail.com> wrote:
This should work. Which version of Spark are you using? Here is what I do -- make sure hive-site.xml is in the conf directory of the machine you're running the driver from. Now let's run spark-shell from that machine:

scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6e9f8f26

scala> hc.sql("show tables").collect
15/05/15 09:34:17 INFO metastore: Trying to connect to metastore with URI thrift://hostname.com:9083              <-- here should be a value from your hive-site.xml
15/05/15 09:34:17 INFO metastore: Waiting 1 seconds before next connection attempt.
15/05/15 09:34:18 INFO metastore: Connected to metastore.
res0: Array[org.apache.spark.sql.Row] = Array([table1,false],

scala> hc.getConf("hive.metastore.uris")
res13: String = thrift://hostname.com:9083

scala> hc.getConf("hive.metastore.warehouse.dir")
res14: String = /user/hive/warehouse

The first line tells you which metastore it's trying to connect to -- this should be the string specified under the hive.metastore.uris property in your hive-site.xml file. I have not mucked with warehouse.dir too much, but I know that the value of the metastore URI is in fact picked up from there, as I regularly point to different systems...
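
For completeness, the hive-site.xml entry that the log line above reflects looks like this:

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hostname.com:9083</value>
        <description>Thrift URI of the remote metastore</description>
    </property>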


On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor <jamborta@gmail.com> wrote:
I have tried putting the hive-site.xml file in the conf/ directory, but it seems it is not picked up from there.


On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust <michael@databricks.com> wrote:
You can configure Spark SQL's Hive interaction by placing a hive-site.xml file in the conf/ directory.
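
For example, to put the warehouse on external storage you would point that property at the external filesystem -- a sketch for Azure blob storage (container and account names below are placeholders):

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>wasb://mycontainer@myaccount.blob.core.windows.net/hive/warehouse</value>
    </property>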

On Thu, May 14, 2015 at 10:24 AM, jamborta <jamborta@gmail.com> wrote:
Hi all,

is it possible to set hive.metastore.warehouse.dir, which is created internally
by Spark, to point to external storage (e.g. S3 on AWS or WASB on Azure)?

thanks,




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org