spark-user mailing list archives

From Tamas Jambor <jambo...@gmail.com>
Subject Re: store hive metastore on persistent store
Date Sat, 16 May 2015 19:10:32 GMT
Ah, that explains it, many thanks!

On Sat, May 16, 2015 at 7:41 PM, Yana Kadiyska <yana.kadiyska@gmail.com>
wrote:

> oh...metastore_db location is not controlled by
> hive.metastore.warehouse.dir -- one is the location of your metastore DB,
> the other is the physical location of your stored data. Check out this SO
> thread:
> http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
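>
> If you want to control where metastore_db itself lands, the property to
> look at is javax.jdo.option.ConnectionURL. A sketch, assuming the default
> embedded Derby setup (the path is a placeholder, not tested):
>
> <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
> </property>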
>
>
> On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor <jamborta@gmail.com> wrote:
>
>> Gave it another try - it seems that it picks up the variable and prints
>> out the correct value, but still puts the metastore_db folder in the current
>> directory, regardless.
>>
>> On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor <jamborta@gmail.com> wrote:
>>
>>> Thank you for the reply.
>>>
>>> I have tried your experiment, it seems that it does not print the
>>> settings out in spark-shell (I'm using 1.3 by the way).
>>>
>>> Strangely, I have been experimenting with a SQL connection instead, which
>>> does work after all (though if I go to spark-shell and try to print out
>>> the SQL settings that I put in hive-site.xml, it still does not print them).
>>>
>>>
>>> On Fri, May 15, 2015 at 7:22 PM, Yana Kadiyska <yana.kadiyska@gmail.com>
>>> wrote:
>>>
>>>> My point was more about how to verify that properties are picked up from
>>>> the hive-site.xml file. You don't really need hive.metastore.uris if
>>>> you're not running against an external metastore.  I just did an
>>>> experiment with warehouse.dir.
>>>>
>>>> My hive-site.xml looks like this:
>>>>
>>>> <configuration>
>>>>     <property>
>>>>         <name>hive.metastore.warehouse.dir</name>
>>>>         <value>/home/ykadiysk/Github/warehouse_dir</value>
>>>>         <description>location of default database for the warehouse</description>
>>>>     </property>
>>>> </configuration>
>>>>
>>>> and spark-shell code:
>>>>
>>>> scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
>>>> hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3036c16f
>>>>
>>>> scala> hc.sql("show tables").collect
>>>> 15/05/15 14:12:57 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>> 15/05/15 14:12:57 INFO ObjectStore: ObjectStore, initialize called
>>>> 15/05/15 14:12:58 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
>>>> 15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>>> 15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>>> 15/05/15 14:13:03 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>> 15/05/15 14:13:03 INFO ObjectStore: Initialized ObjectStore
>>>> 15/05/15 14:13:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.12.0-protobuf-2.5
>>>> 15/05/15 14:13:05 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
>>>> 15/05/15 14:13:05 INFO audit: ugi=ykadiysk      ip=unknown-ip-addr      cmd=get_tables: db=default pat=.*
>>>> 15/05/15 14:13:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>> 15/05/15 14:13:05 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>> res0: Array[org.apache.spark.sql.Row] = Array()
>>>>
>>>> scala> hc.getConf("hive.metastore.warehouse.dir")
>>>> res1: String = /home/ykadiysk/Github/warehouse_dir
>>>>
>>>> I have not tried an HDFS path but you should be at least able to verify
>>>> that the variable is being read. It might be that your value is read but
>>>> is otherwise not liked...
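>>>>
>>>> For what it's worth, an HDFS value would presumably look something like
>>>> this sketch (untested on my end; namenode host and port are placeholders):
>>>>
>>>> <property>
>>>>     <name>hive.metastore.warehouse.dir</name>
>>>>     <value>hdfs://namenode:8020/user/hive/warehouse</value>
>>>> </property>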
>>>>
>>>> On Fri, May 15, 2015 at 2:03 PM, Tamas Jambor <jamborta@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for the reply. I am trying to use it without a Hive setup
>>>>> (Spark standalone), so it prints something like this:
>>>>>
>>>>> hive_ctx.sql("show tables").collect()
>>>>> 15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with
>>>>> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>> 15/05/15 17:59:03 INFO ObjectStore: ObjectStore, initialize called
>>>>> 15/05/15 17:59:04 INFO Persistence: Property datanucleus.cache.level2
>>>>> unknown - will be ignored
>>>>> 15/05/15 17:59:04 INFO Persistence: Property
>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>> 15/05/15 17:59:04 WARN Connection: BoneCP specified but not present in
>>>>> CLASSPATH (or one of dependencies)
>>>>> 15/05/15 17:59:05 WARN Connection: BoneCP specified but not present in
>>>>> CLASSPATH (or one of dependencies)
>>>>> 15/05/15 17:59:08 INFO BlockManagerMasterActor: Registering block
>>>>> manager xxxx:42819 with 3.0 GB RAM, BlockManagerId(2, xxx, 42819)
>>>>> 15/05/15 17:59:18 INFO ObjectStore: Setting MetaStore object pin
>>>>> classes with
>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>> 15/05/15 17:59:18 INFO MetaStoreDirectSql: MySQL check failed,
>>>>> assuming we are not on mysql: Lexical error at line 1, column 5.
>>>>> Encountered: "@" (64), after : "".
>>>>> 15/05/15 17:59:20 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:20 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:28 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:29 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:31 INFO ObjectStore: Initialized ObjectStore
>>>>> 15/05/15 17:59:32 WARN ObjectStore: Version information not found in
>>>>> metastore. hive.metastore.schema.verification is not enabled so recording
>>>>> the schema version 0.13.1aa
>>>>> 15/05/15 17:59:33 WARN MetricsConfig: Cannot locate configuration:
>>>>> tried
>>>>> hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
>>>>> 15/05/15 17:59:33 INFO MetricsSystemImpl: Scheduled snapshot period at
>>>>> 10 second(s).
>>>>> 15/05/15 17:59:33 INFO MetricsSystemImpl: azure-file-system metrics
>>>>> system started
>>>>> 15/05/15 17:59:33 INFO HiveMetaStore: Added admin role in metastore
>>>>> 15/05/15 17:59:34 INFO HiveMetaStore: Added public role in metastore
>>>>> 15/05/15 17:59:34 INFO HiveMetaStore: No user is added in admin role,
>>>>> since config is empty
>>>>> 15/05/15 17:59:35 INFO SessionState: No Tez session required at this
>>>>> point. hive.execution.engine=mr.
>>>>> 15/05/15 17:59:37 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
>>>>> 15/05/15 17:59:37 INFO audit: ugi=testuser     ip=unknown-ip-addr
>>>>>  cmd=get_tables: db=default pat=.*
>>>>>
>>>>> not sure what to put in hive.metastore.uris in this case?
>>>>>
>>>>>
>>>>> On Fri, May 15, 2015 at 2:52 PM, Yana Kadiyska <
>>>>> yana.kadiyska@gmail.com> wrote:
>>>>>
>>>>>> This should work. Which version of Spark are you using? Here is what
>>>>>> I do -- make sure hive-site.xml is in the conf directory of the machine
>>>>>> you're using the driver from. Now let's run spark-shell from that machine:
>>>>>>
>>>>>> scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
>>>>>> hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6e9f8f26
>>>>>>
>>>>>> scala> hc.sql("show tables").collect
>>>>>> 15/05/15 09:34:17 INFO metastore: Trying to connect to metastore with URI thrift://hostname.com:9083              <-- here should be a value from your hive-site.xml
>>>>>> 15/05/15 09:34:17 INFO metastore: Waiting 1 seconds before next connection attempt.
>>>>>> 15/05/15 09:34:18 INFO metastore: Connected to metastore.
>>>>>> res0: Array[org.apache.spark.sql.Row] = Array([table1,false],
>>>>>>
>>>>>> scala> hc.getConf("hive.metastore.uris")
>>>>>> res13: String = thrift://hostname.com:9083
>>>>>>
>>>>>> scala> hc.getConf("hive.metastore.warehouse.dir")
>>>>>> res14: String = /user/hive/warehouse
>>>>>>
>>>>>> The first line tells you which metastore it's trying to connect to --
>>>>>> this should be the string specified under the hive.metastore.uris
>>>>>> property in your hive-site.xml file. I have not mucked with
>>>>>> warehouse.dir too much, but I know that the value of the metastore URI
>>>>>> is in fact picked up from there, as I regularly point to different
>>>>>> systems...
>>>>>>
>>>>>>
>>>>>> On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor <jamborta@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I have tried putting the hive-site.xml file in the conf/ directory,
>>>>>>> but it seems it is not being picked up from there.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust <
>>>>>>> michael@databricks.com> wrote:
>>>>>>>
>>>>>>>> You can configure Spark SQL's Hive interaction by placing a
>>>>>>>> hive-site.xml file in the conf/ directory.
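>>>>>>>>
>>>>>>>> As a quick sanity check that it was picked up, you can print a
>>>>>>>> property back from spark-shell (illustrative):
>>>>>>>>
>>>>>>>> scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>>>>> scala> hc.getConf("hive.metastore.warehouse.dir")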
>>>>>>>>
>>>>>>>> On Thu, May 14, 2015 at 10:24 AM, jamborta <jamborta@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> is it possible to set hive.metastore.warehouse.dir, which is
>>>>>>>>> internally created by Spark, to point at external storage (e.g. S3
>>>>>>>>> on AWS or WASB on Azure)?
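>>>>>>>>>
>>>>>>>>> For example, something along these lines is what I have in mind (the
>>>>>>>>> values below are made up):
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>     <name>hive.metastore.warehouse.dir</name>
>>>>>>>>>     <value>wasb://container@myaccount.blob.core.windows.net/hive/warehouse</value>
>>>>>>>>> </property>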
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/store-hive-metastore-on-persistent-store-tp22891.html
>>>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
