spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: Hive Table not from from Spark SQL
Date Thu, 26 Mar 2015 11:02:07 GMT
I have tables dw_bid that is created in Hive and has nothing to do with
Spark.  I have data in avro that i want to join with dw_bid table, this
join needs to be done using Spark SQL.  However for some reason Spark says
dw_bid table does not exist. How do i say spark that dw_bid is a table
created in Hive and read it.


Query that is run from Spark SQL
==============================
 insert overwrite table sojsuccessevents2_spark select
guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId,
shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id as
userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId,
exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort,
isDuplicate,b.bid_date as
transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,b.qty_bid
as bidQuantity, b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as
 bidAmtUsd,offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId,
sellerStdLevel,cssSellerLevel,a.experimentChannel from sojsuccessevents1 a
join dw_bid b  on a.itemId = b.item_id  and  a.transactionId =
 b.transaction_id  where b.auct_end_dt >= '2015-02-16' AND b.bid_dt >=
'2015-02-16'  AND b.bid_type_code IN (1,9) AND b.bdr_id > 0 AND (
b.bid_flags & 32) = 0 and lower(a.successEventType) IN ('bid','bin')


If i create sojsuccessevents2_spark from hive command line and run above
command form Spark SQL program then i get error "sojsuccessevents2_spark
table not found".

Hence i dropped the command from Hive and run create table
sojsuccessevents2_spark from Spark SQL before running above command and it
works until it hits next road block "dw_bid table not found"

This makes me belive that Spark for some reason is not able to
read/understand the tables created outside Spark. I did copy
/apache/hive/conf/hive-site.xml into Spark conf directory.

Please suggest.


Logs
———
15/03/26 03:50:40 INFO HiveMetaStore.audit: ugi=dvasthimal
ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
15/03/26 03:50:40 ERROR metadata.Hive:
NoSuchObjectException(message:default.dw_bid table not found)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)



15/03/26 03:50:40 ERROR yarn.ApplicationMaster: User class threw exception:
no such table List(dw_bid); line 1 pos 843
org.apache.spark.sql.AnalysisException: no such table List(dw_bid); line 1
pos 843
at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:178)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:187)



Regards,
Deepak


On Thu, Mar 26, 2015 at 4:27 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> I have this query
>
>  insert overwrite table sojsuccessevents2_spark select
> guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId,
> shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id as
> userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId,
> exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort,
> isDuplicate,b.bid_date as
> transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,b.qty_bid
> as bidQuantity, b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as
>  bidAmtUsd,offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId,
> sellerStdLevel,cssSellerLevel,a.experimentChannel from sojsuccessevents1 a *join
> dw_bid b*  on a.itemId = b.item_id  and  a.transactionId =
>  b.transaction_id  where b.auct_end_dt >= '2015-02-16' AND b.bid_dt >=
> '2015-02-16'  AND b.bid_type_code IN (1,9) AND b.bdr_id > 0 AND (
> b.bid_flags & 32) = 0 and lower(a.successEventType) IN ('bid','bin')
>
>
> If i create sojsuccessevents2_spark from hive command line and run above
> command form Spark SQL program then i get error "sojsuccessevents2_spark
> table not found".
>
> Hence i dropped the command from Hive and run create table
> sojsuccessevents2_spark from Spark SQL before running above command and it
> works until it hits next road block "*dw_bid table not found"*
>
> This makes me belive that Spark for some reason is not able to
> read/understand the tables created outside Spark. I did copy
>   /apache/hive/conf/hive-site.xml into Spark conf directory.
>
> Please suggest.
>
> Regards,
> Deepak
>
>
> On Thu, Mar 26, 2015 at 1:26 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
> wrote:
>
>> I have a hive table named dw_bid, when i run hive from command prompt and
>> run describe dw_bid, it works.
>>
>> I want to join a avro file (table) in HDFS with this hive dw_bid table
>> and i refer it as dw_bid from Spark SQL program, however i see
>>
>> 15/03/26 00:31:01 INFO HiveMetaStore.audit: ugi=dvasthimal
>> ip=unknown-ip-addr cmd=get_table : db=default tbl=dw_bid
>> 15/03/26 00:31:01 ERROR metadata.Hive:
>> NoSuchObjectException(message:default.dw_bid table not found)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>>
>> Code:
>>
>>     val successDetail_S1 = sqlContext.avroFile(input)
>>     successDetail_S1.registerTempTable("sojsuccessevents1")
>>     val countS1 = sqlContext.sql("select
>> guid,sessionKey,sessionStartDate,sojDataDate,seqNum,eventTimestamp,siteId,successEventType,sourceType,itemId,"
>> +
>>         " shopCartId,b.transaction_Id as transactionId,offerId,b.bdr_id
>> as
>> userId,priorPage1SeqNum,priorPage1PageId,exclWMSearchAttemptSeqNum,exclPriorSearchPageId,"
>> +
>>         "
>> exclPriorSearchSeqNum,exclPriorSearchCategory,exclPriorSearchL1,exclPriorSearchL2,currentImpressionId,sourceImpressionId,exclPriorSearchSqr,exclPriorSearchSort,"
>> +
>>         " isDuplicate,b.bid_date as
>> transactionDate,auctionTypeCode,isBin,leafCategoryId,itemSiteId,b.qty_bid
>> as bidQuantity," +
>>     " b.bid_amt_unit_lstg_curncy * b.bid_exchng_rate as
>>  bidAmtUsd,offerQuantity,offerAmountUsd,offerCreateDate,buyerSegment,buyerCountryId,sellerId,sellerCountryId,"
>> +
>>     " sellerStdLevel,cssSellerLevel,a.experimentChannel" +
>>     " from sojsuccessevents1 a join dw_bid b " +
>>     " on a.itemId = b.item_id  and  a.transactionId =  b.transaction_id
>> " +
>>     " where b.bid_type_code IN (1,9) AND b.bdr_id > 0 AND ( b.bid_flags
>> & 32) = 0 and lower(a.successEventType) IN ('bid','bin')")
>>     println("countS1.first:" + countS1.first)
>>
>>
>>
>> Any suggestions on how to refer a hive table form Spark SQL?
>> --
>>
>> Deepak
>>
>>
>
>
> --
> Deepak
>
>


-- 
Deepak

Mime
View raw message