spark-user mailing list archives

From Robin East <robin.e...@xense.co.uk>
Subject Re: Saving data using tempTable versus save() method
Date Tue, 21 Jun 2016 09:03:19 GMT
If you are able to trace the underlying Oracle session, you can see whether a commit has been
called or not.
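
One more thing that may be worth checking, offered as a sketch rather than a diagnosis: in the Spark 1.x API, DataFrame.save(String) takes a path (written with the default data source, Parquet unless spark.sql.sources.default is changed), not a Hive table name. So sorted.save("oraclehadoop.sales2") would likely have written files to a directory of that name instead of inserting into the table, and no commit would be involved. Assuming a Spark 1.4+ spark-shell where sc is predefined, and using hiveContext as a hypothetical variable name (the thread uses HiveContext), something like the following should append to the existing ORC table through the DataFrameWriter API:

```scala
// Sketch for a Spark 1.4+ spark-shell, where `sc` (SparkContext) is predefined.
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.hive.HiveContext

// `hiveContext` is a hypothetical name; the thread calls its instance `HiveContext`.
val hiveContext = new HiveContext(sc)

val s = hiveContext.table("sales_staging")
val sorted = s.sort("prod_id", "cust_id", "time_id", "channel_id", "promo_id")

// insertInto matches columns by position, so select them in the target table's
// order (assumed here to be the order used in the INSERT ... SELECT in the thread).
sorted
  .select("PROD_ID", "CUST_ID", "TIME_ID", "CHANNEL_ID", "PROMO_ID",
          "QUANTITY_SOLD", "AMOUNT_SOLD")
  .write
  .mode("append")                      // append to the existing Hive table
  .insertInto("oraclehadoop.sales2")

hiveContext.sql("select count(1) from oraclehadoop.sales2").show()

// Optional check: did the earlier sorted.save("oraclehadoop.sales2") create a
// directory of that name under the user's default filesystem working directory?
val fs = FileSystem.get(sc.hadoopConfiguration)
println(fs.exists(new Path("oraclehadoop.sales2")))
```

If the last line prints true, the earlier save() call most likely wrote its output there, which would explain the zero row count in the Hive table.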




> On 21 Jun 2016, at 09:57, Robin East <robin.east@xense.co.uk> wrote:
> 
> I’m not sure; I don’t know what those APIs do under the hood. It simply rang a bell
> with something I have fallen foul of in the past (not with Spark, though): I have wasted many
> hours forgetting to commit and then scratching my head as to why my data is not persisting.
> 
> 
> 
> 
>> On 21 Jun 2016, at 09:20, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> 
>> That is a very interesting point. I am not sure. How can I do that with
>> 
>> sorted.save("oraclehadoop.sales2")
>> 
>> Is there something like a commit?
>> 
>> thanks
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>>  
>> 
>> On 21 June 2016 at 08:56, Robin East <robin.east@xense.co.uk> wrote:
>> Random thought: do you need an explicit commit with the second method?
>> 
>> 
>> 
>> 
>>> On 20 Jun 2016, at 21:35, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I have a DataFrame based on a table, sorted as shown below.
>>> 
>>> This works fine: when I register it as a temp table, I can populate the underlying table
>>> sales2 in Hive. sales2 is an ORC table.
>>> 
>>>   val s = HiveContext.table("sales_staging")
>>>   val sorted = s.sort("prod_id","cust_id","time_id","channel_id","promo_id")
>>>   sorted.registerTempTable("tmp")
>>>   sqltext = """
>>>   INSERT INTO TABLE oraclehadoop.sales2
>>>   SELECT
>>>           PROD_ID
>>>         , CUST_ID
>>>         , TIME_ID
>>>         , CHANNEL_ID
>>>         , PROMO_ID
>>>         , QUANTITY_SOLD
>>>         , AMOUNT_SOLD
>>>   FROM tmp
>>>   """
>>>   HiveContext.sql(sqltext)
>>>   HiveContext.sql("select count(1) from oraclehadoop.sales2").show
>>>   HiveContext.sql("truncate table oraclehadoop.sales2")
>>> 
>>>   sorted.save("oraclehadoop.sales2")
>>>   HiveContext.sql("select count(1) from oraclehadoop.sales2").show
>>> 
>>> When I truncate the Hive table and use sorted.save("oraclehadoop.sales2") instead,
>>> it does not save any data:
>>> 
>>> Started at
>>> [20/06/2016 21:21:57.57]
>>> +------+
>>> |   _c0|
>>> +------+
>>> |918843|    // This works
>>> +------+
>>> [Stage 7:============================================>              (3 + 1) / 4]
>>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>>> SLF4J: Defaulting to no-operation (NOP) logger implementation
>>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>>> +---+
>>> |_c0|
>>> +---+
>>> |  0|      // This does not
>>> +---+
>>> Finished at
>>> [20/06/2016 21:22:30.30]
>>> 
>>> Has anyone seen this before? Any ideas?
>>> 
>>> 
>>> The issue is saving data. Saving through the temp table works, but saving with the
>>> save() method does not.
>>> 
>>> 
>>> Thanks
>>> 
>>> Dr Mich Talebzadeh
>>>  
>>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>  
>>> http://talebzadehmich.wordpress.com
>>>  
>> 
>> 
> 

