spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Isabelle Phan <nlip...@gmail.com>
Subject Re: SparkSQL API to insert DataFrame into a static partition?
Date Fri, 04 Dec 2015 22:09:04 GMT
Thanks all for your reply!

I tested both approaches: registering the temp table then executing SQL vs.
saving to HDFS filepath directly. The problem with the second approach is
that I am inserting data into a Hive table, so if I create a new partition
with this method, Hive metadata is not updated.

So I will be going with first approach.
Follow up question in this case: what is the cost of registering a temp
table? Is there a limit to the number of temp tables that can be registered
by Spark context?


Thanks again for your input.

Isabelle



On Wed, Dec 2, 2015 at 10:30 AM, Michael Armbrust <michael@databricks.com>
wrote:

> you might also coalesce to 1 (or some small number) before writing to
> avoid creating a lot of files in that partition if you know that there is
> not a ton of data.
>
> On Wed, Dec 2, 2015 at 12:59 AM, Rishi Mishra <rmishra@snappydata.io>
> wrote:
>
>> As long as all your data is being inserted by Spark , hence using the
>> same hash partitioner,  what Fengdong mentioned should work.
>>
>> On Wed, Dec 2, 2015 at 9:32 AM, Fengdong Yu <fengdongy@everstring.com>
>> wrote:
>>
>>> Hi
>>> you can try:
>>>
>>> if your table under location “/test/table/“ on HDFS
>>> and has partitions:
>>>
>>>  “/test/table/dt=2012”
>>>  “/test/table/dt=2013”
>>>
>>> df.write.mode(SaveMode.Append).partitionBy("date”).save(“/test/table")
>>>
>>>
>>>
>>> On Dec 2, 2015, at 10:50 AM, Isabelle Phan <nliphan@gmail.com> wrote:
>>>
>>> df.write.partitionBy("date").insertInto("my_table")
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Rishitesh Mishra,
>> SnappyData . (http://www.snappydata.io/)
>>
>> https://in.linkedin.com/in/rishiteshmishra
>>
>
>

Mime
View raw message