spark-user mailing list archives

From swetha kasireddy <swethakasire...@gmail.com>
Subject Re: How to insert data for 100 partitions at a time using Spark SQL
Date Sun, 22 May 2016 19:38:28 GMT
Around 14,000 partitions need to be loaded every hour. Yes, I tested this,
and it is taking a long time to load. A partition looks something like
the following; it is further partitioned by userId, with all the
userRecords for that date inside it.

5 2016-05-20 16:03 /user/user/userRecords/dtPartitioner=2012-09-12
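
One way to load the partitions in groups of 100, as asked in the original question, is to split the list of partition values into batches and issue one dynamic-partition INSERT per batch. The following is only a minimal sketch under stated assumptions, not a tested solution for this workload: the table names `user_records` and `staging_user_records`, the column `record`, and all three function names are hypothetical, and only `dtPartitioner` and `userId` come from this thread. The partition values are assumed to be trusted strings, since they are interpolated into the SQL text without escaping.

```python
from typing import Callable, Iterator, List, Sequence


def batches(values: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` partition values."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield list(values[start:start + size])


def build_insert(batch: Sequence[str], column: str = "userId") -> str:
    """Build one INSERT restricted to the partition values in `batch`.

    Assumption: user_records, staging_user_records and record are
    placeholder names; values are trusted and are not escaped.
    """
    in_list = ", ".join("'{}'".format(v) for v in batch)
    return (
        "INSERT INTO TABLE user_records PARTITION (dtPartitioner, userId) "
        "SELECT record, dtPartitioner, userId FROM staging_user_records "
        "WHERE {} IN ({})".format(column, in_list)
    )


def insert_in_batches(
    values: Sequence[str],
    run_sql: Callable[[str], object],
    size: int = 100,
) -> int:
    """Run one INSERT per batch via `run_sql`; return the statement count."""
    count = 0
    for batch in batches(values, size):
        run_sql(build_insert(batch))
        count += 1
    return count
```

With a Hive-enabled context, `run_sql` could be `sqlContext.sql` (or `spark.sql` on Spark 2.x and later), and Hive dynamic partitioning would normally need `hive.exec.dynamic.partition.mode` set to `nonstrict` first. At 100 values per batch, 14,000 partitions become 140 statements. Whether this actually relieves the memory pressure depends on the data and the cluster.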

On Sun, May 22, 2016 at 12:30 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> By partition, do you mean 14,000 files loaded in each batch session (say,
> daily)?
>
> Have you actually tested this?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 22 May 2016 at 20:24, swetha kasireddy <swethakasireddy@gmail.com>
> wrote:
>
>> The data is not very big: say 1 MB to 10 MB at most per partition. What is
>> the best way to insert these 14k partitions with decent performance?
>>
>> On Sun, May 22, 2016 at 12:18 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> The acid question is how many rows you are going to insert in a batch
>>> session. By the way, if this is purely an SQL operation, you can do all
>>> of that in Hive running on the Spark engine. It will be very fast as well.
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 22 May 2016 at 20:14, Jörn Franke <jornfranke@gmail.com> wrote:
>>>
>>>> 14000 partitions seem to be way too many to be performant (except for
>>>> large data sets). How much data does one partition contain?
>>>>
>>>> > On 22 May 2016, at 09:34, SRK <swethakasireddy@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > In my Spark SQL query to insert data, I have around 14,000 partitions
>>>> > of data, which seems to be causing memory issues. How can I insert the
>>>> > data for 100 partitions at a time to avoid any memory issues?
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-SQL-tp26997.html
>>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> > For additional commands, e-mail: user-help@spark.apache.org
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>
