spark-user mailing list archives

From swetha kasireddy <swethakasire...@gmail.com>
Subject Re: How to insert data for 100 partitions at a time using Spark SQL
Date Sun, 22 May 2016 18:11:12 GMT
I am looking at ORC. I insert the data using the following query.

sqlContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING)
    |PARTITIONED BY (datePartition STRING, idPartition STRING)
    |STORED AS ORC LOCATION '/user/users'
    |TBLPROPERTIES ("orc.compress"="SNAPPY")""".stripMargin)
sqlContext.sql(
  """FROM recordsTemp ps
    |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
    |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition""".stripMargin)
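
For the "100 partitions at a time" idea from the original question, one possible approach is to split the list of (datePartition, idPartition) pairs into groups of 100 and run one dynamic-partition insert per group. The following is only a minimal sketch: the object name BatchedInsert and the method statements are hypothetical, it reuses the table and column names from the query above, and it assumes partition values contain no single quotes.

```scala
// Hypothetical helper, not part of Spark: it only builds SQL strings.
object BatchedInsert {
  // Returns one dynamic-partition INSERT per group of `batchSize` partitions.
  // Assumption: partition values contain no single quotes (no escaping is done).
  def statements(partitions: Seq[(String, String)], batchSize: Int = 100): Vector[String] =
    partitions.distinct.grouped(batchSize).map { batch =>
      // Restrict the SELECT to the partitions in this batch only.
      val predicate = batch
        .map { case (d, i) => s"(ps.datePartition = '$d' AND ps.idPartition = '$i')" }
        .mkString(" OR ")
      s"""FROM recordsTemp ps
         |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
         |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition
         |WHERE $predicate""".stripMargin
    }.toVector
}
```

The pairs could be collected with something like sqlContext.sql("SELECT DISTINCT datePartition, idPartition FROM recordsTemp").collect().map(r => (r.getString(0), r.getString(1))), and the statements run with BatchedInsert.statements(pairs).foreach(sqlContext.sql). Fully dynamic partition inserts typically also need hive.exec.dynamic.partition.mode set to nonstrict. Whether this actually relieves the memory pressure depends on the job, so treat it as something to try rather than a confirmed fix.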

On Sun, May 22, 2016 at 12:37 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Where is your base table, and what format is it (Parquet, ORC, etc.)?
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 22 May 2016 at 08:34, SRK <swethakasireddy@gmail.com> wrote:
>
>> Hi,
>>
>> In my Spark SQL query to insert data, I have around 14,000 partitions of
>> data, which seems to be causing memory issues. How can I insert the data
>> for 100 partitions at a time to avoid any memory issues?
>>
>>
>>
>>
>
