spark-user mailing list archives

From Ilya Ganelin <ilgan...@gmail.com>
Subject Re: How can number of partitions be set in "spark-env.sh"?
Date Tue, 28 Oct 2014 17:00:47 GMT
In Spark, certain functions take an optional parameter that determines the
number of partitions (distinct, textFile, etc.). You can also use the
coalesce() or repartition() functions to change the number of partitions
of your RDD. Thanks.
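A minimal sketch in Scala of the RDD-level options above, assuming an existing
SparkContext named sc and a hypothetical input path:

  // Optional partition-count arguments on textFile() and on transformations:
  val lines  = sc.textFile("hdfs:///data/input.txt", 64)  // 64 = minimum number of partitions
  val unique = lines.distinct(64)                         // many transformations accept numPartitions

  // Changing the partition count of an existing RDD:
  val more  = unique.repartition(128)  // always shuffles; can increase or decrease partitions
  val fewer = unique.coalesce(16)      // avoids a full shuffle when reducing partitions
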
On Oct 28, 2014 9:58 AM, "shahab" <shahab.mokari@gmail.com> wrote:

> Thanks for the useful comment. But I guess this setting applies only when
> I use SparkSQL, right? Are there any similar settings for Spark?
>
> best,
> /Shahab
>
> On Tue, Oct 28, 2014 at 2:38 PM, Wanda Hawk <wanda_hawk89@yahoo.com>
> wrote:
>
>> Is this what you are looking for?
>>
>> In Shark, the default reducer number is 1 and is controlled by the property
>> mapred.reduce.tasks. Spark SQL deprecates this property in favor of
>> spark.sql.shuffle.partitions, whose default value is 200. Users may
>> customize this property via SET:
>>
>> SET spark.sql.shuffle.partitions=10;
>> SELECT page, count(*) c
>> FROM logs_last_month_cached
>> GROUP BY page ORDER BY c DESC LIMIT 10;
>>
>>
>> Spark SQL Programming Guide - Spark 1.1.0 Documentation
>> <http://spark.apache.org/docs/latest/sql-programming-guide.html>
>>
>>   ------------------------------
>>  *From:* shahab <shahab.mokari@gmail.com>
>> *To:* user@spark.apache.org
>> *Sent:* Tuesday, October 28, 2014 3:20 PM
>> *Subject:* How can number of partitions be set in "spark-env.sh"?
>>
>> I am running a standalone Spark cluster with 2 workers, each with 2 cores.
>> Apparently, I am loading and processing a relatively large chunk of data, so
>> I receive task failures " ". From some posts and discussions on this mailing
>> list, I understand that the failures could be related to the large amount of
>> data processed per partition, and that, if I have understood correctly, I
>> should have smaller partitions (but many of them)?!
>>
>> Is there any way to set the number of partitions dynamically in
>> "spark-env.sh" or in the submitted Spark application? (See the configuration
>> sketch after the quoted thread below.)
>>
>>
>> best,
>> /Shahab
>>
>>
>>
>
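
For the original question about "spark-env.sh" versus the submitted application:
spark-env.sh is meant for per-machine environment variables, while Spark
properties are normally set in conf/spark-defaults.conf, passed to spark-submit
with --conf, or set on the SparkConf inside the application. A minimal sketch in
Scala, under the assumption that spark.default.parallelism (the default
partition count used by shuffle transformations when no explicit numPartitions
is given) is the property being tuned; the application name is hypothetical:

  import org.apache.spark.{SparkConf, SparkContext}

  // Set the default number of partitions for shuffles from inside the application.
  val conf = new SparkConf()
    .setAppName("PartitionTuningExample")      // hypothetical application name
    .set("spark.default.parallelism", "64")
  val sc = new SparkContext(conf)

The same property can be supplied at submission time
(spark-submit --conf spark.default.parallelism=64 ...) or as a line
"spark.default.parallelism  64" in conf/spark-defaults.conf.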
