spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: the spark worker assignment Question?
Date Thu, 02 Jan 2014 17:39:08 GMT
I have experienced a similar issue. The easiest fix I found was to increase
the replication of the data being used by the workers to the number of
workers you want to use for processing. The RDD seems to be created on all
the machines where the blocks are replicated. Please correct me if I am wrong.
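
For example, something like this in the spark-shell (an untested sketch;
the HDFS path is a placeholder, and note that MEMORY_ONLY_2 keeps two
copies of each cached partition rather than one per worker):

    import org.apache.spark.storage.StorageLevel

    // Cache the RDD with a replicated storage level so each partition
    // lives on more than one machine.
    val rdd = sc.textFile("hdfs:///path/to/data")
    val cached = rdd.persist(StorageLevel.MEMORY_ONLY_2)
    cached.count()  // materialize the cache across the cluster

    // For HDFS-level replication you would instead raise the file's
    // replication factor, e.g.: hdfs dfs -setrep -w 20 /path/to/data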

Regards
Mayur

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Thu, Jan 2, 2014 at 10:46 PM, Andrew Ash <andrew@andrewash.com> wrote:

> Hi lihu,
>
> Maybe the data you're accessing is in HDFS and only resides on 4 of
> your 20 machines because it's only about 4 blocks (at the default 64MB per
> block, that's around a quarter GB).  Where is your source data located and
> how is it stored?
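>
> One way to check is to look at the RDD's partition count in the shell,
> and ask textFile for more splits if it's small (a rough sketch; the
> path is a placeholder):
>
>     val data = sc.textFile("hdfs:///path/to/data")
>     println(data.partitions.length)   // likely ~4 if the file is 4 blocks
>
>     // request at least 20 splits so every worker gets tasks
>     val wide = sc.textFile("hdfs:///path/to/data", 20)
>     println(wide.partitions.length)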
>
> Andrew
>
>
> On Thu, Jan 2, 2014 at 7:53 AM, lihu <lihu723@gmail.com> wrote:
>
>> Hi,
>>    I run Spark on a cluster with 20 machines, but when I start an
>> application using the spark-shell, only 4 machines are working; the
>> others are just idle, with no memory or CPU used. I observed this
>> through the web UI.
>>
>>    I wondered whether the other machines might just be busy, so I
>> checked them with the "top" and "free" commands, but they are not.
>>
>>    So I just wonder: why doesn't Spark assign work to all 20 machines?
>> This is not good resource usage.
>>
>
