spark-user mailing list archives

From lihu <lihu...@gmail.com>
Subject Re: the spark worker assignment Question?
Date Tue, 07 Jan 2014 15:15:44 GMT
Yeah, it will be better!


On Tue, Jan 7, 2014 at 1:04 AM, Andrew Ash <andrew@andrewash.com> wrote:

> Hi Li,
>
> I've also found this setting confusing in the past.  Take a look at this
> change -- do you think it makes the setting more clear?
>
> https://github.com/apache/incubator-spark/pull/341/files
>
> Andrew
>
>
> On Mon, Jan 6, 2014 at 8:19 AM, lihu <lihu723@gmail.com> wrote:
>
>> Sorry for the late reply; Gmail did not notify me.
>>
>> This problem was my own mistake.
>> I took the config parameter *spark.cores.max* to be the maximum number of
>> cores on each machine, but it is in fact the total number across the cluster.
>>
>> Thanks very much to Andrew and Mayur; your answers helped me understand
>> the Spark system better.
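
A minimal sketch of why a cluster-wide core cap can leave most machines idle. It is a simplified model, not Spark's scheduler code: it assumes the standalone master spreads an application's cores across workers one at a time, and the cap of 4 and the 8 cores per machine are illustrative values that do not come from the thread. The function name `spread_cores` is hypothetical.

```python
def spread_cores(total_cores_cap, free_cores_per_worker):
    """Hand out cores one at a time, round-robin across workers,
    until the cluster-wide cap or the free cores run out."""
    assigned = [0] * len(free_cores_per_worker)
    # Never try to hand out more cores than the cluster actually has.
    remaining = min(total_cores_cap, sum(free_cores_per_worker))
    i = 0
    while remaining > 0:
        if assigned[i] < free_cores_per_worker[i]:
            assigned[i] += 1
            remaining -= 1
        i = (i + 1) % len(free_cores_per_worker)
    return assigned


# Assumed cluster: 20 machines with 8 free cores each (illustrative only).
workers = [8] * 20

# A cap of 4 read as "4 per machine" is really 4 in total.
capped = spread_cores(4, workers)
busy = sum(1 for cores in capped if cores > 0)
print(busy)  # 4 machines get one core each; the other 16 stay idle

# A cap equal to the whole cluster (20 * 8) keeps every machine busy.
uncapped = spread_cores(160, workers)
print(sum(1 for cores in uncapped if cores > 0))  # 20
```

Under this model, the cap (`spark.cores.max` in Spark's configuration) has to be sized for the whole cluster, for example machines times cores per machine, rather than for a single machine.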
>>
>>
>>
>> On Fri, Jan 3, 2014 at 2:28 AM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:
>>
>>> Andrew, that's a good point. I have done that to handle a large number
>>> of queries. Typically, to get good response times when running many queries
>>> in parallel, you would want the data replicated across many machines.
>>> Regards
>>> Mayur Rustagi
>>> Ph: +919632149971
>>> http://www.sigmoidanalytics.com
>>> https://twitter.com/mayur_rustagi
>>>
>>>
>>>
>>> On Thu, Jan 2, 2014 at 11:22 PM, Andrew Ash <andrew@andrewash.com> wrote:
>>>
>>>> That sounds right, Mayur.
>>>>
>>>> Also, I hear that 0.8.1 has a new repartition method that you might be
>>>> able to use to distribute the data further.  But if your data is so small
>>>> that it fits in just a couple of blocks, why are you using 20 machines just
>>>> to process a quarter GB of data?  Is the computation on each piece extremely
>>>> intensive?
>>>>
>>>>
>>>> On Thu, Jan 2, 2014 at 12:39 PM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:
>>>>
>>>>> I have experienced a similar issue. The easiest fix I found was to
>>>>> increase the replication factor of the input data to the number of
>>>>> workers you want to use for processing. The RDD seems to be created on
>>>>> all the machines where the blocks are replicated. Please correct me if
>>>>> I am wrong.
>>>>>
>>>>> Regards
>>>>> Mayur
>>>>>
>>>>> Mayur Rustagi
>>>>> Ph: +919632149971
>>>>> http://www.sigmoidanalytics.com
>>>>> https://twitter.com/mayur_rustagi
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 2, 2014 at 10:46 PM, Andrew Ash <andrew@andrewash.com> wrote:
>>>>>
>>>>>> Hi lihu,
>>>>>>
>>>>>> Maybe the data you're accessing is in HDFS and only resides on 4
>>>>>> of your 20 machines because it's only about 4 blocks (at the default
>>>>>> 64MB / block, that's around a quarter GB).  Where is your source data
>>>>>> located and how is it stored?
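
The block arithmetic in the estimate above can be sketched as follows. This is a rough upper bound under stated assumptions, not a description of HDFS placement or Spark scheduling: the 256 MB file size stands in for "around a quarter GB", a replication factor of 1 is assumed to match the "4 of your 20 machines" figure, and the helper names are hypothetical.

```python
import math


def hdfs_block_count(file_size_mb, block_size_mb=64):
    """Number of HDFS blocks needed to store a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def machines_with_local_data(file_size_mb, n_machines,
                             replication=1, block_size_mb=64):
    """Upper bound on how many machines can hold at least one block replica.

    It is only a bound: replicas of different blocks may share a machine.
    """
    blocks = hdfs_block_count(file_size_mb, block_size_mb)
    return min(n_machines, blocks * replication)


print(hdfs_block_count(256))                              # 4 blocks
print(machines_with_local_data(256, 20))                  # at most 4 machines
print(machines_with_local_data(256, 20, replication=3))   # at most 12 machines
print(machines_with_local_data(256, 20, replication=20))  # at most 20 machines
```

The last two calls illustrate the replication suggestion made elsewhere in the thread: a higher replication factor raises the number of machines that can read a block locally, though four blocks still yield only four input partitions unless the data is repartitioned.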
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 2, 2014 at 7:53 AM, lihu <lihu723@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>    I run Spark on a cluster of 20 machines, but when I start an
>>>>>>> application from the spark-shell, only 4 machines are working. The
>>>>>>> others are just idle, with no memory or CPU in use; I can see this
>>>>>>> in the web UI.
>>>>>>>
>>>>>>>    I wondered whether the other machines might be busy, so I checked them
>>>>>>> with the "top" and "free" commands, but they are not.
>>>>>>>
>>>>>>>   * So I wonder why Spark does not assign work to all
>>>>>>> 20 machines? This is not a good use of resources.*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> *Best Wishes!*
>>
>> *Li Hu(李浒) | Graduate Student*
>>
>> *Institute for Interdisciplinary Information Sciences(IIIS
>> <http://iiis.tsinghua.edu.cn/>)*
>> *Tsinghua University, China*
>>
>> *Email: lihu723@gmail.com <lihu723@gmail.com>*
>> *Tel  : +86 15120081920*
>> *Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/
>> <http://iiis.tsinghua.edu.cn/zh/lihu/>*
>>
>>
>>
>


-- 
*Best Wishes!*

 *Li Hu(李浒) | Graduate Student*

*Institute for Interdisciplinary Information Sciences(IIIS
<http://iiis.tsinghua.edu.cn/>) *
*Tsinghua University, China*

*Email: lihu723@gmail.com <lihu723@gmail.com>*
*Tel  : +86 15120081920*
*Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/
<http://iiis.tsinghua.edu.cn/zh/lihu/>*
