spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Problem in Understanding concept of Physical Cores
Date Wed, 08 Jul 2015 23:13:58 GMT
There are several levels of indirection going on here, let me clarify.

In local mode, Spark runs tasks (which include receivers) using the
number of threads defined in the master URL (local, local[2], or
local[*]).
local or local[1] = a single thread, so only one task at a time
local[2] = 2 threads, so two tasks at a time
local[*] = as many threads as the number of cores it can detect through
the operating system.
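For reference, here is a minimal sketch (object name hypothetical) of the
standard JVM call that local[*] relies on to size its thread pool:

```scala
// Minimal sketch: local[*] uses as many threads as the cores the JVM
// can see, which it detects through this standard library call.
object CoreDetect {
  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors()
    println(s"local[*] on this machine would use $cores threads")
  }
}
```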


Test 1: When you don't specify a master in spark-submit, it implicitly
uses local[*], so it uses as many threads as the number of cores the VM
has. With 1 VM core the single thread was occupied by the receiver and
nothing was left to process the data; with 2 VM cores both the receiver
and the processing could run. So the behavior was as expected.
Test 2: When you specified the master as local[2], Spark used two
threads regardless of how many cores the VM has. Threads are scheduled
by the operating system, which time-slices both of them onto the single
physical core, so the receiver and the processing tasks both make
progress even on a one-core VM.
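The starvation in Test 1 can be mimicked without Spark at all, using a
plain fixed-size JVM thread pool (a rough analogy, not Spark's actual
scheduler; all names here are hypothetical): an endless "receiver" task
monopolizes the only worker thread, so a second task never runs unless
the pool has at least two threads.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicBoolean

// Rough analogy (not Spark code): a fixed pool of worker threads, one
// endless "receiver" task, and one short "processing" task.
object ReceiverStarvation {
  def processingRan(threads: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(threads)
    val processed = new AtomicBoolean(false)
    // The "receiver": loops until interrupted, never releasing its thread.
    pool.execute(() =>
      try { while (true) Thread.sleep(5) }
      catch { case _: InterruptedException => () })
    // The "processing" task: can only run if another thread is free.
    pool.execute(() => processed.set(true))
    Thread.sleep(200) // give the pool a moment
    pool.shutdownNow()
    processed.get
  }

  def main(args: Array[String]): Unit = {
    println(s"1 thread:  processing ran = ${processingRan(1)}") // false
    println(s"2 threads: processing ran = ${processingRan(2)}") // true
  }
}
```

With one thread the "processing" task stays queued forever, which is
exactly why the streaming example received no data under plain local.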

HTH

TD
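For completeness, the kind of program under discussion looks roughly
like this (a sketch assuming Spark on the classpath; the app name is
made up, and the port matches the original test):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of a socket-based streaming word count. With local[2], one
// thread runs the socket receiver and the other processes batches;
// with plain "local", the receiver would occupy the only thread and
// no batch would ever be processed.
object SocketWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SocketWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 7777)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```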

On Wed, Jul 8, 2015 at 4:21 AM, Aniruddh Sharma <asharma.gd@gmail.com>
wrote:

> Hi
>
> I am new to Spark. Following is the problem that I am facing
>
> Test 1) I ran a VM on a CDH distribution with only 1 core allocated to
> it, and I ran a simple streaming example in spark-shell, sending data
> on port 7777 and trying to read it. With 1 core allocated, nothing
> happens in my streaming program and it does not receive data. When I
> restart the VM with 2 cores allocated, start spark-shell again, and run
> the streaming example again, this time it works.
>
> Query a) From this test I concluded that a receiver in Streaming
> occupies its core completely, even though I am sending very little data
> and it does not need the whole core for that, and it does not share
> this core with the executor for computing transformations. Comparing
> partition processing with receiver processing: the same physical cores
> can process multiple partitions in parallel, but a receiver will not
> allow its core to process anything else. Is this understanding correct?
>
> Test 2) Now I restarted the VM with 1 core again and started spark-shell
> --master local[2]. I have allocated only 1 core to the VM but I tell
> spark-shell to use 2 cores, and when I test the streaming program again
> it somehow works.
>
> Query b) Now I am more confused, and I don't understand, since I have
> only 1 core for the VM. I previously thought it did not work because
> there was only 1 core and the receiver was completely blocking it, not
> sharing it with the executor. But when I start with local[2], still
> with only 1 core for the VM, it works. So it means the receiver and the
> executor are both getting the same physical CPU. Please explain how
> this case is different and what conclusions I should draw about
> physical CPU usage.
>
> Thanks and Regards
> Aniruddh
>
>
