spark-user mailing list archives

From Robin East <robin.e...@xense.co.uk>
Subject Re: What is the interpretation of Cores in Spark doc
Date Fri, 17 Jun 2016 11:01:20 GMT
Agreed, it’s a worthwhile discussion (and an interesting one, IMO).

This is a section from your original post:

> It is about the terminology, or the interpretation of it, in the Spark doc.
> 
> This is my understanding of cores and threads.
> 
>  Cores are physical cores. Threads are virtual cores.

At least as far as the Spark doc is concerned, threads are not synonymous with virtual cores; they are closely related concepts, of course. So any time we want to have a discussion about architecture, performance, tuning, configuration, etc., we need to be clear about the concepts and how they are defined.

Granted, the term ‘threads’ is also used for CPU hardware implementations. In fact, Oracle/Sun seem unclear about what they mean by a thread - in various documents they define threads as:

A software entity that can be executed on hardware (e.g. Oracle SPARC Architecture 2011)

At other times as:

A thread is a hardware strand. Each thread, or strand, enjoys a unique set of resources in
support of its … (e.g. OpenSPARC T1 Microarchitecture Specification)

So unless the documentation you are writing is very specific to your environment, and the idea that a thread is a logical processor is generally accepted there, I would not be inclined to treat threads as if they were logical processors.



> On 16 Jun 2016, at 15:45, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> Thanks all.
> 
> I think we are diverging, but IMO it is a worthwhile discussion.
> 
> Actually, threads are a hardware implementation - hence the whole notion of “multi-threaded cores”. What happens is that the cores often have duplicate registers, etc., for holding execution state. While it is correct that only a single process is executing at a time, a single core will have the execution states of multiple processes preserved in these registers. In addition, it is the core (not the OS) that determines when the thread is executed. The approach often varies according to the CPU manufacturer, but the simplest approach is that when one thread of execution performs a multi-cycle operation (e.g. a fetch from main memory), the core simply stops processing that thread, saves its execution state to one set of registers, loads instructions from the other set of registers and carries on. On the Oracle SPARC chips, it will actually check the next thread to see if the reason it was ‘parked’ has completed and, if not, skip it for the subsequent thread. The OS is only aware of which are cores and which are logical processors - and dispatches accordingly. Execution is up to the cores.
> Cheers
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 16 June 2016 at 13:02, Robin East <robin.east@xense.co.uk> wrote:
> Mich
> 
> >> A core may have one or more threads
> It would be more accurate to say that a core could run one or more threads scheduled for execution. Threads are a software/OS concept representing executable code that is scheduled to run by the OS; a CPU, core or virtual core/virtual processor executes that code. Threads are not CPUs or cores, whether physical or logical - any Spark documentation that implies this is mistaken. I’ve looked at the documentation you mention and I don’t read it to mean that threads are logical processors.
> 
> To go back to your original question, if you set local[6] and you have 12 logical processors
then you are likely to have half your CPU resources unused by Spark.
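
A minimal sketch of the local[6] vs local[*] point above, assuming a Spark 1.x-style Scala application on the 12-logical-processor host discussed in this thread (the app names are placeholders):

import org.apache.spark.SparkConf

// local[6] caps Spark at 6 worker threads, so on a box that the OS
// presents as 12 logical processors, roughly half of them go unused.
val confSix = new SparkConf().setAppName("six-slots").setMaster("local[6]")

// local[*] asks for as many worker threads as the JVM reports logical
// processors, i.e. all 12 on that host.
val confAll = new SparkConf().setAppName("all-slots").setMaster("local[*]")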
> 
> 
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> 
>> I think it is slightly more than that.
>> 
>> These days software is licensed by core (generally speaking) - that is, the physical processor. A core may have one or more threads - or logical processors. Virtualization adds some fun to the mix. Generally what virtualization layers present is ‘virtual processors’. What that equates to depends on the virtualization layer itself. In some simpler VMs it is virtual=logical. In others, virtual=logical but the logical processors are constrained to come from the same cores - e.g. if you get 6 virtual processors, it really is 3 full cores with 2 threads each. The rationale lies in the way OS dispatching works on ‘logical’ processors vs. cores, and in how POSIX threaded applications behave.
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>>  
>> 
>> On 13 June 2016 at 18:17, Mark Hamstra <mark@clearstorydata.com> wrote:
>> I don't know what documentation you were referring to, but this is clearly an erroneous
statement: "Threads are virtual cores."  At best it is terminology abuse by a hardware manufacturer.
 Regardless, Spark can't get too concerned about how any particular hardware vendor wants
to refer to the specific components of their CPU architecture.  For us, a core is a logical
execution unit, something on which a thread of execution can run.  That can map in different
ways to different physical or virtual hardware. 
>> 
>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> Hi,
>> 
>> It is not an issue of testing anything. I was referring to documentation that clearly uses the term "threads". As I said and showed before, one line uses the term "thread" and the next one "logical cores".
>> 
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>>  
>> 
>> On 12 June 2016 at 23:57, Daniel Darabos <daniel.darabos@lynxanalytics.com> wrote:
>> Spark is a software product. In software a "core" is something that a process can
run on. So it's a "virtual core". (Do not call these "threads". A "thread" is not something
a process can run on.)
>> 
>> local[*] uses java.lang.Runtime.availableProcessors() <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
Since Java is software, this also returns the number of virtual cores. (You can test this
easily.)
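
For illustration, a minimal way to check this yourself; availableProcessors() is the standard JDK call that, per the link above, local[*] resolves to:

// Logical processors (hardware threads) visible to the JVM, not physical
// cores: 12 on the 6-core hyper-threaded host described in this thread.
val logicalProcessors = Runtime.getRuntime.availableProcessors()
println(s"JVM reports $logicalProcessors logical processors")

// On that machine the master string below is equivalent to local[*].
val master = s"local[$logicalProcessors]"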
>> 
>> 
>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I was writing some docs on Spark P&T and came across this.
>> 
>> It is about the terminology, or the interpretation of it, in the Spark doc.
>> 
>> This is my understanding of cores and threads.
>> 
>>  Cores are physical cores. Threads are virtual cores. A core with 2 threads uses what is called hyper-threading technology, so the 2 threads per core let the core work on two loads at the same time. In other words, every thread takes care of one load.
>> 
>> Each core has its own memory. So if you have a dual core with hyper-threading, each core works on 2 loads at the same time because of the 2 threads per core, but these 2 threads will share the memory in that core.
>> 
>> Some vendors, as I am sure most of you are aware, charge licensing per core.
>> 
>> For example, on the same host where I have Spark, I have a SAP product that checks the licensing and shuts the application down if the license does not agree with the cores spec'd.
>> 
>> This is what it says:
>> 
>> ./cpuinfo
>> License hostid:        00e04c69159a 0050b60fd1e7
>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>> 
>> So here I have 12 logical processors, 6 cores and 1 chip. I call logical processors threads, so I have 12 threads?
>> 
>> Now if I go and start the worker process with ${SPARK_HOME}/sbin/start-slaves.sh, I see this on the GUI page:
>> 
>> <image.png>
>> 
>> It says 12 cores, but I gather these are threads?
>> 
>> The Spark documentation <http://spark.apache.org/docs/latest/submitting-applications.html> states, and I quote:
>> 
>> <image.png>
>> 
>> 
>> OK, the local[k] line adds .. "set this to the number of cores on your machine"
>> 
>> But I know that it means threads, because if I went and set that to 6, it would be only 6 threads as opposed to 12 threads.
>> 
>> The next line, for local[*], seems to indicate it correctly, as it refers to "logical cores", which in my understanding are threads.
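
A quick way to see which number Spark actually ends up with, sketched under the assumption of a plain local-mode SparkContext with nothing overriding spark.default.parallelism (the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// With local[6], Spark creates 6 task slots, which the web UI labels
// "cores", even though the host reports 12 logical processors.
val sc = new SparkContext(new SparkConf().setAppName("slot-check").setMaster("local[6]"))
println(sc.defaultParallelism)  // 6; with local[*] this prints 12 on that host
sc.stop()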
>> 
>> I trust that I am not nitpicking here!
>> 
>> Cheers,
>> 
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>>  
>> 
>> 
>> 
>> 
> 
> 

