spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-12760) inaccurate description for difference between local vs cluster mode in closure handling
Date Tue, 12 Jan 2016 11:01:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12760:
------------------------------
      Priority: Minor  (was: Trivial)
    Issue Type: Bug  (was: Question)
       Summary: inaccurate description for difference between local vs cluster mode in closure handling  (was: inaccurate description for difference between local vs cluster mode )

I think the example needs an update, but not for this reason. There's no separate "memory
space" in local mode; it's one JVM. However, it's undefined whether the copy of {{counter}}
that the closure updates is the same as the driver's or a different one in this case. Actually,
I find a copy is serialized with the closure at this point, so the result is still 0.

I think the explanation should be changed to say that the result is undefined here and could
be 0 or not, and to explain why. Do you want to try a PR?
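
A minimal sketch of the effect described above, in plain Python rather than Spark. It assumes
that a pickle round trip is a fair stand-in for serializing a closure together with the
variables it captures; the names ship, foreach_with_copy, driver_state and executor_state are
hypothetical and exist only for this illustration. It does not reproduce Spark's actual
closure handling.

```python
import pickle


def ship(state):
    # Stand-in for closure serialization: a pickle round trip yields an
    # independent copy of the captured state (an assumption of this sketch).
    return pickle.loads(pickle.dumps(state))


def foreach_with_copy(data, state):
    # The "task" only ever sees its own deserialized copy of the state.
    local = ship(state)
    for x in data:
        local["counter"] += x
    return local


# Driver-side variable, analogous to `counter` in the programming guide example.
driver_state = {"counter": 0}
executor_state = foreach_with_copy([1, 2, 3, 4, 5], driver_state)

print("copy updated by the task:", executor_state["counter"])
print("driver-side value:", driver_state["counter"])
```

In this sketch the sum lands in the copy and the driver-side value is left untouched, even
though everything runs in one process. Whether Spark hands the task a copy or the original
object is exactly the part that should be documented as undefined.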

> inaccurate description for difference between local vs cluster mode in closure handling
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-12760
>                 URL: https://issues.apache.org/jira/browse/SPARK-12760
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Mortada Mehyar
>            Priority: Minor
>
> In the Spark documentation there's an example illustrating how `local` and `cluster`
> mode can differ: http://spark.apache.org/docs/latest/programming-guide.html#example
> "In local mode with a single JVM, the above code will sum the values within the RDD
> and store it in counter. This is because both the RDD and the variable counter are in the
> same memory space on the driver node."
> However, the above doesn't seem to be true. Even in `local` mode it seems like the counter
> value should still be 0, because the variable will be summed up in the executor memory space,
> but the final value in the driver memory space is still 0. I tested this snippet and verified
> that in `local` mode the value is indeed still 0.
> Is the doc wrong, or perhaps I'm missing something the doc is trying to say?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


