spark-issues mailing list archives

From "Yifan Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6404) Call broadcast() in each interval for spark streaming programs.
Date Thu, 19 Mar 2015 02:27:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368383#comment-14368383 ]

Yifan Wang commented on SPARK-6404:
-----------------------------------

I got the following error. Is that expected?

{code}
Traceback (most recent call last):
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/streaming/util.py", line 90, in dumps
    return bytearray(self.serializer.dumps((func.func, func.deserializers)))
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/serializers.py", line 405, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 816, in dumps
    cp.dump(obj)
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 133, in dump
    return pickle.Pickler.dump(self, obj)
  File "/usr/lib/python2.6/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.6/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.6/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.6/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 249, in save_function
    self.save_function_tuple(obj, modList)
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 304, in save_function_tuple
    save((code, closure, base_globals))
  File "/usr/lib/python2.6/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.6/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.6/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.6/pickle.py", line 600, in save_list
    self._batch_appends(iter(obj))
  File "/usr/lib/python2.6/pickle.py", line 636, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python2.6/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 249, in save_function
    self.save_function_tuple(obj, modList)
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 309, in save_function_tuple
    save(f_globals)
  File "/usr/lib/python2.6/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/usr/lib/python2.6/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.6/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/usr/lib/python2.6/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/nail/home/yifan/pg/yifan_spark_mr_steps/spark-1.2.1-bin-hadoop2.4/python/pyspark/context.py", line 236, in __getnewargs__
    "It appears that you are attempting to reference SparkContext from a broadcast "
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
{code}
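
The traceback shows the per-batch closure capturing the SparkContext itself, which is what trips the SPARK-5063 check. Until Spark supports re-broadcasting per interval natively, the common workaround is to refresh the broadcast from the driver side (e.g. inside a foreachRDD callback, where the context is legal to use) and only ship the resulting broadcast handle into worker closures. Below is a minimal, Spark-free sketch of just the refresh-per-interval bookkeeping; the `RefreshableBroadcast` name and the `broadcast_fn`/`load_value` callables are illustrative stand-ins, not a PySpark API:

```python
import time


class RefreshableBroadcast(object):
    """Illustrative wrapper: refreshes a broadcast-like value at most once
    per interval. `broadcast_fn` stands in for sc.broadcast() and
    `load_value` for whatever recomputes the shared data, so this sketch
    runs without a Spark installation."""

    def __init__(self, broadcast_fn, load_value, interval_sec):
        self._broadcast_fn = broadcast_fn
        self._load_value = load_value
        self._interval = interval_sec
        self._last_refresh = float("-inf")  # force a refresh on first use
        self._current = None

    def get(self, now=None):
        """Call on the driver each batch; in real PySpark you would also
        unpersist() the stale broadcast before re-broadcasting."""
        now = time.time() if now is None else now
        if now - self._last_refresh >= self._interval:
            self._current = self._broadcast_fn(self._load_value())
            self._last_refresh = now
        return self._current
```

In a real streaming job the driver would call `get()` once per batch and pass the returned broadcast handle into the RDD operations, so worker closures never see the SparkContext itself.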

> Call broadcast() in each interval for spark streaming programs.
> ---------------------------------------------------------------
>
>                 Key: SPARK-6404
>                 URL: https://issues.apache.org/jira/browse/SPARK-6404
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Yifan Wang
>
> If I understand it correctly, Spark’s broadcast() function will be called only once,
> at the beginning of the batch. For streaming applications that need to run 24/7, it is
> often necessary to update variables shared via broadcast() dynamically. It would be ideal
> if broadcast() could be called at the beginning of each interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

