spark-dev mailing list archives

From Todd Gao <todd.gao.2013+sp...@gmail.com>
Subject Re: CallbackServer in PySpark Streaming
Date Thu, 12 Feb 2015 01:44:37 GMT
Thanks Davies.
I am not very familiar with Spark Streaming. Do you mean that the compute
routine of a DStream is invoked only on the driver node,
while only the compute routines of RDDs are distributed to the slaves?
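
To make the question concrete, here is a minimal, Spark-free sketch of the
split I am asking about. FakeRDD, compute, and calls are hypothetical names
used only for illustration, not PySpark APIs. It models my assumption that
compute runs once per batch on the driver, while the per-record function
runs once per partition on the workers.

```python
# Hypothetical, Spark-free model of the driver/worker split (an assumption,
# not PySpark's actual implementation).
calls = []  # records (where, what) so the split is visible


class FakeRDD:
    """Stand-in for an RDD: holds partitions and a per-record function."""

    def __init__(self, partitions, func=lambda x: x):
        self.partitions = partitions
        self.func = func

    def map(self, f):
        # Runs on the "driver": only composes the function, no data is touched.
        prev = self.func
        return FakeRDD(self.partitions, lambda x: f(prev(x)))

    def collect(self):
        # Models the "workers": one task per partition applies the function.
        out = []
        for i, part in enumerate(self.partitions):
            calls.append(("worker-%d" % i, "run task"))
            out.extend(self.func(x) for x in part)
        return out


def compute(batch_rdd):
    """Models a DStream compute routine: called once per batch, driver only."""
    calls.append(("driver", "compute"))
    return batch_rdd.map(lambda x: x * 2)


batch = FakeRDD([[1, 2], [3, 4]])
result = compute(batch).collect()
print(result)
print(calls)
```

If this picture is right, only the single compute call would need the
CallbackServer, and the per-partition tasks would never touch it.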

On Thu, Feb 12, 2015 at 2:38 AM, Davies Liu <davies@databricks.com> wrote:

> The CallbackServer is part of Py4J; it is only used in the driver, not
> in slaves or workers.
>
> On Wed, Feb 11, 2015 at 12:32 AM, Todd Gao
> <todd.gao.2013+spark@gmail.com> wrote:
> > Hi all,
> >
> > I am reading the code of PySpark and its Streaming module.
> >
> > In PySpark Streaming, when the `compute` method of a
> > PythonTransformedDStream instance is invoked, a connection to the
> > CallbackServer is created internally.
> > I wonder where the CallbackServer for each PythonTransformedDStream
> > instance lives on the slave nodes in a distributed environment.
> > Is there a CallbackServer running on every slave node?
> >
> > thanks
> > Todd
>
