spark-issues mailing list archives

From "Jascha Swisher (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-5037) support dynamic loading of input DStreams in pyspark streaming
Date Wed, 31 Dec 2014 17:51:13 GMT

     [ https://issues.apache.org/jira/browse/SPARK-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jascha Swisher updated SPARK-5037:
----------------------------------
    Description: 
The Scala and Java streaming APIs support "external" InputDStreams (e.g. the ZeroMQReceiver example) through a number of mechanisms, for instance by overriding ActorReceiver or by subclassing Receiver directly. The PySpark streaming API does not currently allow similar flexibility; at the moment it is limited to file-backed text and binary streams or socket text streams.
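
To make the current surface area concrete, here is a minimal sketch of the two text sources named above (host, port, and directory are placeholders):

{code:python}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "SourcesToday")  # 2 threads: receiver + processing
ssc = StreamingContext(sc, 1)                  # 1-second batches

# Essentially the only input sources exposed to Python today:
lines = ssc.socketTextStream("localhost", 9999)   # socket text stream
files = ssc.textFileStream("/tmp/stream-input")   # file-backed text stream

lines.pprint()
ssc.start()
ssc.awaitTermination()
{code}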

It would be great to open up the PySpark streaming API to other stream sources, bringing it closer to parity with the JVM APIs.

One way of doing this would be to support dynamically loading InputDStream implementations through reflection at the JVM level, analogous to what the Hadoop methods in PySpark's context.py currently do for Hadoop InputFormats.
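
For comparison, the snippet below first shows the existing context.py pattern, where a Hadoop InputFormat is named by its fully qualified class string and instantiated reflectively on the JVM, and then a purely hypothetical streaming analogue. The loadInputDStream name, its signature, and the example class name are illustrative assumptions, not an actual Spark API:

{code:python}
# Existing pattern: context.py's Hadoop methods take fully qualified class
# names and instantiate them reflectively on the JVM side.
rdd = sc.newAPIHadoopFile(
    "/data/events",
    "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")

# Hypothetical streaming analogue (illustrative only): resolve an
# InputDStream subclass by name via JVM reflection and wrap it for Python.
stream = ssc.loadInputDStream(
    "org.example.streaming.CustomInputDStream",   # hypothetical class name
    {"zmq.url": "tcp://127.0.0.1:5555"})          # constructor parameters
{code}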

I'll submit a PR momentarily with my shot at this. Comments and alternative approaches are more than welcome.

  was:
The scala and java streaming APIs support "external" InputDStreams (e.g. the ZeroMQReceiver example) through a number of mechanisms, for instance by overriding ActorReceiver or just subclassing Receiver directly. The pyspark streaming API does not currently allow similar flexibility, being limited at the moment to file-backed text and binary streams or socket text streams.

It would be great to open up the pyspark streaming API to other stream sources, putting it closer to on par with the JVM APIs.

One way of doing this could be to support dynamically loading InputDStream implementations through reflection at the JVM level, analogously to what is currently done for Hadoop InputFormats in the regular pyspark context.py *Hadoop* methods.

I'll submit a PR momentarily with my shot at this. Comments and alternative approaches more than welcome.


> support dynamic loading of input DStreams in pyspark streaming
> --------------------------------------------------------------
>
>                 Key: SPARK-5037
>                 URL: https://issues.apache.org/jira/browse/SPARK-5037
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Streaming
>    Affects Versions: 1.2.0
>            Reporter: Jascha Swisher
>


