spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Spark Streaming for Each RDD - Exception on Empty
Date Fri, 05 Jun 2015 16:07:44 GMT
John:
Which Spark release are you using?
As of 1.4.0, RDD has this method:

  def isEmpty(): Boolean = withScope {

FYI
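
A minimal sketch of the isEmpty() guard, assuming Spark 1.4.0+. FakeRDD below is only a stand-in that mimics the two RDD methods used, so the pattern can be shown without a cluster; sendPartition is the placeholder name from the streaming guide, not a real API:

```python
# Stand-in mimicking just the RDD methods this pattern touches.
# Not part of PySpark; purely illustrative.
class FakeRDD:
    def __init__(self, records):
        self.records = records

    def isEmpty(self):
        # RDD.isEmpty() exists as of Spark 1.4.0
        return len(self.records) == 0

    def foreachPartition(self, f):
        # real Spark calls f once per partition; one partition here
        f(iter(self.records))

sent = []

def sendPartition(it):
    # stands in for the open-connection / send / close body
    # from the streaming programming guide
    for record in it:
        sent.append(record)

def guarded(rdd):
    # skip empty batches entirely instead of raising inside the send
    if not rdd.isEmpty():
        rdd.foreachPartition(sendPartition)

guarded(FakeRDD([]))      # empty batch: nothing happens, no exception
guarded(FakeRDD([1, 2]))  # non-empty batch: records flow through
```

In a real job you would pass `guarded` to `dstream.foreachRDD(guarded)`; the guard runs on the driver before any partition work is scheduled.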

On Fri, Jun 5, 2015 at 9:01 AM, Evo Eftimov <evo.eftimov@isecc.com> wrote:

> The foreachPartition callback is provided with an Iterator by the Spark
> Framework – so you can guard the loop with while iterator.hasNext() ……
>
>
>
> Also check whether this is not some sort of Python Spark API bug – Python
> seems to be the foster child here – Scala and Java are the darlings
>
>
>
> *From:* John Omernik [mailto:john@omernik.com]
> *Sent:* Friday, June 5, 2015 4:08 PM
> *To:* user
> *Subject:* Spark Streaming for Each RDD - Exception on Empty
>
>
>
> Is there a pythonic/sparkonic way to test for an empty RDD before using
> foreachRDD? Basically I am using the Python example from
> https://spark.apache.org/docs/latest/streaming-programming-guide.html to
> "put records somewhere". When I have data, it works fine; when I don't, I
> get an exception. I am not sure about the performance implications of just
> throwing an exception every time there is no data, but can I just test
> before sending it?
>
>
>
> I did see one post mentioning using take(1) on the stream to test for
> data, but I am not sure where to put that in this example... Is that in
> the lambda function, or somewhere else? Looking for pointers!
>
> Thanks!
>
>
> mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))
>
>
>
>
>
> Using this example code from the link above:
>
>
>
> def sendPartition(iter):
>     connection = createNewConnection()
>     for record in iter:
>         connection.send(record)
>     connection.close()
>
> dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
>
>
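
For releases before 1.4.0 (no RDD.isEmpty()), the take(1) check John mentions can be sketched the same way. FakeRDD and sendPartition are illustrative placeholders, not PySpark APIs; the point is that take(1) returns a list of at most one element, so its length tells you whether the batch is empty:

```python
# Stand-in mimicking just the RDD methods this pattern touches.
class FakeRDD:
    def __init__(self, records):
        self.records = records

    def take(self, n):
        # RDD.take(n) returns a list of up to n elements
        return self.records[:n]

    def foreachPartition(self, f):
        f(iter(self.records))

sent = []

def sendPartition(it):
    # stands in for the connection/send/close body from the guide
    for record in it:
        sent.append(record)

def guarded(rdd):
    # take(1) on an empty RDD returns [], so this skips empty batches
    if len(rdd.take(1)) > 0:
        rdd.foreachPartition(sendPartition)

guarded(FakeRDD([]))         # empty batch is skipped, no exception
guarded(FakeRDD(["a", "b"]))
```

Note that take(1) triggers a (small) job per batch, so on 1.4.0+ isEmpty() is the cleaner choice.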
