spark-user mailing list archives

From "Evo Eftimov" <evo.efti...@isecc.com>
Subject RE: Spark Streaming for Each RDD - Exception on Empty
Date Fri, 05 Jun 2015 16:01:50 GMT
The foreachPartition callback is provided with an Iterator by the Spark framework – while iterator.hasNext() ……
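In the Python API the partition function receives a plain iterator (there is no hasNext()), so a minimal sketch of the same guard, reusing the createNewConnection() placeholder from the streaming guide, would be:

def sendPartition(records):
    it = iter(records)
    first = next(it, None)              # peek: None means this partition is empty
    if first is None:
        return                          # nothing to send, so never open a connection
    connection = createNewConnection()  # placeholder from the guide, not a real API
    connection.send(first)
    for record in it:
        connection.send(record)
    connection.close()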

 

Also check whether this is some sort of Python Spark API bug – Python seems to be the
foster child here, while Scala and Java are the darlings.

 

From: John Omernik [mailto:john@omernik.com] 
Sent: Friday, June 5, 2015 4:08 PM
To: user
Subject: Spark Streaming for Each RDD - Exception on Empty

 

Is there a pythonic/sparkonic way to test for an empty RDD before using foreachRDD? Basically
I am using the Python example from https://spark.apache.org/docs/latest/streaming-programming-guide.html
to "put records somewhere". When I have data, it works fine; when I don't, I get an exception.
I am not sure about the performance implications of just throwing an exception every time
there is no data, but can I just test before sending it?

 

I did see one post mentioning using take(1) on the RDD to test for data, but I am not sure
where to put that in this example... Is that in the lambda function, or somewhere else?
Looking for pointers!

Thanks!


mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))
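
One sketch of where such a take(1) test could go: wrap the call above in a small function
passed to foreachRDD and skip the batch when there are no records (process is just an
illustrative name):

def process(rdd):
    if len(rdd.take(1)) == 0:   # no records arrived in this batch interval
        return                  # skip empty batches instead of raising
    rdd.foreachPartition(parseRDD)

mydstream.foreachRDD(process)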


Using this example code from the link above:

 

def sendPartition(iter):
    # createNewConnection() is a placeholder in the guide for whatever
    # connection/client object you actually use
    connection = createNewConnection()
    for record in iter:
        connection.send(record)
    connection.close()

dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
