spark-user mailing list archives

From "Evo Eftimov" <>
Subject RE: Spark Streaming for Each RDD - Exception on Empty
Date Fri, 05 Jun 2015 16:01:50 GMT
The foreachPartition callback is provided with an Iterator by the Spark framework – just drive it with while iterator.hasNext() (or a plain for loop). On an empty partition the loop body never executes, so no separate emptiness test is needed at that level.


Also check whether this is some sort of Python Spark API bug – Python seems to be the foster child here, while Scala and Java are the darlings.
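A minimal sketch of that iterator-based guard in pure Python (not from the thread: send_partition is a hypothetical name, and a list stands in for a real connection/sink so the idea can be shown without Spark):

```python
_SENTINEL = object()

def send_partition(records):
    """Per-partition callback: Spark hands it a plain iterator."""
    it = iter(records)
    first = next(it, _SENTINEL)   # peek at the first element
    if first is _SENTINEL:
        return []                 # empty partition: skip any setup work
    sent = [first]                # stand-in for an opened connection
    for record in it:
        sent.append(record)       # stand-in for connection.send(record)
    return sent

# Simulating what foreachPartition would do for two partitions:
assert send_partition([]) == []          # empty partition is a no-op
assert send_partition([1, 2]) == [1, 2]
```

Peeking with next() before opening the connection also avoids paying connection-setup cost on empty partitions.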


From: John Omernik [] 
Sent: Friday, June 5, 2015 4:08 PM
To: user
Subject: Spark Streaming for Each RDD - Exception on Empty


Is there a pythonic/sparkonic way to test for an empty RDD before using foreachRDD? Basically I am using the Python example to "put records somewhere". When I have data it works fine; when I don't, I get an exception. I am not sure about the performance implications of just throwing an exception every time there is no data – can I just test before sending?


I did see one post mentioning take(1) on the stream's RDDs to test for data, but I am not sure where that goes in this example. Does it belong in the lambda function, or somewhere else? Looking for pointers!





mydstream.foreachRDD(lambda rdd: rdd.foreachPartition(parseRDD))
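For what it's worth, one place the take(1) check could go is a named function passed to foreachRDD instead of the lambda – a sketch, not from the thread, with the poster's parseRDD stubbed out as a hypothetical no-op handler:

```python
def parseRDD(records):
    # stand-in for the poster's per-partition "put records somewhere" handler
    for record in records:
        pass

def process(rdd):
    # rdd.take(1) returns an empty list for an empty RDD, so the
    # truthiness of the result tells us whether this micro-batch
    # has any data before we fan out to the partitions.
    if rdd.take(1):
        rdd.foreachPartition(parseRDD)

# mydstream.foreachRDD(process)   # wire it into the stream
```

Note that take(1) triggers a (small) Spark job per batch, so the per-partition iterator guard may be cheaper when the handler itself can tolerate empty input.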



Using this example code from the link above:


def sendPartition(iter):
    connection = createNewConnection()
    for record in iter:
        connection.send(record)
    connection.close()

dstream.foreachRDD(lambda rdd: rdd.foreachPartition(sendPartition))
