spark-user mailing list archives

From Ilove Data <>
Subject Join between DStream and Periodically-Changing-RDD
Date Tue, 09 Jun 2015 14:07:51 GMT

I'm trying to join a DStream with a batch interval of, let's say, 20 seconds
against an RDD loaded from an HDFS folder that changes periodically; let's say
a new file arrives in the folder every 10 minutes.

How should this be done, given that the files in the HDFS folder are
periodically changed or added to? Does an RDD automatically detect changes
in the HDFS folder it was loaded from and reload itself?

