spark-user mailing list archives

From: habibbaluwala
Subject: Streaming audio files
Date: Mon, 05 Dec 2016 22:27:36 GMT

I have an HDFS folder that receives new audio files every few minutes.
My objective is to detect the new files added to the folder and then
process them in parallel, without splitting any file into multiple
blocks. For example, if 4 new audio files are added, I want the Spark
engine to detect the four file names/locations; I can then pass those
four locations to Spark so that it uses four processors, one per file.
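
To make the goal concrete, here is a rough sketch of the behaviour I am
after, in Python. The helper names (find_new_files, process_each_file,
process_file) are hypothetical, and the sketch assumes the folder listing
is obtained separately on each polling round. The only Spark call used is
SparkContext.parallelize with one slice per path, so that each task
receives a single file location and can open that file whole.

```python
def find_new_files(seen, listing):
    """Return the paths in `listing` that are not yet in `seen`.

    `seen` is a set that is updated in place, so each path is reported
    only once across polling rounds. The result is sorted for stability.
    """
    new_paths = sorted(set(listing) - seen)
    seen.update(new_paths)
    return new_paths


def process_each_file(sc, paths, process_file):
    """Run `process_file(path)` once per path, one Spark task per file.

    `sc` is an existing SparkContext. `process_file` is a hypothetical
    user function that opens and handles the entire file at `path`
    itself, so the file is never split into blocks by Spark.
    """
    if not paths:
        return []
    # One partition per path: each task gets exactly one file location.
    rdd = sc.parallelize(paths, numSlices=len(paths))
    return rdd.map(process_file).collect()
```

With four new files, find_new_files would return four paths and
process_each_file would create four tasks, which matches the
"four processors, one per file" behaviour described above.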

I tried using FileStream, but with that I would have to split the files into
blocks, which I do not want. Is there any other solution?
