spark-user mailing list archives

From Antoine DUBOIS <antoine.dub...@cc.in2p3.fr>
Subject Spark streaming
Date Fri, 17 May 2019 12:28:32 GMT
Hello, 

I have a question regarding a use case. 
I have an ETL pipeline using Spark, and it works great. 
I use CephFS, mounted on all Spark nodes, to store the data. 
However, one problem I have is that bzip2-compressing the data and transferring it from the 
source to Spark storage takes a really long time. 
I would like to be able to process the file as it's being written, in chunks of 100 MB. 
Is something like that possible in Spark, or do I need to use Spark Streaming? And if I use 
Spark Streaming, would that mean my application needs to run as a daemon on the Spark 
nodes? 
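For concreteness, here is a minimal sketch (outside of Spark, plain Python, with hypothetical names) of the "process a file in fixed-size chunks while it is still being written" idea I have in mind. The `writer_done` callback is an assumption standing in for whatever signal says the producer has finished:

```python
import os
import time

# 100 MB in the real use case; smaller values work for testing.
CHUNK_SIZE = 100 * 1024 * 1024


def read_in_chunks(path, chunk_size, writer_done, poll_interval=1.0):
    """Yield fixed-size chunks from `path` while it is still being written.

    `writer_done` is a callable returning True once the producer has
    finished appending to the file; until then, a short read is treated
    as "not enough data yet" and we wait for more.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if len(chunk) == chunk_size:
                yield chunk  # a full chunk is ready to process
            elif writer_done():
                if chunk:
                    yield chunk  # final partial chunk
                return
            else:
                # Rewind past the partial read and poll again later.
                f.seek(-len(chunk), os.SEEK_CUR)
                time.sleep(poll_interval)
```

Each yielded chunk could then be handed to a Spark job as it arrives, rather than waiting for the whole transfer to finish.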

Thank you for your help and ideas. 
Antoine 
