spark-user mailing list archives

From: Rachana Srivastava
Subject: How do we process/scale variable size batches in Apache Spark Streaming
Date: Tue, 23 Aug 2016 22:20:53 GMT
I am running a Spark Streaming job that receives a batch of data every n seconds. I
am using repartition to scale the application. Because the repartition count is fixed, we
get lots of small files when a batch is very small. Is there any way to change the
partitioning logic based on the input batch size, in order to avoid lots of small files?
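
For illustration, one possible approach is to pick the partition count per batch from the number of records in that batch, rather than using a fixed value. The following is only a minimal sketch, not a tested solution: the helper names (target_partitions, write_batch), the 100,000 records-per-partition target, the cap of 200 partitions, and the output path scheme are all assumptions made for this example. The RDD methods it calls (count, getNumPartitions, coalesce, repartition, saveAsTextFile) are standard PySpark RDD methods, and write_batch is meant to be called from DStream.foreachRDD.

```python
import math


def target_partitions(record_count, records_per_partition=100_000, max_partitions=200):
    """Return a partition count proportional to the batch size.

    records_per_partition and max_partitions are illustrative defaults
    (assumptions), to be tuned for the actual record size and cluster.
    """
    if record_count <= 0:
        return 1
    wanted = math.ceil(record_count / records_per_partition)
    # Clamp to the range [1, max_partitions].
    return max(1, min(max_partitions, wanted))


def write_batch(batch_time, rdd, output_prefix="/tmp/stream-output"):
    """Hypothetical foreachRDD handler: size the output to the batch.

    output_prefix is a placeholder path, not taken from the original post.
    """
    count = rdd.count()  # triggers a job; consider rdd.cache() beforehand
    if count == 0:
        return  # skip empty batches so no empty files are written
    n = target_partitions(count)
    if n < rdd.getNumPartitions():
        # coalesce reduces the partition count without a full shuffle
        rdd = rdd.coalesce(n)
    elif n > rdd.getNumPartitions():
        # repartition shuffles, but spreads a large batch over more tasks
        rdd = rdd.repartition(n)
    # One output directory per batch, since saveAsTextFile will not overwrite.
    rdd.saveAsTextFile("{}/{}".format(output_prefix, batch_time))
```

Under these assumptions, the handler would be registered with something like stream.foreachRDD(write_batch), where stream is the existing DStream. The extra count() adds a pass over each batch, so whether this is worthwhile depends on how expensive the upstream computation is.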
