spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject How to control batch size while reading from hdfs files?
Date Sat, 23 Mar 2019 02:02:05 GMT
Hi All,

What determines the batch size when reading a file from HDFS?

I am trying to read files from HDFS and ingest into Kafka using Spark
Structured Streaming 2.3.1. I get an error saying the Kafka batch size is too
big and that I need to increase max.request.size. Sure, I can increase it,
but I would like to know what other parameters I can change such that I
don't have to change the default max.request.size?

The Kafka producer docs say the default max.request.size is 1MB,
and each file I have in HDFS is < 12MB.

Any suggestions would be great.

Thanks!
