spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Reading from HDFS by increasing split size
Date Tue, 10 Oct 2017 11:21:11 GMT
Write your own input format / data source, or split the file yourself beforehand (the latter is not recommended).
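
A minimal sketch of what that can look like, assuming Spark 2.x with the new Hadoop mapreduce API; the 1 GB figure and all variable/class names below are illustrative, not from this thread. Raising mapreduce.input.fileinputformat.split.minsize makes FileInputFormat compute larger splits, so fewer tasks are created at read time:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Copy the cluster configuration and ask for splits of at least 1 GB;
    // for the 60 GB file in the question this yields roughly 60 tasks
    // instead of one task per 250 MB split.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.setLong("mapreduce.input.fileinputformat.split.minsize", 1024L * 1024 * 1024)

    val lines = sc.newAPIHadoopFile(
      "hdfs_file_path",          // path from the question, unchanged
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      conf
    ).map { case (_, text) => text.toString }

If a reusable "own input format" is preferred, the same effect can be baked into a subclass of the TextInputFormat imported above and passed to newAPIHadoopFile in its place; again a sketch, not a full implementation:

    // Hypothetical subclass that hard-codes a 1 GB minimum split size.
    class GigabyteSplitTextInputFormat extends TextInputFormat {
      override protected def getFormatMinSplitSize(): Long = 1024L * 1024 * 1024
    }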

> On 10. Oct 2017, at 09:14, Kanagha Kumar <kprasad@salesforce.com> wrote:
> 
> Hi,
> 
> I'm trying to read a 60 GB HDFS file using Spark's textFile("hdfs_file_path", minPartitions).
> 
> How can I control the number of tasks by increasing the split size? With the default
> split size of 250 MB, several tasks are created. But I would like a specific number
> of tasks created while reading from HDFS itself, instead of using repartition() etc.
> 
> Any suggestions are helpful!
> 
> Thanks
> 
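
One caveat on the minPartitions argument in the quoted question: Spark passes it to Hadoop only as a hint for *more* splits (the old-API FileInputFormat computes goalSize = totalSize / numSplits and caps each split at the block size), so it can raise the task count but never lower it below the number of HDFS splits. That is why the input-format / configuration route above is needed to get fewer tasks:

    // Still ~240 tasks for 60 GB at 250 MB per split; the hint cannot merge splits.
    val rdd = sc.textFile("hdfs_file_path", 10)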

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

