spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Reading from HDFS by increasing split size
Date Tue, 10 Oct 2017 11:32:30 GMT
I have not tested this, but you should be able to pass any MapReduce-style
setting through to the underlying Hadoop configuration. Essentially, you
should be able to control how splits are computed just as you can in a
MapReduce program, since Spark uses the same input format.
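
To make that concrete, here is a rough sketch of the split arithmetic as I
understand it from Hadoop's older mapred FileInputFormat, which textFile()
appears to go through: splitSize = max(minSize, min(goalSize, blockSize)),
where goalSize is the total size divided by minPartitions. The 256 MiB block
size and the 1 GiB minimum below are assumed values for illustration only,
and the function names are my own, not Hadoop's or Spark's API.

```python
# Sketch of the split-size rule in Hadoop's old-API FileInputFormat.
# Helper names and the sizes below are illustrative assumptions.

MiB = 1024 * 1024
GiB = 1024 * MiB
SPLIT_SLOP = 1.1  # Hadoop lets the last split be up to 10% larger


def compute_split_size(goal_size, min_size, block_size):
    """splitSize = max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits produced for a single splittable file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


file_size = 60 * GiB      # the 60GB file from the question
block_size = 256 * MiB    # assumed HDFS block size
min_partitions = 2        # second argument to textFile()
goal_size = file_size // min_partitions

# Default minimum split size (1 byte): the block size wins.
default_split = compute_split_size(goal_size, 1, block_size)
default_tasks = count_splits(file_size, default_split)

# Raising the minimum split size above the block size enlarges the splits.
bigger_split = compute_split_size(goal_size, 1 * GiB, block_size)
bigger_tasks = count_splits(file_size, bigger_split)

print(default_tasks, bigger_tasks)
```

If this reading is right, the knob to raise is the Hadoop property
mapreduce.input.fileinputformat.split.minsize, which Spark should pick up when
it is set as spark.hadoop.mapreduce.input.fileinputformat.split.minsize (Spark
copies spark.hadoop.* properties into the Hadoop configuration). Again, I have
not verified this against your cluster, so treat the numbers as an estimate.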

On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke <jornfranke@gmail.com> wrote:

> Write your own input format/datasource or split the file yourself
> beforehand (not recommended).
>
> > On 10. Oct 2017, at 09:14, Kanagha Kumar <kprasad@salesforce.com> wrote:
> >
> > Hi,
> >
> > I'm trying to read a 60GB HDFS file using spark
> > textFile("hdfs_file_path", minPartitions).
> >
> > How can I control the number of tasks by increasing the split size? With
> > the default split size of 250 MB, several tasks are created. But I would
> > like to have a specific number of tasks created while reading from HDFS
> > itself, instead of using repartition() etc.
> >
> > Any suggestions are helpful!
> >
> > Thanks
> >


-- 
Best Regards,
Ayan Guha
