spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Reading from HDFS by increasing split size
Date Tue, 10 Oct 2017 19:25:28 GMT
Have you seen this:
https://stackoverflow.com/questions/42796561/set-hadoop-configuration-values-on-spark-submit-command-line
Please try it and let us know.
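
For reference, the mechanism that answer describes is Spark's spark.hadoop.*
prefix: any --conf key starting with spark.hadoop. has the prefix stripped and
is copied into the underlying Hadoop Configuration. A minimal sketch, assuming
the Hadoop 2 key name and an illustrative 5 GiB minimum split (the class and
jar names are placeholders):

    spark-submit \
      --conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=5368709120 \
      --class com.example.ReadLargeFile app.jar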

On Wed, Oct 11, 2017 at 2:53 AM, Kanagha Kumar <kprasad@salesforce.com>
wrote:

> Thanks for the inputs!!
>
> I passed in spark.mapred.max.split.size and spark.mapred.min.split.size set
> to the size I wanted to read, but it had no effect.
> I also tried passing in spark.dfs.block.size, with all the params set to
> the same value.
>
> JavaSparkContext.fromSparkContext(spark.sparkContext()).textFile(hdfsPath,
> 13);
>
> Is there any other param that needs to be set as well?
>
> Thanks
>
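
One likely culprit worth flagging here: as far as I can tell, Spark only
forwards properties carrying the spark.hadoop. prefix into the Hadoop
Configuration, so a key like spark.mapred.max.split.size never reaches it (it
would need to be spark.hadoop.mapred.max.split.size). An alternative is to set
the key programmatically on the context before reading. A hedged sketch in
Java, assuming the Hadoop 2 key name; the path and 5 GiB value are
illustrative:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;

    public class SplitSizeSketch {
      public static void main(String[] args) {
        SparkSession spark =
            SparkSession.builder().appName("split-size-sketch").getOrCreate();
        JavaSparkContext jsc =
            JavaSparkContext.fromSparkContext(spark.sparkContext());
        // Raise the minimum split size above the HDFS block size so that
        // FileInputFormat produces fewer, larger splits (hence fewer tasks).
        jsc.hadoopConfiguration().set(
            "mapreduce.input.fileinputformat.split.minsize",
            "5368709120"); // 5 GiB, illustrative only
        JavaRDD<String> lines =
            jsc.textFile("hdfs:///path/to/60gb/file"); // hypothetical path
        System.out.println("partitions: " + lines.getNumPartitions());
        spark.stop();
      }
    }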
> On Tue, Oct 10, 2017 at 4:32 AM, ayan guha <guha.ayan@gmail.com> wrote:
>
>> I have not tested this, but you should be able to pass any MapReduce-style
>> conf through to the underlying Hadoop config. Essentially you should be able
>> to control split behaviour just as you would in a MapReduce program, since
>> Spark uses the same input format (see the sketch of the split logic below).
>>
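
For context on why minPartitions alone cannot push the task count down: the
old-API Hadoop FileInputFormat that textFile relies on sizes splits roughly as
follows (a paraphrase, not a verbatim copy of the Hadoop source):

    // goalSize = totalSize / numSplits, where numSplits comes from minPartitions
    // splitSize = max(minSize, min(goalSize, blockSize))
    long computeSplitSize(long goalSize, long minSize, long blockSize) {
      return Math.max(minSize, Math.min(goalSize, blockSize));
    }

Since goalSize can only shrink a split below the block size, minPartitions acts
as a lower bound on the partition count; only raising minSize above the block
size produces larger splits and fewer tasks.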
>> On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke <jornfranke@gmail.com>
>> wrote:
>>
>>> Write your own input format/datasource or split the file yourself
>>> beforehand (not recommended).
>>>
>>> > On 10. Oct 2017, at 09:14, Kanagha Kumar <kprasad@salesforce.com>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I'm trying to read a 60GB HDFS file using Spark's
>>> > textFile("hdfs_file_path", minPartitions).
>>> >
>>> > How can I control the no. of tasks by increasing the split size? With
>>> > the default split size of 250 MB, several tasks are created. But I would
>>> > like to have a specific no. of tasks created while reading from HDFS
>>> > itself instead of using repartition() etc.
>>> >
>>> > Any suggestions are helpful!
>>> >
>>> > Thanks
>>> >
>>>
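
To put numbers on this: 60 GB at the default 250 MB split works out to about
60 * 1024 / 250 ≈ 246 tasks. Hitting 13 tasks would need a split size of
roughly 60 GB / 13 ≈ 4.6 GB, well above the block size, which is why a larger
minimum split size (rather than minPartitions) is the knob to turn.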
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
>



-- 
Best Regards,
Ayan Guha
