spark-user mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0
Date Mon, 25 Feb 2019 02:24:05 GMT
Is your application using the Spark SQL / DataFrame API? If so, please try setting

spark.sql.files.maxPartitionBytes

to a smaller value; it is 128MB by default.
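For example, a minimal sketch along these lines (the app name and input path are placeholders, and this assumes your input format is splittable):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class SplitLargeFiles {
    public static void main(String[] args) {
        // Cap each file-based partition at ~50MB instead of the 128MB default.
        SparkSession spark = SparkSession.builder()
                .appName("split-large-files")
                .config("spark.sql.files.maxPartitionBytes", "50000000")
                .getOrCreate();

        // "/data/large-files" is a placeholder path.
        Dataset<String> lines = spark.read().textFile("/data/large-files");

        // Each ~500MB splittable file should now produce roughly 10 partitions.
        System.out.println("partitions: " + lines.javaRDD().getNumPartitions());

        spark.stop();
    }
}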

Thanks,
Manu Zhang
On Feb 25, 2019, 2:58 AM +0800, Akshay Mendole <akshaymendole@gmail.com> wrote:
> Hi,
>    We have dfs.blocksize configured to 512MB and we have some large files in HDFS that we want to process with a Spark application. We want to split the files into more input splits to optimise for memory, but the above-mentioned parameters are not working.
> The max and min split size params below are both configured to 50MB, yet a file as big as 500MB is read as a single split when it is expected to be split into at least 10 input splits.
> SparkConf conf = new SparkConf().setAppName(jobName);
>
> SparkContext sparkContext = new SparkContext(conf);
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.maxsize", "50000000");
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.minsize", "50000000");
> JavaSparkContext sc = new JavaSparkContext(sparkContext);
> sc.hadoopConfiguration().set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec");
>
> Could you please suggest what could be wrong with my configuration?
>
> Thanks,
> Akshay
>
