spark-user mailing list archives

From Akshay Mendole <akshaymend...@gmail.com>
Subject Re: mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0
Date Thu, 07 Mar 2019 14:47:45 GMT
Hi,
     No. It's a Java application that uses the RDD APIs.
Thanks,
Akshay


On Mon, Feb 25, 2019 at 7:54 AM Manu Zhang <owenzhang1990@gmail.com> wrote:

> Is your application using the Spark SQL / DataFrame API? If so, please try
> setting
>
> spark.sql.files.maxPartitionBytes
>
> to a smaller value; the default is 128MB.
>
> Thanks,
> Manu Zhang
> On Feb 25, 2019, 2:58 AM +0800, Akshay Mendole <akshaymendole@gmail.com>,
> wrote:
>
> Hi,
>    We have dfs.blocksize configured to 512MB, and we have some large
> files in HDFS that we want to process with a Spark application. We want to
> split the files into more input splits to optimise for memory, but the
> parameters mentioned above are not working.
> The max and min split size parameters below are both set to 50MB, yet a
> file as big as 500MB is read as a single split, while it is expected to
> split into at least 10 input splits.
>
> SparkConf conf = new SparkConf().setAppName(jobName);
> SparkContext sparkContext = new SparkContext(conf);
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.maxsize", "50000000");
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.minsize", "50000000");
> JavaSparkContext sc = new JavaSparkContext(sparkContext);
> sc.hadoopConfiguration().set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec");
>
>
> Could you please suggest what could be wrong with my configuration?
>
> Thanks,
> Akshay
>
>
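
As an aside, the split count the poster expects can be sanity-checked with the formula Hadoop's FileInputFormat uses, max(minSize, min(maxSize, blockSize)), followed by a ceiling division of the file size. The helper class and method names below are illustrative only, not part of any Spark or Hadoop API:

```java
// Illustrative sketch of FileInputFormat-style split arithmetic.
// Class and method names are hypothetical; only the formula mirrors Hadoop.
public class SplitMath {

    // Mirrors Hadoop's computeSplitSize: max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Ceiling division: number of splits a file of fileSize bytes yields
    static long expectedSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 512L * 1024 * 1024;  // dfs.blocksize = 512MB
        long minSize   = 50_000_000L;         // ...split.minsize
        long maxSize   = 50_000_000L;         // ...split.maxsize
        long fileSize  = 500L * 1024 * 1024;  // the ~500MB file

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("split size      = " + splitSize);            // 50000000
        System.out.println("expected splits = " + expectedSplits(fileSize, splitSize)); // 11
    }
}
```

With both min and max at 50MB, the effective split size is 50,000,000 bytes, so a 500MB (524,288,000-byte) file should yield 11 splits, consistent with the "at least 10" expectation in the original message; seeing a single split therefore suggests the settings are not reaching the input format at all.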
