spark-user mailing list archives

From Somasundaram Sekar <somasundar.se...@tigeranalytics.com>
Subject Re: Importing large file with SparkContext.textFile
Date Sat, 03 Sep 2016 14:23:52 GMT
If the file is not splittable (though can I assume a log file is
splittable?), can you advise on how Spark handles such cases? If Spark can't,
what is the widely used practice?

On 3 Sep 2016 7:29 pm, "Raghavendra Pandey" <raghavendra.pandey@gmail.com>
wrote:

If your file format is splittable (say TSV, CSV, etc.), it will be
distributed across all executors.
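
To make "splittable" concrete, here is a minimal pure-Python sketch (no Spark
needed) of how a line-oriented text file can be cut into byte ranges that are
read independently. It assumes the same boundary convention Hadoop's line
reader follows: a split that does not start at byte 0 skips its first line,
and every split keeps reading as long as a line starts at or before its end
offset. The names compute_splits, read_split and demo are hypothetical and
exist only for this illustration; they are not Spark APIs.

```python
import os
import tempfile


def compute_splits(file_size, split_size):
    """Return (start, length) byte ranges that together cover the file."""
    return [(start, min(split_size, file_size - start))
            for start in range(0, file_size, split_size)]


def read_split(path, start, length):
    """Return the whole lines owned by one byte-range split.

    Assumed convention: a split not starting at byte 0 discards its first
    line (the previous split reads it), and a split reads every line that
    starts at or before its end offset.
    """
    end = start + length
    lines = []
    with open(path, "rb") as f:
        f.seek(start)
        if start != 0:
            f.readline()  # owned by the previous split
        while f.tell() <= end:
            line = f.readline()
            if not line:  # end of file
                break
            lines.append(line.rstrip(b"\n").decode("utf-8"))
    return lines


def demo(num_lines=10, split_size=16):
    """Write a small temp file and read it back split by split."""
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        for i in range(num_lines):
            tmp.write(f"line-{i}\n".encode("utf-8"))
        path = tmp.name
    try:
        splits = compute_splits(os.path.getsize(path), split_size)
        return [read_split(path, s, n) for s, n in splits]
    finally:
        os.remove(path)
```

Each inner list returned by demo() plays the role of one partition, and
together they contain every line exactly once. A format that cannot be
entered at an arbitrary byte offset (a gzip-compressed file, for example)
cannot be cut this way, so it ends up as a single partition read by one task;
a common workaround is to repartition the data after loading it, or to store
it in a splittable form in the first place.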

On Sat, Sep 3, 2016 at 3:38 PM, Somasundaram Sekar
<somasundar.sekar@tigeranalytics.com> wrote:

> Hi All,
>
>
>
> Would like to gain some understanding on the questions listed below,
>
>
>
> 1. When processing a large file with Apache Spark using, say,
> sc.textFile("somefile.xml"), does Spark split it for parallel processing
> across executors, or will it be processed as a single chunk in a single
> executor?
>
> 2. When using DataFrames with the implicit XMLContext from Databricks,
> is there any optimization prebuilt for processing such large files?
>
>
>
> Please help!!!
>
>
>
> http://stackoverflow.com/questions/39305310/does-spark-process-large-file-in-the-single-worker
>
>
>
> Regards,
>
> Somasundaram S
>
