spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Spark S3
Date Mon, 10 Oct 2016 22:06:34 GMT
It really depends on the input format used.
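
For the Parquet case asked about below, here is a rough sketch of how the
split size and partition count could be estimated. It assumes the Spark 2.x
DataFrame file source, which as far as I know sizes splits from
spark.sql.files.maxPartitionBytes (default 128 MB),
spark.sql.files.openCostInBytes (default 4 MB) and the default parallelism.
The function name estimate_partitions and the example file layout are
hypothetical, and the real reader also takes Parquet row-group boundaries
into account, so treat the result as an estimate only. S3 is read through
the Hadoop filesystem layer, so the same arithmetic would apply as for HDFS.

```python
MIB = 1024 * 1024


def estimate_partitions(file_sizes, default_parallelism,
                        max_partition_bytes=128 * MIB,
                        open_cost_bytes=4 * MIB):
    """Approximate Spark 2.x file-source split planning (hypothetical helper).

    file_sizes: sizes in bytes of the objects under the S3 prefix.
    Returns (max_split_bytes, estimated_partition_count).
    """
    # Every file is padded with an "open cost" so many tiny files
    # do not all end up in a single partition.
    total_bytes = sum(size + open_cost_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_bytes, bytes_per_core))

    # Cut each file into chunks of at most max_split_bytes.
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunks.append(min(max_split_bytes, size - offset))
            offset += max_split_bytes

    # Pack chunks, largest first, into partitions up to max_split_bytes.
    chunks.sort(reverse=True)
    partitions = 0
    current = 0
    for chunk in chunks:
        if current > 0 and current + chunk > max_split_bytes:
            partitions += 1
            current = 0
        current += chunk + open_cost_bytes
    if current > 0:
        partitions += 1
    return max_split_bytes, partitions


# Assumed layout: 35 GB stored as 35 Parquet files of 1 GiB each, 8 cores.
split_bytes, count = estimate_partitions([1024 * MIB] * 35, 8)
print(split_bytes // MIB, count)
```

On a real job the actual number can be checked after loading with
df.rdd.getNumPartitions() in PySpark, which works the same way for an S3
path as for any other.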
On 11 Oct 2016 08:46, "Selvam Raman" <selmna@gmail.com> wrote:

> Hi,
>
> How does Spark read data from S3 and run tasks in parallel?
>
> Assume I have an S3 bucket holding 35 GB of data (Parquet files).
>
> How will the SparkSession read the data and process it in parallel? How
> does it split the S3 data and assign it to each executor task?
>
> Please share your thoughts.
>
> Note:
> If we have an RDD, we can look at partitions.size or partitions.length to
> check how many partitions a file has. But how can this be done for data
> in an S3 bucket?
>
> --
> Selvam Raman
> "Shun bribery, stand tall"
>
