spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: SPARK CSV ISSUE
Date Sat, 09 Sep 2017 06:20:14 GMT
Hi,

Naga has kindly suggested that I load the files into an RDD and strip the
header there. But my partitions have hundreds of files in them, and opening
and processing the files as RDDs is an outdated way of working. I think the
SPARK community has moved on from RDDs to DataFrames and now to Datasets.

I know we still need RDDs for special cases, but being asked to fall back to
an RDD just to skip the header of a CSV file does not sound quite right to
me.
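
To make the header problem concrete: when a partition directory holds many CSV files, every file carries its own header line, so the header has to be dropped per file rather than once for the whole dataset. In the DataFrame API this is what the reader's header option is for (`spark.read.option("header", "true").csv(path)`). The stdlib-only sketch below is not Spark code; it only models the behaviour on in-memory strings, and the function names and sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_csv_files(files: Dict[str, str]) -> List[dict]:
    """Parse several CSV files, skipping the header line of *each* file.

    `files` maps a hypothetical file name to its text content.
    """
    rows: List[dict] = []
    for name in sorted(files):
        # DictReader consumes the first line of this file as its header.
        reader = csv.DictReader(io.StringIO(files[name]))
        rows.extend(reader)
    return rows


def naive_drop_first_line(files: Dict[str, str]) -> List[List[str]]:
    """Concatenate all files and drop only the very first line.

    Models stripping the header once from the combined data: the headers
    of the second and later files leak through as data rows.
    """
    lines: List[str] = []
    for name in sorted(files):
        lines.extend(files[name].splitlines())
    return list(csv.reader(lines[1:]))


# Hypothetical sample: two files from the same partition, each with a header.
sample = {
    "part-0001.csv": "id,name\n1,alice\n2,bob\n",
    "part-0002.csv": "id,name\n3,carol\n",
}

print(read_csv_files(sample))
print(naive_drop_first_line(sample))
```

The second printout contains a stray `['id', 'name']` row, which is exactly what a header-aware reader avoids without any manual RDD handling.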



Regards,
Gourav Sengupta

On Fri, Sep 8, 2017 at 7:25 PM, Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> Hi,
>
> According to this JIRA issue, https://issues.apache.org/jira/browse/SPARK-11374,
> SPARK will not fix the skip-header table property (skip.header.line.count)
> being ignored when the table is defined in HIVE.
>
> But I cannot find a SPARK SQL option for creating an external
> partitioned table.
>
> Does that mean that if I have to create an external partitioned table I
> must use HIVE, and that when I use HIVE, SPARK does not allow me to ignore
> the headers?
>
>
> Regards,
> Gourav Sengupta
>
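
For context on the partitioned-table part of the question: an external partitioned table is usually laid out as `key=value` directories (for example `dt=2017-09-08/`), with the partition value taken from the directory path rather than from the file contents. On the DataFrame side, pointing `spark.read.option("header", "true").csv(...)` at the table root typically discovers such directories as partition columns. The stdlib-only sketch below models that layout together with per-file header skipping; `read_partitioned_csv`, the `dt` column and the sample files are hypothetical, and this is not how Spark or Hive implement it internally.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def read_partitioned_csv(root: Path) -> List[dict]:
    """Read every CSV under `root`, skipping each file's header row and
    adding partition columns parsed from `key=value` directory names."""
    rows: List[dict] = []
    for path in sorted(root.rglob("*.csv")):
        # Partition values come from the directories between root and the file.
        partition = dict(
            part.split("=", 1)
            for part in path.relative_to(root).parts[:-1]
            if "=" in part
        )
        with path.open(newline="") as handle:
            # DictReader treats the first line of each file as the header.
            for record in csv.DictReader(handle):
                rows.append({**record, **partition})
    return rows


# Build a small hypothetical table: one directory per `dt` partition.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = {"2017-09-07": ["1,alice"], "2017-09-08": ["2,bob", "3,carol"]}
    for dt, lines in data.items():
        part_dir = root / f"dt={dt}"
        part_dir.mkdir()
        (part_dir / "part-0001.csv").write_text("id,name\n" + "\n".join(lines) + "\n")
    result = read_partitioned_csv(root)

for row in result:
    print(row)
```

Each row carries its `dt` value from the path, and no header line appears among the data rows.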
