spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: How to read the schema of a partitioned dataframe without listing all the partitions ?
Date Fri, 27 Apr 2018 11:57:30 GMT
You can point Spark at the first partition folder directly and read the schema from it.
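
A minimal sketch of that idea, assuming PySpark and a Hive-style folder layout such as `<base>/day=2018-04-27`. The bucket path, the partition column name `day`, the date format and both helper names are hypothetical and not taken from the original question. Note that when a single partition folder is read this way, the partition column is normally absent from the resulting schema, because its value lives in the directory name rather than in the Parquet files.

```python
def first_partition_path(base_path, partition_column, value):
    """Build the path of one Hive-style partition folder, e.g. <base>/day=2018-04-27.

    The column name and value format are assumptions about the layout.
    """
    return "{}/{}={}".format(base_path.rstrip("/"), partition_column, value)


def schema_of_single_partition(spark, base_path, partition_column, value):
    """Return the schema read from a single partition folder.

    `spark` is an existing SparkSession. Only the given folder is passed to
    the reader, so the other partition folders are not part of this read.
    """
    single = first_partition_path(base_path, partition_column, value)
    return spark.read.parquet(single).schema


# Hypothetical usage from a notebook where `spark` is already defined:
# schema = schema_of_single_partition(spark, "s3a://my-bucket/events", "day", "2018-04-27")
# print(schema.simpleString())
```

If the partition column is needed in the schema as well, the reader's `basePath` option can be set to the table root, although I have not checked how that affects listing time on S3.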

On Fri, 27 Apr 2018 at 9:42 pm, Walid LEZZAR <walezz89@gmail.com> wrote:

> Hi,
>
> I have a Parquet dataset on S3 partitioned by day. I have 2 years of data (->
> about 1000 partitions). With Spark, when I just want to know the schema of
> this dataset without even asking for a single row of data, Spark tries to
> list all the partitions and the nested partitions of the dataset, which
> makes it very slow just to build the DataFrame object on Zeppelin.
>
> Is there a way to avoid that? Is there a way to tell Spark: "hey, just
> read a single partition and give me the schema of that partition and
> consider it as the schema of the whole dataframe"? (I don't care about
> schema merge; it's off, by the way.)
>
> Thanks.
> Walid.
>
-- 
Best Regards,
Ayan Guha
