spark-user mailing list archives

From ayan guha <>
Subject Re: How to read the schema of a partitioned dataframe without listing all the partitions ?
Date Fri, 27 Apr 2018 11:57:30 GMT
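You can point Spark at a single partition folder (the first one, for example) and read the schema from that path alone, instead of from the root of the dataset.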
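
A minimal sketch of that idea in PySpark, assuming the data is laid out as <base_path>/day=YYYY-MM-DD/ (the bucket name, the partition column name "day" and the helper name are made up for illustration, they are not from the original question). Setting the "basePath" option should, as far as I understand partition discovery, keep the partition column in the returned schema while only the given folder is listed.

```python
def schema_from_single_partition(spark, base_path, partition_dir):
    """Read the schema from one partition folder instead of the dataset root.

    spark         -- an existing SparkSession
    base_path     -- root of the partitioned parquet dataset (hypothetical)
    partition_dir -- one partition folder under the root, e.g. "day=2018-04-27"
    """
    # Build the path to a single partition only.
    partition_path = base_path.rstrip("/") + "/" + partition_dir.strip("/")

    # "basePath" tells Spark where partition discovery starts, so the
    # partition column can still be derived from the folder name.
    reader = spark.read.option("basePath", base_path)

    # Only the schema is requested here; no rows are collected.
    return reader.parquet(partition_path).schema


# Hypothetical usage (the path is an assumption, not from the thread):
# spark = SparkSession.builder.getOrCreate()
# schema = schema_from_single_partition(
#     spark, "s3a://my-bucket/events", "day=2018-04-27")
```

Whether this is fast enough depends on how many nested folders sit under that one partition, and the schema is only representative if every partition really shares it (which is the case when schema merging is not needed).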

On Fri, 27 Apr 2018 at 9:42 pm, Walid LEZZAR <> wrote:

> Hi,
> I have a parquet on S3 partitioned by day. I have 2 years of data (->
> about 1000 partitions). With spark, when I just want to know the schema of
> this parquet without even asking for a single row of data, spark tries to
> list all the partitions and the nested partitions of the parquet. Which
> makes it very slow just to build the dataframe object on Zeppelin.
> Is there a way to avoid that ? Is there a way to tell spark : "hey, just
> read a single partition and give me the schema of that partition and
> consider it as the schema of the whole dataframe" ? (I don't care about
> schema merge, it's off by the way)
> Thanks.
> Walid.
Best Regards,
Ayan Guha