spark-user mailing list archives

From Vladimir Prus
Subject State of datasource api v2
Date Mon, 14 Jan 2019 08:48:16 GMT

I am trying to understand the state of datasource v2, and I'm a bit lost.
On one hand, it is supposed to be a more flexible approach, as described for
example here:

On the other hand, it appears both the Parquet and ORC file readers are still
not using the v2 interface. There's an umbrella issue to address that:

but it does not have any sub-issues to address Parquet, and the issue about

includes this text: "Not supported( due to limitation of data source V2):
(1) Read multiple file path (2) Read bucketed file.".

Is there any up-to-date information on whether datasource v2 will indeed
become the primary datasource, whether the Parquet reader
will be converted to V2, and whether the limitations above will be fixed?

Thanks in advance,

Vladimir Prus
