spark-user mailing list archives

From ch...@cmartinit.co.uk
Subject Datasource V2- Heavy Metadata Query
Date Thu, 23 Apr 2020 06:13:19 GMT
Hi,

We have a Datasource V2 implementation for one of our custom data sources. In it we have
a step that is almost completely analogous to scanning parquet files: essentially it's a
heavyweight metadata operation that, although not actually a file scan, is best done in
parallel and ideally cached for the lifetime of the DataFrame.
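
To make the shape of the problem concrete, here is a rough sketch against the Spark 3.x
connector API (the class names and the runMetadataQuery function are illustrative
placeholders, not our real code):

import org.apache.spark.sql.connector.read.{Batch, InputPartition, PartitionReaderFactory, Scan}
import org.apache.spark.sql.types.StructType

// Illustrative partition type; metadataEntry stands in for whatever the
// heavyweight metadata query produces per split.
case class MetadataPartition(metadataEntry: String) extends InputPartition

class CustomScan(
    schema: StructType,
    readerFactory: PartitionReaderFactory,
    runMetadataQuery: () => Seq[String]) // the expensive step
  extends Scan with Batch {

  override def readSchema(): StructType = schema
  override def toBatch(): Batch = this

  // Cache the heavyweight metadata result on the Scan instance, so it runs at
  // most once per planned scan. Whether that amounts to "once per DataFrame"
  // depends on how often Spark re-plans, which is really the question below.
  private lazy val metadata: Seq[String] = runMetadataQuery()

  override def planInputPartitions(): Array[InputPartition] =
    metadata.map(m => MetadataPartition(m): InputPartition).toArray

  override def createReaderFactory(): PartitionReaderFactory = readerFactory
}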

In the case of parquet files, Spark solves this issue via FileScanRDD. For Datasource V2 it's
not obvious how one would solve a similar problem. Does anyone have any ideas or prior art
here?
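
For the "in parallel" part, one option we could imagine (not claiming this is what Spark
itself does for file sources) is to run the discovery as a small Spark job from the driver
before planning partitions; listShards and describeShard below are placeholders for
whatever the custom source exposes:

import org.apache.spark.sql.SparkSession

object ParallelMetadata {
  // Runs the per-shard metadata lookup as a Spark job so the driver isn't the
  // bottleneck. describeShard runs on executors, so it must be serializable.
  def discover(
      spark: SparkSession,
      listShards: () => Seq[String],
      describeShard: String => Seq[String]): Seq[String] = {
    val shards = listShards()
    val parallelism = math.min(shards.size, 100)
    if (parallelism <= 1) {
      shards.flatMap(describeShard) // small cases: just do it locally
    } else {
      spark.sparkContext
        .parallelize(shards, parallelism)
        .flatMap(describeShard(_))
        .collect()
        .toSeq
    }
  }
}

Even with something like that, the open question is where its result should live so that it
survives for the lifetime of the DataFrame rather than being recomputed on every re-plan.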

Thanks,

Chris
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

