spark-dev mailing list archives

From Eron Wright
Subject [SPARK-8794] [SQL] PrunedScan problem
Date Thu, 02 Jul 2015 16:03:04 GMT
I filed SPARK-8794 for a problem I see with PrunedScan that causes sub-optimal performance
in ML pipelines.
Sorry if the issue is already known.
Having tried a few approaches to working with large binary files in Spark ML, I prefer loading
the data into a vector-type column from a relation that supports pruned scan.  I think this is
better than a lazy-loading scheme based on binaryFiles/PortableDataStream.  SPARK-8794 undermines
this approach.