spark-dev mailing list archives

From Gil Vernik <G...@il.ibm.com>
Subject parquet support - some questions about code
Date Wed, 18 Mar 2015 13:46:21 GMT
Hi,

I am trying to better understand the code for Parquet support.
In particular, I got lost trying to understand ParquetRelation and 
ParquetRelation2. Is ParquetRelation2 the new code that is meant to 
completely replace ParquetRelation? (I think there is a remark in the 
code noting this.)

Assume I am using:
spark.sql.parquet.filterPushdown = true
spark.sql.parquet.useDataSourceApi = true

I saw that the buildScan method in newParquet.scala pushes filters down 
into Parquet, but I also saw filter and projection pushdown in 
ParquetOperations inside SparkStrategies.scala.
However, every time I debug it, the match in

  object ParquetOperations extends Strategy {
    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
      ...

never reaches

      case PhysicalOperation(projectList, filters: Seq[Expression], relation: ParquetRelation) =>

In which cases will it match this case?

Also, where is the code for Parquet projection and filter pushdown: 
inside ParquetOperations in SparkStrategies.scala, inside buildScan in 
newParquet.scala, or both? If both, I am not sure how they work together.
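
One possible reading of the debugging result above, sketched in Scala as a toy model. It assumes (this is not confirmed in this message) that with spark.sql.parquet.useDataSourceApi = true the Parquet data reaches the planner wrapped in a generic data source node rather than as a ParquetRelation node. Under that assumption a type pattern on ParquetRelation would never match, and the scan would go through buildScan instead. All names below (PushdownSketch, OldParquetRelation, NewParquetRelation, WrappedRelation) are hypothetical stand-ins, not Spark classes.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object PushdownSketch {
  sealed trait Plan

  // Stand-in for the older relation, which is itself a logical plan node.
  final case class OldParquetRelation(path: String) extends Plan

  // Stand-in for a data source relation. It is NOT a plan node ...
  final case class NewParquetRelation(path: String)

  // ... so (by assumption) it reaches the planner wrapped in a generic node.
  final case class WrappedRelation(relation: NewParquetRelation) extends Plan

  // Mimics a strategy whose pattern only recognises the older relation type.
  val oldStrategy: Plan => Seq[String] = {
    case r: OldParquetRelation => Seq(s"old scan of ${r.path}")
    case _                     => Nil
  }

  // Mimics a generic strategy that unwraps the node and calls the relation's scan.
  val dataSourceStrategy: Plan => Seq[String] = {
    case WrappedRelation(r) => Seq(s"buildScan on ${r.path}")
    case _                  => Nil
  }

  private val strategies: Seq[Plan => Seq[String]] =
    Seq(oldStrategy, dataSourceStrategy)

  // The first strategy that produces a non-empty result wins.
  def plan(p: Plan): Seq[String] =
    strategies.iterator.map(s => s(p)).find(_.nonEmpty).getOrElse(Nil)
}
```

In this sketch, planning a WrappedRelation skips oldStrategy entirely, which would look the same in a debugger as the ParquetRelation case never being evaluated.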

Thanks,
Gil.
