spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-26709) OptimizeMetadataOnlyQuery does not correctly handle the files with zero record
Date Mon, 02 Mar 2020 19:53:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-26709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26709:
----------------------------------
    Affects Version/s: 2.1.0

> OptimizeMetadataOnlyQuery does not correctly handle the files with zero record
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-26709
>                 URL: https://issues.apache.org/jira/browse/SPARK-26709
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0, 2.1.3, 2.2.3, 2.3.2, 2.4.0
>            Reporter: Xiao Li
>            Assignee: Gengliang Wang
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> {code:java}
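> import org.apache.hadoop.fs.Path
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions.lit
> import org.apache.spark.sql.internal.SQLConf
> // Runs inside a Spark SQL test suite: withSQLConf and withTempPath come from
> // SQLTestUtils, checkAnswer from QueryTest.
> withSQLConf(SQLConf.OPTIMIZER_METADATA_ONLY.key -> "true") {
>   withTempPath { path =>
>     val tabLocation = path.getAbsolutePath
>     val partLocation = new Path(path.getAbsolutePath, "partCol1=3")
>     // Zero-row DataFrame: the partition directory is written but holds no records.
>     val df = spark.emptyDataFrame.select(lit(1).as("col1"))
>     df.write.parquet(partLocation.toString)
>     val readDF = spark.read.parquet(tabLocation)
>     // With no rows in the table, both aggregates must return null.
>     checkAnswer(readDF.selectExpr("max(partCol1)"), Row(null))
>     checkAnswer(readDF.selectExpr("max(col1)"), Row(null))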
>   }
> }
> {code}
> OptimizeMetadataOnlyQuery has a correctness bug: for partitioned tables, it does not correctly
> handle files that contain zero records. The test above fails in 2.4, which can write such an
> empty file, but the underlying issue in the read path also exists in 2.3, 2.2, and 2.1.
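
The mismatch can be illustrated without Spark. The following is a toy model only, not Spark's implementation: the `Partition` case class and both helper functions are hypothetical names introduced here, and it assumes the metadata-only path answers the aggregate from partition directory values alone, without checking whether the files in that partition contain any rows.

```scala
// Toy model (not Spark code): a partition is a directory value plus the rows stored in its files.
final case class Partition(partCol1: Int, col1Rows: Seq[Int])

object MetadataOnlySketch {
  // Assumed metadata-only shortcut: looks only at partition values, never at row counts.
  def maxFromMetadata(parts: Seq[Partition]): Option[Int] =
    parts.map(_.partCol1).reduceOption(_ max _)

  // Row-based evaluation: every row carries its partition value,
  // so a partition with zero rows contributes nothing to the aggregate.
  def maxFromRows(parts: Seq[Partition]): Option[Int] =
    parts.flatMap(p => p.col1Rows.map(_ => p.partCol1)).reduceOption(_ max _)

  def main(args: Array[String]): Unit = {
    // Mirrors the test: directory partCol1=3 exists, but its file has zero records.
    val parts = Seq(Partition(partCol1 = 3, col1Rows = Seq.empty))
    println(maxFromMetadata(parts)) // Some(3): a value from a partition with no rows
    println(maxFromRows(parts))     // None: corresponds to the expected Row(null)
  }
}
```

In this model the two evaluations agree whenever every partition has at least one row, and diverge only for partitions whose files are empty, which is the case the test exercises.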



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
