spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17858) Provide option for Spark SQL to skip corrupt files
Date Tue, 11 Oct 2016 05:04:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564479#comment-15564479 ]

Sean Owen commented on SPARK-17858:
-----------------------------------

Yeah, the related JIRA gives an argument that we shouldn't do this. It becomes easier to
silently ignore data if a corrupt file doesn't fail the query. I'm not sure this is a good idea.

> Provide option for Spark SQL to skip corrupt files
> --------------------------------------------------
>
>                 Key: SPARK-17858
>                 URL: https://issues.apache.org/jira/browse/SPARK-17858
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Shixiong Zhu
>
> In Spark 2.0, corrupt files will fail a SQL query. However, the user may just want to
> skip corrupt files and still run the query.
> Another pain point is that the current exception doesn't contain the paths of the corrupt
> files, which makes it hard for the user to fix them.
> Note: In Spark 1.6, Spark SQL always skips corrupt files because of SPARK-17850.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

