spark-issues mailing list archives

From "Tathagata Das (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-15458) Disable schema inference for streaming datasets on file streams
Date Tue, 24 May 2016 21:29:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-15458.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 13238
[https://github.com/apache/spark/pull/13238]

> Disable schema inference for streaming datasets on file streams
> ---------------------------------------------------------------
>
>                 Key: SPARK-15458
>                 URL: https://issues.apache.org/jira/browse/SPARK-15458
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>             Fix For: 2.0.0
>
>
> Relying on schema inference for file streams can easily break a query, for several reasons:
> - the query is accidentally run on a directory that has no data
> - the schema changes underneath the query
> - on restart, the query infers the schema again and may unexpectedly infer an incorrect schema, because the files in the directory may be different at the time of the restart.
> To avoid these complicated scenarios, for Spark 2.0 we are going to disable schema inference by default, controlled by a config, so that the user is forced to state explicitly which schema they want, rather than having the system try to infer it and run into weird corner cases.
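
The failure modes listed in the description can be illustrated with a small, self-contained toy. This is not Spark code: `infer_schema`, `EXPLICIT_SCHEMA`, and `demo` are hypothetical names invented for this sketch, and the "schema" is just a mapping from field name to Python type name. It only shows why re-inferring from whatever files happen to be in a directory is fragile, whereas a schema declared up front stays fixed.

```python
import json
import tempfile
from pathlib import Path


def infer_schema(directory):
    """Toy inference: map each field name to the Python type name of its value,
    using every JSON-lines file currently present in the directory."""
    schema = {}
    for path in sorted(Path(directory).glob("*.json")):
        for line in path.read_text().splitlines():
            if line.strip():
                for name, value in json.loads(line).items():
                    schema[name] = type(value).__name__
    return schema


# Hypothetical schema the user declares explicitly instead of inferring it.
EXPLICIT_SCHEMA = {"id": "int", "name": "str"}


def demo():
    with tempfile.TemporaryDirectory() as d:
        # Case 1: the directory has no data, so nothing can be inferred.
        empty = infer_schema(d)

        # First start: one file is present and inference looks fine.
        Path(d, "a.json").write_text('{"id": 1, "name": "x"}\n')
        first_start = infer_schema(d)

        # Cases 2 and 3: the files differ by the time of a restart,
        # so inferring again yields a different schema.
        Path(d, "a.json").unlink()
        Path(d, "b.json").write_text('{"id": "1", "extra": true}\n')
        after_restart = infer_schema(d)
    return empty, first_start, after_restart


if __name__ == "__main__":
    empty, first_start, after_restart = demo()
    print("empty directory:", empty)
    print("first start:    ", first_start)
    print("after restart:  ", after_restart)
    print("explicit schema:", EXPLICIT_SCHEMA)
```

In this toy, the inferred result differs across the three situations, while the explicitly declared schema is the same value regardless of what the directory contains.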




