spark-issues mailing list archives

From "Tathagata Das (JIRA)" <>
Subject [jira] [Resolved] (SPARK-15458) Disable schema inference for streaming datasets on file streams
Date Tue, 24 May 2016 21:29:12 GMT


Tathagata Das resolved SPARK-15458.
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 13238

> Disable schema inference for streaming datasets on file streams
> ---------------------------------------------------------------
>                 Key: SPARK-15458
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>             Fix For: 2.0.0
> If the user relies on the schema being inferred for a file stream, the query can break easily for multiple reasons:
> - accidentally running on a directory that has no data
> - the schema changing underneath the query
> - on restart, the query infers the schema again and may unexpectedly infer an incorrect schema, because the files in the directory may be different at the time of the restart
> To avoid these complicated scenarios, for Spark 2.0 we are going to disable schema inference by default, behind a config, so that the user is forced to consider explicitly what schema they want, rather than the system trying to infer it and running into weird corner cases.
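
The failure modes listed in the description can be illustrated with a small, Spark-free sketch in plain Python. This is not Spark's implementation: `infer_schema` and `resolve_schema` are hypothetical helpers invented for illustration, and the "schema" is just a mapping from field name to Python type name. It only shows why inferring from whatever files happen to be present is fragile, and how requiring an explicit schema avoids the problem.

```python
import json
import tempfile
from pathlib import Path


def infer_schema(directory):
    """Toy inference (hypothetical): map each field name to the Python
    type name observed in the JSON-lines files currently in `directory`."""
    schema = {}
    for path in sorted(Path(directory).glob("*.json")):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            for key, value in json.loads(line).items():
                schema[key] = type(value).__name__
    return schema


def resolve_schema(directory, user_schema=None, allow_inference=False):
    """Hypothetical policy: use the explicit schema if given; infer only
    when the caller has opted in, otherwise refuse to start."""
    if user_schema is not None:
        return dict(user_schema)
    if not allow_inference:
        raise ValueError("schema must be specified explicitly for a file stream")
    return infer_schema(directory)


with tempfile.TemporaryDirectory() as data_dir:
    # Case 1: the directory has no data, so inference yields nothing useful.
    schema_on_empty = infer_schema(data_dir)

    # First start: one file is present.
    Path(data_dir, "a.json").write_text('{"id": 1, "name": "x"}\n')
    schema_first_start = infer_schema(data_dir)

    # Cases 2 and 3: the files differ by the time the query restarts,
    # so re-inference silently produces a different schema.
    Path(data_dir, "a.json").unlink()
    Path(data_dir, "b.json").write_text('{"id": "1", "score": 0.5}\n')
    schema_after_restart = infer_schema(data_dir)

    # With an explicit schema, the result does not depend on the files.
    explicit = {"id": "int", "name": "str"}
    schema_explicit_restart = resolve_schema(data_dir, user_schema=explicit)

    # With inference disabled by default, a missing schema is an error.
    try:
        resolve_schema(data_dir)
        inference_rejected = False
    except ValueError:
        inference_rejected = True
```

In Spark itself, the analogous step is passing a schema to the stream reader's `schema(...)` method. To my knowledge the opt-in setting that shipped is `spark.sql.streaming.schemaInference`, though the message above does not name the config.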

This message was sent by Atlassian JIRA
