spark-issues mailing list archives

From "Aleksander Eskilson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-17939) Spark-SQL Nullability: Optimizations vs. Enforcement Clarification
Date Fri, 14 Oct 2016 14:49:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksander Eskilson updated SPARK-17939:
----------------------------------------
    Description: 
The notion of Nullability of StructFields in DataFrames and Datasets creates some confusion.
As has been pointed out previously [1], Nullability is a hint to the Catalyst optimizer, and
is not meant to be a type-level enforcement. Allowing null fields can also help the reader
successfully parse certain types of more loosely-typed data, like JSON and CSV, where null
values are common, rather than just failing. 

There's already been some movement to clarify the meaning of Nullable in the API, but also
some requests for a (perhaps completely separate) type-level implementation of Nullable that
can act as an enforcement contract.
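
The difference between the two readings of Nullable can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark code: `Field`, `read_rows_hint_only`, and `read_rows_enforced` are hypothetical names invented here, none of them belong to Spark's API, and the sketch does not model the Catalyst optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not Spark's StructField."""
    name: str
    nullable: bool = True


def read_rows_hint_only(schema, rows):
    """Treat `nullable` as a hint: rows pass through unchecked, nulls included."""
    return [dict(row) for row in rows]


def read_rows_enforced(schema, rows):
    """Treat `nullable` as a contract: reject a null in a non-nullable field."""
    for i, row in enumerate(rows):
        for field in schema:
            if not field.nullable and row.get(field.name) is None:
                raise ValueError(
                    f"row {i}: field '{field.name}' is not nullable but is null"
                )
    return [dict(row) for row in rows]


schema = [Field("id", nullable=False), Field("comment", nullable=True)]
rows = [
    {"id": 1, "comment": None},
    {"id": None, "comment": "loosely typed input"},
]

# Hint semantics: the null id is accepted silently.
loaded = read_rows_hint_only(schema, rows)

# Contract semantics: the same input fails at load time.
try:
    read_rows_enforced(schema, rows)
    enforced_error = None
except ValueError as exc:
    enforced_error = str(exc)
```

Under the hint reading, the second row loads even though `id` is declared non-nullable; under the contract reading, the same row is rejected. Which of these behaviors the API should promise is the question this issue is meant to settle.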

This bug is logged here to discuss and clarify this issue.

[1] - [https://issues.apache.org/jira/browse/SPARK-11319|https://issues.apache.org/jira/browse/SPARK-11319?focusedCommentId=15014535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15014535]
[2] - https://github.com/apache/spark/pull/11785

  was:
The notion of Nullability of StructFields in DataFrames and Datasets creates some confusion.
As has been pointed out previously [1], Nullability is a hint to the Catalyst optimizer, and
is not meant to be a type-level enforcement. Allowing null fields can also help the reader
successfully parse certain types of more loosely-typed data, like JSON and CSV, where null
values are common, rather than just failing. 

There's already been some movement to clarify the meaning of Nullable in the API, but also
some requests for a (perhaps completely separate) type-level implementation of Nullable that
can act as an enforcement contract.

This bug is logged here to discuss and clarify this issue.

[1] - [https://issues.apache.org/jira/browse/SPARK-11319][https://issues.apache.org/jira/browse/SPARK-11319?focusedCommentId=15014535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15014535]
[2] - https://github.com/apache/spark/pull/11785


> Spark-SQL Nullability: Optimizations vs. Enforcement Clarification
> ------------------------------------------------------------------
>
>                 Key: SPARK-17939
>                 URL: https://issues.apache.org/jira/browse/SPARK-17939
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Aleksander Eskilson
>            Priority: Critical
>
> The notion of Nullability of StructFields in DataFrames and Datasets creates some
> confusion. As has been pointed out previously [1], Nullability is a hint to the Catalyst optimizer,
> and is not meant to be a type-level enforcement. Allowing null fields can also help the reader
> successfully parse certain types of more loosely-typed data, like JSON and CSV, where null
> values are common, rather than just failing.
>
> There's already been some movement to clarify the meaning of Nullable in the API, but
> also some requests for a (perhaps completely separate) type-level implementation of Nullable
> that can act as an enforcement contract.
>
> This bug is logged here to discuss and clarify this issue.
>
> [1] - [https://issues.apache.org/jira/browse/SPARK-11319|https://issues.apache.org/jira/browse/SPARK-11319?focusedCommentId=15014535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15014535]
> [2] - https://github.com/apache/spark/pull/11785



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
